louislva opened this issue 1 year ago
Some of these seem to be covered by the `maskInputOptions` parameter of the `rrweb.record` function. The other cases could be handled in the `maskInputFn` and `maskTextFn` parameters of `rrweb.record`.

https://github.com/rrweb-io/rrweb/blob/master/guide.md#options

`MaskInputOptions`: https://github.com/rrweb-io/rrweb/blob/588164aa12f1d94576f89ae0210b98f6e971c895/packages/rrweb-snapshot/src/types.ts#L77-L95

It probably still makes sense to build some kind of test suite with mock events for `rrweb.record` to ensure that all edge cases are covered.
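For reference, here's a minimal sketch of how those options wire together. The `maskInputOptions` keys below are just a subset of what the linked `MaskInputOptions` type allows, and both masking functions are illustrative:

```ts
import { record } from 'rrweb';

record({
  emit(event) {
    // Events are already masked at record time; ship them to storage.
    console.log(event);
  },
  // Built-in masking, toggled per input type (see MaskInputOptions).
  maskInputOptions: {
    password: true,
    email: true,
    tel: true,
    text: true,
  },
  // Custom masking applied to input values; here, blank everything out.
  maskInputFn: (text) => '*'.repeat(text.length),
});
```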
That actually looks pretty suitable! I'm curious whether `maskInputFn` & `maskTextFn` can also replace the masked value with a placeholder? Even if they can't, for shipping V1 we just need to censor personal data, not necessarily do the placeholders (although they'd be really useful to train with). I think we'll just put an `anonymization_scheme_version` column in the database, so you can see what's what.
Also, how do you think we'll go about censoring data we don't know is personally identifiable? For example, if I'm logged into Google, it'll display my full name in certain places.

One idea I had was to automatically scrape it (or simply ask the user for all their personal details), save it locally, and then use `maskTextFn` to look for the data which we know to be personal.
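A sketch of that idea, assuming the user's details have already been collected into a local lookup (`personalData` and the placeholder tokens are hypothetical names, not an existing API):

```ts
// Hypothetical local store of known personal values -> placeholder tokens.
const personalData: Record<string, string> = {
  'Jane Doe': '<FULL_NAME>',
  'jane.doe@example.com': '<EMAIL>',
};

function maskKnownPersonalData(text: string): string {
  let masked = text;
  for (const [value, placeholder] of Object.entries(personalData)) {
    // Escape regex metacharacters before building the pattern.
    const escaped = value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    masked = masked.replace(new RegExp(escaped, 'gi'), placeholder);
  }
  return masked;
}
```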
Looked into it: you set a `maskTextSelector` (could probably be `*`), and then `maskTextFn` gets triggered, which basically maps from old text to new text. So yes, we can do placeholders 🥳
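In that case, wiring the lookup from the sketch above into the recorder would look roughly like this:

```ts
import { record } from 'rrweb';

record({
  emit(event) {
    /* ship the masked event */
  },
  // Match text nodes in all elements...
  maskTextSelector: '*',
  // ...and map known personal values to placeholder tokens
  // (maskKnownPersonalData is from the sketch above).
  maskTextFn: maskKnownPersonalData,
});
```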
Another important test case: profile picture anonymization! (in the top right of GitHub, for example; it's pretty easy to recover someone's identity from a picture of their face)
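rrweb's `blockClass` / `blockSelector` options might cover this: blocked elements are replaced with a placeholder of the same dimensions instead of being recorded. A sketch, with an illustrative selector that would need tuning per site:

```ts
import { record } from 'rrweb';

record({
  emit(event) {
    /* ... */
  },
  // Replace matched elements (e.g. avatar images) with a same-sized
  // placeholder instead of recording their contents.
  blockSelector: 'img.avatar, img[alt*="avatar" i]',
});
```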
We need to develop automatic data anonymization, and to do that sanely, we should have a test suite that checks for false negatives in the anonymization.

A simple way to do that: record a number of sessions of humans typing in (fake) sensitive data, and save them as JSON files. Then make a test suite that puts each JSON file through the `anonymize()` function and checks whether the values to be anonymized are still present afterwards. It should also check for them inside the concatenated keystrokes. If they are still present, the test case should fail.

The kind of sensitive data we should test for: