Masking feature: regex pattern boolean keys

alok87 commented 3 years ago

Why? This keeps free text columns masked while adding boolean columns that describe the kind of value held in each free text column.

What?

Adds support for regex boolean keys
Refactors the schema code for extra columns. The previous code assumed a 1:1 mapping between an actual column and its extra column. This PR removes that assumption by keeping the extra columns' schema separate.

Masking Feature added

Regex Pattern Boolean Keys

Free text columns can contain PII, so we do not unmask them, but we still want the user to be able to run aggregate analysis on the non-PII data in them. With this feature the user gets a boolean column stating whether a given text/regex is present anywhere in the free text value.

For example: we add a boolean column favourite_quote_has_philosphy. If the value in column favourite_quote matches the regex 'life|time' (case insensitive), then the value in the extra column favourite_quote_has_philosphy is true, otherwise false.

regex_pattern_boolean_keys:
    customers:
        favourite_quote:
            has_philosphy: 'life|time'
            has_text_funny: 'funny'
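The behaviour described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code from this PR: the function name `boolean_columns`, the dict layout, and the handling of missing values (treated as false here) are all assumptions. The extra column name is built as `<column>_<key>`, following the favourite_quote_has_philosphy example.

```python
import re

# Hypothetical in-memory form of the YAML config above:
# table -> free text column -> boolean key -> regex pattern.
REGEX_PATTERN_BOOLEAN_KEYS = {
    "customers": {
        "favourite_quote": {
            "has_philosphy": "life|time",
            "has_text_funny": "funny",
        }
    }
}


def boolean_columns(table, row, config=REGEX_PATTERN_BOOLEAN_KEYS):
    """Return the extra boolean columns for one row of `table`.

    Assumption: a missing or None value yields False for every key.
    """
    extra = {}
    for column, keys in config.get(table, {}).items():
        value = row.get(column)
        for key, pattern in keys.items():
            # Case-insensitive search anywhere in the free text value.
            matched = (
                value is not None
                and re.search(pattern, value, re.IGNORECASE) is not None
            )
            extra[f"{column}_{key}"] = matched
    return extra


row = {"favourite_quote": "Life is short"}
print(boolean_columns("customers", row))
```

Note that one free text column yields several extra columns here, which is why a 1:1 mapping between actual and extra columns no longer holds.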

alok87 commented 3 years ago

Bug: the boolean columns show empty values instead of false. [Screenshot 2021-05-21 at 3.25.25 PM]

alok87 commented 3 years ago

Testing in production.

alok87 commented 3 years ago

If length keys are already enabled, the existing columns need to be recreated if their names are not in order.

practo / tipoca-stream

Masking feature: regex pattern boolean keys #232
