Near-real-time, cloud-native data pipeline in AWS (CDC + Sink). Hosts code for RedshiftSink. RDS to RedshiftSink pipeline with masking and reloading support.
Why?
Keeps free-text columns masked while adding a boolean column that tells the user what kind of value the free-text column contains.
What?
Adds support for regex boolean keys
Refactors the schema code for extra columns. The previous code assumed a 1:1 mapping between an actual column and its extra column. This PR removes that assumption by keeping the extra columns' schema separate.
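To illustrate the refactor described above, here is a minimal sketch of a schema that keeps extra columns in their own list, so one source column can have any number of derived columns. The names Column, TableSchema, source and favourite_quote_has_humour are hypothetical and are not taken from the RedshiftSink code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Column:
    name: str
    type: str
    # For an extra column: the actual column it is derived from (hypothetical field).
    source: Optional[str] = None


@dataclass
class TableSchema:
    # Columns that exist in the source table.
    columns: List[Column] = field(default_factory=list)
    # Derived columns, kept separately so no 1:1 mapping is assumed.
    extra_columns: List[Column] = field(default_factory=list)

    def all_columns(self) -> List[Column]:
        """Columns of the sink table: actual columns followed by extra ones."""
        return self.columns + self.extra_columns


schema = TableSchema(
    columns=[Column("favourite_quote", "string")],
    extra_columns=[
        Column("favourite_quote_has_philosphy", "boolean", source="favourite_quote"),
        # Hypothetical second extra column derived from the same source column.
        Column("favourite_quote_has_humour", "boolean", source="favourite_quote"),
    ],
)
```

With this layout, adding or dropping an extra column only touches extra_columns and leaves the actual columns unchanged.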
Masking Feature added
Regex Pattern Boolean Keys
Free-text columns can contain PII, so they are never unmasked. Users still need to run aggregate analysis on the non-PII data in them. With regex pattern boolean keys, the user gets a boolean column stating whether a given text or regex occurs anywhere in the free-text value.
For example, we add a boolean column favourite_quote_has_philosphy.
If the value in column favourite_quote matches the regex 'life|time' (case insensitive), the value in the extra column favourite_quote_has_philosphy is true; otherwise it is false.
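The matching rule in this example can be sketched as follows. This is only an illustration in Python, not the RedshiftSink implementation: the REGEX_BOOLEAN_KEYS mapping, the regex_boolean_columns function and the handling of NULL values are assumptions.

```python
import re

# Hypothetical config: boolean column name -> (source column, regex pattern).
REGEX_BOOLEAN_KEYS = {
    "favourite_quote_has_philosphy": ("favourite_quote", r"life|time"),
}


def regex_boolean_columns(row, keys=REGEX_BOOLEAN_KEYS):
    """Return the extra boolean columns derived from a row's free-text values."""
    extra = {}
    for bool_column, (source_column, pattern) in keys.items():
        value = row.get(source_column)
        # Assumption: a missing or NULL value counts as "no match".
        # re.search matches anywhere in the text; IGNORECASE makes it case insensitive.
        extra[bool_column] = bool(
            value is not None and re.search(pattern, value, re.IGNORECASE)
        )
    return extra
```

A row such as {"favourite_quote": "Time flies"} would get favourite_quote_has_philosphy set to true, while the quote itself can stay masked in the sink table.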