practo / tipoca-stream

Near real time cloud native data pipeline in AWS (CDC+Sink). Hosts code for RedshiftSink. RDS to RedshiftSink Pipeline with masking and reloading support.
https://towardsdatascience.com/open-sourcing-tipoca-stream-f261cdcc3a13
Apache License 2.0

Conditional_non_pii_keys behavior is not consistent with original LIKE behavior #230

Open justjkk opened 3 years ago

justjkk commented 3 years ago

https://github.com/practo/tipoca-stream/blob/ec2941084d71889a5cea1d12934f2b64c5211049/redshiftsink/pkg/transformer/masker/mask_config.go#L379-L383

The logic linked above is not consistent with SQL LIKE semantics because:

  1. _ in LIKE matches exactly one character, so it should be translated to . in regex. (Ref: SQL LIKE)
  2. ., ?, and other regex metacharacters have no special meaning in LIKE, so they should be treated as literals.
  3. A LIKE query in MySQL can be case insensitive when run on a column that uses a case-insensitive collation (e.g. utf8_general_ci), whereas a regex match is case sensitive by default.
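
To make the three points concrete, here is a minimal sketch of a LIKE-to-regex translation in Go. It is not the code in mask_config.go: the helper name likeToRegexp and its caseInsensitive parameter are hypothetical, and it assumes backslash is the LIKE escape character (the MySQL default).

```go
package main

import (
	"regexp"
	"strings"
)

// likeToRegexp is a hypothetical helper (not part of tipoca-stream) that
// translates a SQL LIKE pattern into an anchored Go regular expression.
// Assumption: backslash is the LIKE escape character, as in MySQL's default.
func likeToRegexp(pattern string, caseInsensitive bool) (*regexp.Regexp, error) {
	var b strings.Builder

	// "s" lets "." match newlines too, since LIKE wildcards match any character.
	// "i" mimics a case-insensitive collation such as utf8_general_ci.
	flags := "s"
	if caseInsensitive {
		flags = "is"
	}
	b.WriteString("(?" + flags + ")^")

	escaped := false
	for _, r := range pattern {
		switch {
		case escaped:
			// The character after a backslash is always a literal.
			b.WriteString(regexp.QuoteMeta(string(r)))
			escaped = false
		case r == '\\':
			escaped = true
		case r == '%':
			b.WriteString(".*") // % matches any run of characters
		case r == '_':
			b.WriteString(".") // _ matches exactly one character
		default:
			// Everything else, including ".", "?" and "*", is a literal in LIKE.
			b.WriteString(regexp.QuoteMeta(string(r)))
		}
	}
	if escaped {
		// A trailing backslash is kept as a literal backslash in this sketch.
		b.WriteString(regexp.QuoteMeta(`\`))
	}

	b.WriteString("$")
	return regexp.Compile(b.String())
}
```

With this sketch, the pattern a.c_% compiled with caseInsensitive set to true matches A.Cxyz but not abcxyz, because the dot is quoted rather than treated as a wildcard.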

justjkk commented 3 years ago

I checked the existing definitions and found that there is pretty much no impact. However, we should look into deprecating the % syntax in favor of regex-based patterns.