josacar / triki

MySQL, PostgreSQL and SQL dump obfuscator, aka anonymizer
MIT License

Leveraging Tokenization/Lexers alongside Regex for certain complex scenarios #22

Open Ara4Sh opened 4 months ago

Ara4Sh commented 4 months ago

Hello,

Thank you for your hard work on this project. The tool is incredibly useful, and I appreciate your dedication.

I'd like to propose supporting tokenization/lexers for pattern matching alongside regex. This change could improve reliability and consistency when obfuscating sensitive information (especially PII) and enhance error handling for complex structures. It's not as fast as regular expressions, but it could be very useful when performance is not a KPI.

Is it feasible to integrate tokenization/lexers into the current codebase? Would this improve consistency and reliability in obfuscation when dealing with large file processing (over 1TB) in your opinion?
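To illustrate what a lexer buys over a single regex pass, here is a minimal sketch (not part of triki) using Ruby's stdlib `StringScanner`. It tokenizes the VALUES list of an INSERT statement so that masking decisions can be made per token; a quoted string comes back as one token, so commas or SQL keywords inside a value can't mislead the matcher the way a naive regex split over the whole line could. The token names and `tokenize_values` helper are illustrative assumptions, not triki API.

```ruby
require "strscan"

# Sketch only: split an INSERT VALUES list into typed tokens so a
# masking rule can be applied per token instead of via one regex
# over the whole line.
def tokenize_values(values) # e.g. "('a,b', 42, NULL)"
  s = StringScanner.new(values)
  tokens = []
  until s.eos?
    if s.scan(/'(?:\\.|''|[^'])*'/)   # single-quoted string, with escapes
      tokens << [:string, s.matched]
    elsif s.scan(/\d+(?:\.\d+)?/)     # numeric literal
      tokens << [:number, s.matched]
    elsif s.scan(/NULL/i)
      tokens << [:null, s.matched]
    elsif s.scan(/[(),]/)             # punctuation
      tokens << [:punct, s.matched]
    else
      s.getch                         # skip whitespace / anything else
    end
  end
  tokens
end

tokens = tokenize_values("('a,b', 42, NULL)")
# The comma inside 'a,b' stays inside one :string token.
```

A real lexer (like go-sqllexer mentioned below) handles far more of the SQL grammar, but the principle is the same: classify first, then mask, instead of matching and masking in one regex.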

josacar commented 4 months ago

Hi, can you provide an example? Are you talking about providing a custom matcher? A custom masking for a given input?

Thanks in advance.

Ara4Sh commented 3 months ago

Something like go-sqllexer as a custom matcher for certain scenarios.

josacar commented 2 months ago

You can use lambdas in each column to specify a condition and a replacement; are you looking for something more generic?
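As a sketch of that lambda-per-column idea in plain Ruby (the `RULES` hash and `obfuscate_row` helper here are illustrative assumptions, not triki's actual configuration keys):

```ruby
# Each column maps to a lambda that both tests a condition and
# produces the replacement value.
RULES = {
  "email" => ->(value) { value.end_with?("@example.com") ? value : "masked@example.com" },
  "name"  => ->(value) { "x" * value.length },
}

# Apply the per-column rules to one row, leaving unlisted columns alone.
def obfuscate_row(row) # row is { column => value }
  row.map { |col, val|
    rule = RULES[col]
    [col, rule ? rule.call(val) : val]
  }.to_h
end

obfuscate_row({ "email" => "bob@corp.com", "name" => "Bob" })
# => { "email" => "masked@example.com", "name" => "xxx" }
```

The lambda receives the column value and decides both whether and how to rewrite it, which is the "condition and a replacement" shape described above.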