josacar / triki

MySQL, PostgreSQL and SQL dump obfuscator, aka anonymizer
MIT License

Issue with semicolon within field values in MySQL dump files #21

Closed Ara4Sh closed 1 month ago

Ara4Sh commented 3 months ago

Hello,

I was experimenting with the library recently and noticed an issue when a value in a MySQL dump file contains a semicolon. Specifically, the application stops obfuscating when it encounters a semicolon within the data. This is to be expected: since the statement delimiter is a semicolon, a semicolon inside a value changes the regex matching behaviour completely.

Example:

(11,'Sirius','Black','8009008090','sirius.black@gryffindor.co.uk','wizard','known as Padfoot; the last heir of the House of Black; son of Orion and Walburga Black','1959-11-03 10:32:27','1996-06-18 02:07:56'),

As we already know, the --hex-blob option only works with binary data. To address this, I tried preprocessing the dump file by changing the delimiter from a semicolon to another character. Then I modified the make_insert_statement() and rows_to_be_inserted() methods in the MySQL module, as well as the parse() method in InsertStatementParser, to make it work. After processing, I reverted the delimiter back to a semicolon.

My question is: do you have any experience dealing with such issues? Would it be a good idea to make the delimiter configurable within the library?

Thank you for creating this library and contributing it to the open-source community.

Best regards, Arash

josacar commented 3 months ago

Hi,

Yes, I broke support for semicolons inside values when I added support for MariaDB multi-line dumps.

In the first phase, the statements are read from the file. Previously this was done line by line (the original mysqldump format), so everything worked except for MariaDB dumps. To fix that, I changed it to read until the ; character in order to get a full SQL statement, but that was too naive: it stopped at every ; even when the ; was part of a string value.
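To make the failure mode above concrete, here is a minimal Python sketch. It is not Triki's code (Triki is written in Crystal); the helper `split_naive` and the table name `people` are made up for illustration. It shows how splitting on every `;` cuts a statement in the middle of a quoted value:

```python
def split_naive(dump: str) -> list[str]:
    """Split a dump on every ';', without checking whether it is inside a quoted string."""
    return [part.strip() for part in dump.split(";") if part.strip()]


# Hypothetical single-row INSERT, shortened from the example row in this issue.
dump = (
    "INSERT INTO `people` VALUES "
    "(11,'Sirius','known as Padfoot; the last heir of the House of Black');\n"
)

statements = split_naive(dump)
for statement in statements:
    print(repr(statement))
```

The single INSERT comes back as two fragments, and neither is a complete statement that an INSERT parser could match.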

So I pushed a 'quick fix', with a test, that reads up to );\n instead. It is not perfect, but it may be faster than a regex or a full SQL parser.
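The idea of reading up to `);` followed by a newline can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the code in the Triki commit: `read_statements`, `TERMINATOR` and the table name `people` are hypothetical, and the real Crystal implementation may differ.

```python
import io
from typing import Iterator, TextIO

# Assumed statement terminator: closing parenthesis, semicolon, newline.
TERMINATOR = ");\n"


def read_statements(stream: TextIO) -> Iterator[str]:
    """Yield chunks of the dump, each ending at a line that ends with ');'."""
    buffer: list[str] = []
    for line in stream:
        buffer.append(line)
        if line.endswith(TERMINATOR):
            yield "".join(buffer)
            buffer = []
    if buffer:  # trailing text that never reached a terminator
        yield "".join(buffer)


# Hypothetical multi-line dump with a semicolon inside a value.
dump = (
    "INSERT INTO `people` VALUES\n"
    "(11,'Sirius','known as Padfoot; the last heir'),\n"
    "(12,'Remus','known as Moony');\n"
    "INSERT INTO `people` VALUES (13,'James','known as Prongs');\n"
)

statements = list(read_statements(io.StringIO(dump)))
for statement in statements:
    print(repr(statement))
```

In this sketch the semicolon inside the value no longer ends the statement, and the multi-line INSERT stays in one piece. It remains a heuristic: statements that do not end in `);` are glued to whatever follows them, and a value containing `);` right before a literal newline would still be split.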

Let me know if this works for you.

Ara4Sh commented 2 months ago

I tried the new patch on a simple table whose fields contain semicolons. Triki just prints the input unchanged, without any errors. I tried using triki both from shards and by requiring it directly within the directory (require "./src/triki"). I will debug it further and post the results.

josacar commented 1 month ago

Can you check version 0.3.1?

This is the relevant commit https://github.com/josacar/triki/commit/386c60822ed86a286688468a09ac046b3fdd3954

josacar commented 1 month ago

Also, I pushed version 0.3.2 recently. Can you check?

Ara4Sh commented 3 weeks ago

Thanks, I will check it out.