stripe-archive / mosql

MongoDB → PostgreSQL streaming replication
MIT License
1.63k stars 225 forks source link

Fix for non-utf-8 string handling #83

Open ghost opened 9 years ago

ghost commented 9 years ago

Adding support for converting non utf-8 strings to utf-8 to preventmosql from crashing when it encounters badly formatted strings.

Note: This fix uses force encoding of strings (as I am unaware if there is a better way to do this)

nelhage commented 9 years ago

@http301 looks like this broke the tests – can you resolve that somehow, and also add a test demonstrating the bug that this fixes?

Thanks!

ghost commented 9 years ago

@nelhage I did that. Stupid of me to assume everything else would be a string. Sorry about that. Also added a test for the same

nelhage commented 9 years ago

The provided test case doesn't fail if I revert the rest of your changes, so if there is a real bug that's being fixed here, your test case isn't demonstrating it :/

From the note you provided in email, I suspect (but am not positive) that you have a record in the database that contains non-UTF-8 in a string, which I'm concerned may be tricky to get Ruby to deal with in the way we want :/

ghost commented 9 years ago

I changed the test. I realized I was testing the wrong stuff originally.

What am I expected to test? The method transform_primitive. In case of malformed ASCII (or anyother format for that matter), we need to force encode to UTF-8 (This was indeed the case in my case and there apparently isn't a better way to handle this)

nelhage commented 9 years ago

I would like to see a test that demonstrates a correct behavior, that fails before the patch, and not after.

The test that's currently there fails before and passes after, but it's not clear to me why it's demonstrating a correct behavior – force_encoding a non-utf8 string into utf-8 would be surprising behavior at best. Is there a test that calls handle_op that demonstrates the behavior?

CLAassistant commented 4 years ago

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


http301 seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.