Remove obsolete workaround for slow encoding of unicode characters.

matrix-org / python-canonicaljson

Canonical JSON

Apache License 2.0

31 stars 15 forks source link

Remove obsolete workaround for slow encoding of unicode characters. #30

Closed clokep closed 4 years ago

clokep commented 4 years ago

Bump the version of simplejson to a version that properly handles ensure_ascii=False (>3.14.0)
Remove the manual unascii-ifying and use ensure_ascii=False, which is now at least as fast.

clokep commented 4 years ago

(I feel like the title of this PR isn't really descriptive, by the way. Not sure of a better way to succinctly describe this change.)

clokep commented 4 years ago

Requesting review from @richvdh since this was broken off of #29, but feel free to redirect if you'd like!

richvdh commented 4 years ago

(I feel like the title of this PR isn't really descriptive, by the way. Not sure of a better way to succinctly describe this change.)

yeah. _unascii didn't really "decode ASCII" - it decoded the `\uXXXX escape sequences emitted by the json encoder, which were certainly not ascii.

Remove obsolete workaround for slow encoding of unicode characters.

maybe?

richvdh commented 4 years ago

hrm, according to https://matrix.org/docs/spec/appendices#grammar they should (with a few exceptions for "\n", "\r", etc), come out as "\u001F". Either way, can you add some tests to check behaviour is maintained?

clokep commented 4 years ago

hrm, according to https://matrix.org/docs/spec/appendices#grammar they should (with a few exceptions for "\n", "\r", etc), come out as "\u001F". Either way, can you add some tests to check behaviour is maintained?

I added a test that checks from 0x00 to 0x7E (the last printable ASCII character). I might have gone overboard, but I figured why not. 😄

I did check the behavior of these on master first -- I can move them to a separate PR if we want to CI pass on them first.

clokep commented 4 years ago

Arg, lint gets me every time on this repo. I think I might just black-ify it.