Closed KJTsanaktsidis closed 4 months ago
I'll have to spend a little time looking at the changes. It make me uncomfortable that some of the existing tests were removed. Did you run any benchmarks to see what impact on performance the change had?
I definitely shuffled some tests around but I didn’t mean to remove any. What did I delete? I’ll fix that for sure :)
good call on benchmarking - I’ll put something together today.
Maybe it was the moving around part that made it appear as if some tests were removed. I will look more carefully.
Were you able to put together a benchmark and fix the clang formatting issues?
Sorry, I haven’t gotten to that - I do plan to in the next couple of days!
OK, I've fixed the clang-format problems, and I put together a benchmark: https://gist.github.com/KJTsanaktsidis/f85be084d61aca54f8493ab63fe0707f
Without this patch:
Calculating -------------------------------------
long_7bit_printable_string with RailsXEsc mode
2.133k (± 4.0%) i/s - 10.835k in 5.088119s
long_7bit_printable_string with RailsEsc mode
1.914k (± 3.3%) i/s - 9.750k in 5.099691s
long_ascii_string with RailsXEsc mode
163.294 (± 1.8%) i/s - 832.000 in 5.096756s
long_ascii_string with RailsEsc mode
175.841 (± 2.3%) i/s - 884.000 in 5.029591s
long_angle_bracket_string with RailsXEsc mode
275.981 (± 2.9%) i/s - 1.400k in 5.077551s
long_angle_bracket_string with RailsEsc mode
1.951k (± 3.3%) i/s - 9.945k in 5.103185s
long_utf8_multibyte_string with RailsXEsc mode
359.297 (± 2.2%) i/s - 1.800k in 5.012476s
long_utf8_multibyte_string with RailsEsc mode
654.515 (± 2.3%) i/s - 3.315k in 5.067395s
With this patch:
Calculating -------------------------------------
long_7bit_printable_string with RailsXEsc mode
2.120k (± 3.0%) i/s - 10.761k in 5.080555s
long_7bit_printable_string with RailsEsc mode
2.130k (± 3.0%) i/s - 10.812k in 5.081807s
long_ascii_string with RailsXEsc mode
169.397 (± 2.4%) i/s - 848.000 in 5.009189s
long_ascii_string with RailsEsc mode
179.139 (± 2.2%) i/s - 901.000 in 5.032772s
long_angle_bracket_string with RailsXEsc mode
284.548 (± 2.5%) i/s - 1.430k in 5.028596s
long_angle_bracket_string with RailsEsc mode
2.119k (± 3.1%) i/s - 10.750k in 5.079096s
long_utf8_multibyte_string with RailsXEsc mode
425.998 (± 2.3%) i/s - 2.142k in 5.031145s
long_utf8_multibyte_string with RailsEsc mode
426.126 (± 2.3%) i/s - 2.142k in 5.029241s
The only substantial difference is that the "lots of multibyte characters" case with this patch now takes the same amount of time regardless of using RailsEsc
mode or RailsXEsc
mode, whereas before it was faster in RailsEsc
mode (because it wasn't validating any of the characters). But in non-multibyte-heavy workloads it seems the same.
Benchmarks look good. I'll start a more detailed review to get this merged.
LGTM other than one open question.
Thanks for the work. I know I was a little picky. Maybe too much so, sorry.
Thanks for the work. I know I was a little picky. Maybe too much so, sorry.
Not at all! Thanks for your attention on this.
My best idea is... Rename rails_xss_friendly_size to size_t required_buffer_size_for_escaped_string(const uint8_t str, size_t len, const char cmap, bool *has_hi_out). Make it accept the cmap to use as a parameter, essentially, and pass hi as an explicit out-param rather than smuggling it out via the sign bit Call required_buffer_size_for_escaped_string in case RailsXEsc and case RailsEsc with different cmaps (rails_xss_friendly_chars vs rails_friendly_chars)
Do you want me to open another PR with those changes?
This is a potential fix for https://github.com/ohler55/oj/issues/911. Currently, whether or not
Oj::Rails::Encoder
raises on invalid unicode data depends on the value ofActiveSupport.escape_html_entities_in_json
. In order to accurately mimic the behaviour of stock Rails with the stock json gem, it should in fact raise an exception regardless.I've so far deliberately copied rather than shared functionality that's shared between
RailsEsc
andRailsXEsc
mode, because I wasn't quite sure how to factor the similarities out. We can leave it like this, or I'm happy to take pointers on a way to factor this down better.I added a testcase for invalid Unicode to the Rails 6 & 7 encoding tests, and also parameterised the existing unicode-related tests to make sure they work correctly with both settings of
ActiveSupport.escape_html_entities_in_json
.