oozcitak / xmlbuilder-js

An XML builder for node.js
MIT License
919 stars 108 forks source link

XMLStringifier attEscape slow #225

Closed SLKnutson closed 5 years ago

SLKnutson commented 5 years ago

When profiling my app, this function ends up being the most expensive thing for some operations. I found a stack overflow entry that is supposedly faster implementation of nearly the same thing. When I used it in my app it ran in 20% of the original time. https://stackoverflow.com/a/27979933 I don't know coffescript, but I think this is a pretty straightforward replacement. Thanks!

oozcitak commented 5 years ago

I added some performance tests. You can run the tests with npm run perf. However, I can't see much of a difference. Can you please post your implementation and a test case?

Results with the proposed modification compared to the current version:

v13.0.2* (Working Tree):
  - Attribute value escaping => 0.0101 ms (v13.0.2 was 0.0109 ms) 7% better
  - Attribute value escaping (no replacement) => 0.0047 ms (v13.0.2 was 0.0047 ms)
  - Create simple object => 0.0211 ms (v13.0.2 was 0.0219 ms) 4% better
  - Long attribute value => 0.0353 ms (v13.0.2 was 0.0417 ms) 15% better
  - Long text => 0.0530 ms (v13.0.2 was 0.0283 ms) 87% worse
  - Text escaping => 0.0104 ms (v13.0.2 was 0.0111 ms) 6% better
  - Text escaping (no replacement) => 0.0041 ms (v13.0.2 was 0.0041 ms)

And my implementation of the stack overflow post:

  # Escapes special characters in text
  #
  # See http://www.w3.org/TR/2000/WD-xml-c14n-20000119.html#charescaping
  #
  # `str` the string to escape
  textEscape: (str) ->
    if @options.noValidation then return str
    pattern = if @options.noDoubleEncoding then /(?!&\S+;)&|<|>|\r/g else /[&<>\r]/g
    str.replace pattern, (c) ->
      switch c
        when '&' then '&amp;'
        when '<' then '&lt;'
        when '>' then '&gt;'
        when '\r' then '&#xD;'

  # Escapes special characters in attribute values
  #
  # See http://www.w3.org/TR/2000/WD-xml-c14n-20000119.html#charescaping
  #
  # `str` the string to escape
  attEscape: (str) ->
    if @options.noValidation then return str
    pattern = if @options.noDoubleEncoding then /(?!&\S+;)&|<|"|\t|\n|\r/g else /[&<"\t\n\r]/g
    str.replace pattern, (c) ->
      switch c
        when '&' then '&amp;'
        when '<' then '&lt;'
        when '"' then '&quot;'
        when '\t' then '&#x9;'
        when '\n' then '&#xA;'
        when '\r' then '&#xD;'
SLKnutson commented 5 years ago

Good call. I tried the "faster" implementation with some varying inputs. It seems like the original version is faster for longer strings. The "faster" implementation is definitely not a clear winner.

I didn't notice because for my use case, I have mostly short values (true, false, enums, numbers), with most most being fewer than 10 characters.

Thanks for looking into this. Sorry for the false alarm.

oozcitak commented 5 years ago

That's my conclusion as well. I don't see any reason to change the implementation at this time. Thanks for taking the time to report this.