sparklemotion / nokogiri

Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby.
https://nokogiri.org/
MIT License
6.13k stars 900 forks source link

investigate #1970 - string optimization #1986

Open flavorjones opened 4 years ago

flavorjones commented 4 years ago

Background

PR #1970 was opened by @ashmaroli to attempt to reduce the number of allocated empty strings created by Nokogiri.

This PR was merged and released in v1.11.0.rc1.

@ashmaroli repeated his profiling (I think using the memory_profilergem) and commented that it didn't seem to have an effect.

Actions

  1. reproduce the original memory profliling scenario - @ashmaroli please help with this
  2. determine whether the patch can be tweaked to reduce the number of allocated strings or no
  3. make a recommendation - revert or re-patch

Retrospecting

I should have asked for more details from @ashmaroli when he submitted the PR to reproduce his results, and demonstrate improvement, before merging. This would have allowed me to ask questions at a more appropriate time.

ashmaroli commented 4 years ago

Sorry for the blunt comment in #1970 , @flavorjones I tested the submitted patch outside the context of Nokogiri (on a Windows system) and assumed it would provide the expected reduction in allocation.. My bad.

IMO, this need not be reverted since it doesn't appear to be wrong or have a side-effect. (I don't have a JRuby environment, so can't really say about that aspect).

But I did some additional tests by directly editing the installed gems (v1.10.7 and v1.11.0-rc1). I now think the "" allocation could be arising from the native C extension — the gems are built fine.

Yes, the profiling is via the memory_profiler gem using TravisCI builds. The source material is Jekyll's documentation site. The latest build is at: https://travis-ci.org/jekyll/jekyll/jobs/645395906

The profiling script is at: https://github.com/jekyll/jekyll/blob/master/rake/profile.rake (invoked with task args [memprof.txt,core])

ashmaroli commented 4 years ago

While not directly related to the issue under investigation here, I'm leaving a note regarding a couple of areas that could be refactored in future to reduce overall String allocations.

The following native methods return a new Ruby String on every call:


In pure Ruby, the above would've been attr_readers for the concerned classes. Hypothetically,

class Document
  # Returns value of @encoding or nil
  # @encoding is set by native method `set_encoding` exposed as `:encoding=`
  attr_reader :encoding
end
larskanis commented 4 years ago

@ashmaroli I did a lot of benchmarks when tuning ruby-pg. One outcome was, that allocation counts can be a good indicator for performance improvements, but don't have to. In particular short string allocations like the above are way faster than lookups in a ruby Hash or st_tables (instance variables) - in fact by a factor of 3 to 5. String handling is very well optimized in ruby these days, so that reusing them is often more expensive than doing new allocations and corresponding GC'ing. Please conduct timer based benchmarks to verify effectiveness of such optimizations.

ashmaroli commented 4 years ago

@larskanis Thank you for joining this discussion. I understand that there is always a delicate balance between memory-usage and time-consumption and getting the right balance is tough.

In particular short string allocations like the above are way faster than lookups in a ruby Hash or st_tables (instance variables) - in fact by a factor of 3 to 5.

Wow! That's interesting. Do you have a benchmark script that I can use to reproduce this observation? Factor of 3 to 5 is a significant difference.

larskanis commented 4 years ago

@ashmaroli After a bit of benchmarking here and here I noticed that I was wrong in my comment above. Reusing strings from st_table is actually somewhat faster that allocation of new strings. The benchmarks that I did with ruby-pg are several years ago. They were less syntetic than the above, but I'm not sure what exactly was different there.

Still I think that desitions based on allocation counts are valuable, but should be verified by timer based benchmarks.