pythonicrubyist / creek

Ruby library for parsing large Excel files.
http://rubygems.org/gems/creek
MIT License

NoMethodError: undefined method any? in creek/sheet.rb; related to Memory Usage #121

Open alexhornick-dt opened 11 months ago

alexhornick-dt commented 11 months ago

Starting with Creek 2.6.2, we were getting the following exception when stitching together large Excel documents:

NoMethodError: undefined method `any?' for nil:NilClass
    /usr/local/bundle/ruby/2.7.0/gems/creek-2.6.3/lib/creek/sheet.rb:107:in `block (3 levels) in rows_generator'
    /usr/local/bundle/ruby/2.7.0/gems/nokogiri-1.13.10-x86_64-linux/lib/nokogiri/xml/reader.rb:100:in `each'
    /usr/local/bundle/ruby/2.7.0/gems/creek-2.6.3/lib/creek/sheet.rb:106

Alongside this exception, we seemed to be hitting our max memory allocation, which caused our process to crash. After seeing the resolution of https://github.com/pythonicrubyist/creek/issues/111 in 2.6.3, we tried upgrading and saw better performance, but we still got this exception on large files. We've since reverted to 2.5.3 and haven't had the issue since.
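For readers unfamiliar with this error class, here is a minimal sketch of how calling `any?` on `nil` produces the `NoMethodError` in the trace above, plus two common guards. This is not the code at `creek/sheet.rb:107`, which isn't shown in this thread; the `cells` variable is a hypothetical stand-in for whatever value turned out to be `nil`.

```ruby
# Hypothetical stand-in for a value that was expected to be a collection
# but was never populated (NOT taken from creek's source).
cells = nil

begin
  cells.any?
rescue NoMethodError => e
  # The message names the missing method (`any?`) and the nil receiver.
  message = e.message
end

# Two common guards against a nil receiver:
safe_a = Array(cells).any?     # Array(nil) is [], so this is false
safe_b = cells&.any? || false  # safe navigation yields nil, coerced to false
```

Whether a guard like this would be the right fix in Creek itself depends on why the value is `nil` in the first place, which this thread doesn't establish.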

I ran a quick test comparing memory usage of 2.5.3 and 2.6.3. I used three Excel files of varying sizes (20MB, 91.6MB, and 80MB), for a combined total of roughly 190MB.

Creek 2.5.3 stitched the files together successfully and seemed to peak at around 1-1.1GB of RAM. Creek 2.6.3 failed to stitch the files together. The highest peak I saw was around 7GB, but it may have crashed after 8GB.
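For anyone who wants to repeat a comparison like this, here is a rough sketch of one way to sample peak memory around a block of Ruby code. It is not the method used for the numbers above. The helper names `rss_kb` and `measure_peak_rss` are made up for illustration, and the sampling approach can miss very short spikes.

```ruby
# Current resident set size of this process in kilobytes.
# Reads /proc on Linux, and falls back to `ps` elsewhere (e.g. macOS).
def rss_kb
  status = '/proc/self/status'
  if File.readable?(status)
    line = File.foreach(status).find { |l| l.start_with?('VmRSS:') }
    return line.split[1].to_i if line
  end
  `ps -o rss= -p #{Process.pid}`.to_i
end

# Runs the given block while a background thread samples RSS.
# Returns [block_result, peak_rss_kb].
def measure_peak_rss(interval: 0.05)
  peak = rss_kb
  done = false
  sampler = Thread.new do
    until done
      peak = [peak, rss_kb].max
      sleep interval
    end
  end
  begin
    result = yield
  ensure
    done = true
    sampler.join
  end
  [result, [peak, rss_kb].max]
end

# Hypothetical usage with Creek (requires the creek gem; the file name is made up):
#   require 'creek'
#   _, peak = measure_peak_rss do
#     Creek::Book.new('large.xlsx').sheets.each { |s| s.rows.each { |_row| } }
#   end
#   puts "peak RSS: #{peak / 1024} MB"
```

Running the same script once per installed Creek version, in a fresh process each time, keeps one run's allocations from skewing the next.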

azrazalea-debtbook commented 7 months ago

Are you running in rails? I tried a similar methodology comparing 2.5.3 to 2.6.3 in irb and got no difference.

When I then tried in a rails console is when I saw this happen.

alexhornick-dt commented 7 months ago

> Are you running in rails? I tried a similar methodology comparing 2.5.3 to 2.6.3 in irb and got no difference.
>
> When I then tried in a rails console is when I saw this happen.

Yes, also running in rails. Nice catch, I'll keep an eye on https://github.com/pythonicrubyist/creek/issues/122 too, thanks for the extra details and reproducibility.

azrazalea-debtbook commented 7 months ago

@alexhornick-dt Never mind. I figured out I can reproduce this on native Mac too; it's just worse in Docker. Rails isn't the culprit for the worse behavior; Docker appears to be.