roo-rb / roo

Roo provides an interface to spreadsheets of several sorts.
MIT License
2.79k stars 501 forks source link

Excelx Memory and Performance Optimization #434

Closed chopraanmol1 closed 6 years ago

chopraanmol1 commented 6 years ago

Although not much difference for retained memory, this PR focuses on reducing allocated object/memory. Some of the changes are really a low-level optimization but they helped in reducing memory usage.

The script used for benchmarking

MemoryProfiler.report{Roo::Excelx.new('Some File Name.xlsx').tap{|x|(2..x.last_row).each{|i| x.row(i)}}}.pretty_print

Benchmark.measure{ Roo::Excelx.new('Some File Name.xlsx').tap{|x|(2..x.last_row).each{|i| x.row(i)}} }

Results for master:

Total allocated: 46177542 bytes (712001 objects)
Total retained:  5272998 bytes (59447 objects)

allocated memory by gem
-----------------------------------
  25878133  roo/lib
  15386849  nokogiri-1.8.3
   4909416  rubyzip-1.2.1
      1488  racc
      1400  tmpdir
       256  other

  0.810000   0.020000   0.830000 (  0.831865)

Results after changes:

Total allocated: 18303433 bytes (201238 objects)
Total retained:  4705998 bytes (59431 objects)

allocated memory by gem
-----------------------------------
   9875103  roo/lib
   4909416  rubyzip-1.2.1
   3516378  nokogiri-1.8.3
      1400  tmpdir
       880  racc
       256  other

  0.310000   0.010000   0.320000 (  0.309184)

This PR improves both memory usage and CPU performance by around 60%. But this result will majorly vary based on xlsx file used for benchmarking (Thus my result may be biased).

I'll suggest creating a common big xlsx file which contains all supported data types and re-benchmark on that file.

coveralls commented 6 years ago

Coverage Status

Coverage increased (+0.3%) to 94.001% when pulling ead58f3583e24ce251361ed88a74ce1333445542 on chopraanmol1:excelx_memory_performace_optimization into 6f703f0437180983e4f22ecd4d752ec9cdf54aee on roo-rb:master.

chopraanmol1 commented 6 years ago

@tgturner Some changes in this PR are very low-level optimization but since they are in the critical path they do make difference. If you have a better approach for those optimizations let me know.

Also don't just merge yet I'll like to test Excelx::Cell::DateTime#create_datetime a bit more to ensure that it does not break on some edge case

tgturner commented 6 years ago

I'll give this a look over today!

chopraanmol1 commented 6 years ago

@tgturner I've rebased with master as most recent commit had a conflict

chopraanmol1 commented 6 years ago

@tgturner I've added relevant test case for timezone offset change overlap bug. On master this test case will fail for ENV['TZ'] = "Asia/Calcutta". New implementation of Excelx::Cell::DateTime#create_datetime handles this edge case properly. I think we can go ahead with this now.

tgturner commented 6 years ago

LGTM!