randym / axlsx

xlsx generation with charts, images, automated column width, customizable styles and full schema validation. Axlsx excels at helping you generate beautiful Office Open XML Spreadsheet documents without having to understand the entire ECMA specification. Check out the README for some examples of how easy it is. Best of all, you can validate your xlsx file before serialization so you know for sure that anything generated is going to load on your client's machine.
MIT License
2.62k stars 695 forks source link

Axlsx performance #61

Closed jurriaan closed 12 years ago

jurriaan commented 12 years ago

Just did a quick benchmark to see how well axlsx performs.. Plain csv is 40 times faster than standard axlsx.. I think there's room for improvement ;) Streaming is also faster than Package#serialize?

Results:

                           user     system      total        real
axlsx_noautowidth          7.730000   0.110000   7.840000 (  8.188405)
axlsx                      12.640000   0.260000  12.900000 ( 13.422525)
axlsx_shared               38.700000   0.300000  39.000000 ( 39.085976)
axlsx_stream               11.380000   0.180000  11.560000 ( 11.591120)
csv                        0.270000   0.020000   0.290000 (  0.336972)

Code:


#!/usr/bin/env ruby -s
# -*- coding: utf-8 -*-
$:.unshift "#{File.dirname(__FILE__)}/../lib"
require 'axlsx'
require 'csv'

require 'benchmark'
row = []
input = (32..126).to_a.pack('U*').chars.to_a
20.times { row << input.shuffle.join}
times = 1000
Benchmark.bm(100) do |x|
  x.report('axlsx_noautowidth') {
    p = Axlsx::Package.new
    wb = p.workbook

    #A Simple Workbook

    wb.add_worksheet do |sheet|
      times.times do 
        sheet << row
      end
    end
    p.use_autowidth = false
    p.serialize("example.xlsx")
  }
  x.report('axlsx') {
    p = Axlsx::Package.new
    wb = p.workbook

    #A Simple Workbook

    wb.add_worksheet do |sheet|
      times.times do 
        sheet << row
      end
    end
    p.serialize("example.xlsx")
  }

  x.report('axlsx_shared') {
    p = Axlsx::Package.new
    wb = p.workbook

    #A Simple Workbook

    wb.add_worksheet do |sheet|
      times.times do 
        sheet << row
      end
    end
    p.use_shared_strings = true
    p.serialize("example.xlsx")
  }

  x.report('axlsx_stream') {
    p = Axlsx::Package.new
    wb = p.workbook

    #A Simple Workbook

    wb.add_worksheet do |sheet|
      times.times do
        sheet << row
      end
    end

    s = p.to_stream()
    File.open('example_streamed.xlsx', 'w') { |f| f.write(s.read) }
  }
  x.report('csv') {
    CSV.open("example.csv", "wb") do |csv|
      times.times do 
        csv << row
      end
    end
  }
end
randym commented 12 years ago

Fast enough?

                              user     system      total        real
axlsx_noautowidth         1.560000   0.030000   1.590000 (  1.717595)
axlsx                     4.360000   0.140000   4.500000 (  5.748329)
axlsx_shared              6.880000   0.160000   7.040000 (  9.325648)
axlsx_stream              4.320000   0.120000   4.440000 (  5.642124)
csv                       0.240000   0.010000   0.250000 (  0.301004)
jurriaan commented 12 years ago

It's much better! ;) I don't understand why stream is faster every time, but the results are too close to say streaming is faster right now ;). If I've time I'll look how I can improve the performance. 18 times slower than csv is still not very impressive ;) Maybe it's possible to reduce the time spend on auto width.. I'll take a look at the perftools.rb output :)

randym commented 12 years ago

Comparing OOXML to CSV is a bit unrealistic I think. They are simply different domains. I wonder how this looks against something like odf-reports?

That said - If we can get this running something close to "ten times slower" than something as trivial as CSV - well it would be one hell of an accomplishment.

jurriaan commented 12 years ago

That's true, comparing to CSV is somewhat unrealistic. But, waiting 6 seconds for generating a 1000 row xlsx is still too much imho. ;) We went from 40x slower to 18x slower. It should be possible to get close to 10x slower. I think Excel is much faster.

We should also add more XLS(X)/ODF report generators to the benchmark for a better comparison.

jurriaan commented 12 years ago

@randym See commit b6843a2ae053 for some cleanup I did.

randym commented 12 years ago

After re-writing autowidth to implement TC_COL properly:

                  user     system      total        real
axlsx_noautowidth 0.810000   0.020000   0.830000 (  0.836274)
axlsx             1.430000   0.160000   1.590000 (  1.776305)
axlsx_shared      9.360000   0.160000   9.520000 (  9.662113)
axlsx_stream      1.320000   0.110000   1.430000 (  1.429806)
csv               0.260000   0.020000   0.280000 (  0.296828)

However - this means if you apply styles after creating rows (e.g. cell.sz = 100) the width will not automatically update.

I'd say 5 times slower than CSV is pretty damn fast! Should we close this for now? Or do we want to keep trying to make this faster?

jurriaan commented 12 years ago

It's ok to close this, but on my system axlsx is still ~15 times slower

joekain commented 12 years ago

With the change in Pull request #72 I see the following improvement in performance:

master:
                           user     system      total        real
axlsx_noautowidth      1.250000   0.110000   1.360000 (  1.355938)
axlsx                  1.260000   0.120000   1.380000 (  1.383181)
axlsx_shared           9.700000   0.190000   9.890000 (  9.890973)
axlsx_stream           1.080000   0.110000   1.190000 (  1.192197)
csv                    0.200000   0.010000   0.210000 (  0.217332)

                       user     system      total        real
axlsx_noautowidth      1.260000   0.120000   1.380000 (  1.374209)
axlsx                  1.290000   0.120000   1.410000 (  1.422443)
axlsx_shared           9.730000   0.190000   9.920000 (  9.923795)
axlsx_stream           1.080000   0.120000   1.200000 (  1.205190)
csv                    0.200000   0.010000   0.210000 (  0.206190)

                           user     system      total        real
axlsx_noautowidth      1.260000   0.130000   1.390000 (  1.390412)
axlsx                  1.270000   0.120000   1.390000 (  1.391234)
axlsx_shared           9.800000   0.200000  10.000000 (  9.998483)
axlsx_stream           1.090000   0.110000   1.200000 (  1.204623)
csv                    0.180000   0.010000   0.190000 (  0.191816)

Pull request #72
                           user     system      total        real
axlsx_noautowidth      1.250000   0.110000   1.360000 (  1.368082)
axlsx                  1.290000   0.120000   1.410000 (  1.406685)
axlsx_shared           8.480000   0.180000   8.660000 (  8.668626)
axlsx_stream           1.120000   0.110000   1.230000 (  1.238980)
csv                    0.190000   0.010000   0.200000 (  0.195624)

                           user     system      total        real
axlsx_noautowidth      1.250000   0.110000   1.360000 (  1.365485)
axlsx                  1.280000   0.120000   1.400000 (  1.391993)
axlsx_shared           8.550000   0.200000   8.750000 (  8.763224)
axlsx_stream           1.090000   0.120000   1.210000 (  1.206772)
csv                    0.190000   0.010000   0.200000 (  0.208227)

                           user     system      total        real
axlsx_noautowidth      1.310000   0.130000   1.440000 (  1.443642)
axlsx                  1.300000   0.120000   1.420000 (  1.423780)
axlsx_shared           8.490000   0.190000   8.680000 (  8.680640)
axlsx_stream           1.170000   0.120000   1.290000 (  1.295424)
csv                    0.190000   0.010000   0.200000 (  0.196803)
randym commented 12 years ago

Ill be pushing a large change set today to bring support for JRuby online and a few other performance patches as well in preparation for the release tonight.

randym commented 12 years ago

Where we stand for 1.1.0

                          user     system      total        real
axlsx_noautowidth     1.160000   0.030000   1.190000 (  1.202237)
axlsx                 1.710000   0.150000   1.860000 (  2.298028)
axlsx_shared          2.650000   0.150000   2.800000 (  2.864474)
axlsx_stream          1.650000   0.140000   1.790000 (  1.848033)
csv                   0.270000   0.020000   0.290000 (  0.291206)
daveed commented 12 years ago

Hi,

Is the support for JRuby pushed yet? Released version is still 1.0.18 and not 1.1.0... Cheers.

randym commented 12 years ago

@mazhout sorry mate - lots of firefighting at the day job since last week. Jruby (And Rubinius) support is up and active on master, I just have not released an official gem to rubygems yet.

Time to get that done!

randym commented 12 years ago

releasing 1.1.0 now

jurriaan commented 12 years ago

:thumbsup:

daveed commented 12 years ago

Cheers mate!

So basically if I just jruby -S gem install axlsx it should just work?

Coz I have this error because of RMagick with C dependencies... :(

Any help would be awesome :)

Thanks,

On Apr 3, 2012, at 8:14 PM, Jurriaan Pruisreply@reply.github.com wrote:

:thumbsup:


Reply to this email directly or view it on GitHub: https://github.com/randym/axlsx/issues/61#issuecomment-4899718

randym commented 12 years ago

@mazhout

Can you give us the error info along with your system details? I would guess without any information - that you need to install Image Magic or some other dependency.

daveed commented 12 years ago

@randym

System details:

mazhout:sb/ (master✗) $ uname -a 
Linux mazhout 2.6.32-21-generic #32-Ubuntu SMP Fri Apr 16 08:10:02 UTC 2010 i686 GNU/Linux

mazhout:sb/ (master✗) $ jruby -v
jruby 1.6.7 (ruby-1.8.7-p357) (2012-02-22 3e82bc8) (OpenJDK Server VM 1.6.0_20) [linux-i386-java]

mazhout:sb/ (master✗) $ jruby -S gem list rmagick

*** LOCAL GEMS ***

rmagick (2.13.1)
rmagick4j (0.3.7)

Image Magic is installed. but I still have the following error using the console:

>> require 'axlsx'
=> []
>> 
?> p = Axlsx::Package.new
=> #<Axlsx::Package:0x1d5c103 @app=#<Axlsx::App:0xd95e69>, @workbook=nil, @core=#<Axlsx::Core:0xc569fd @creator="axlsx">>
>> wb = p.workbook
=> #<Axlsx::Workbook:0x895972 @charts=#<Axlsx::SimpleTypedList:0x14f840f @locked_at=nil, @list=[], @allowed_types=[Axlsx::Chart], @serialize_as=nil>, @tables=#<Axlsx::SimpleTypedList:0x890a59 @locked_at=nil, @list=[], @allowed_types=[Axlsx::Table], @serialize_as=nil>, @worksheets=#<Axlsx::SimpleTypedList:0x65d075 @locked_at=nil, @list=[], @allowed_types=[Axlsx::Worksheet], @serialize_as=nil>, @images=#<Axlsx::SimpleTypedList:0x1edf119 @locked_at=nil, @list=[], @allowed_types=[Axlsx::Pic], @serialize_as=nil>, @styles=#<Axlsx::Styles:0x1aaa07f @fills=#<Axlsx::SimpleTypedList:0x10f0424 @locked_at=2, @list=[#<Axlsx::Fill:0x1338621 @fill_type=#<Axlsx::PatternFill:0x1ab0c3a @patternType=:none>>, #<Axlsx::Fill:0xafc935 @fill_type=#<Axlsx::PatternFill:0xc2c549 @patternType=:gray125>>], @allowed_types=[Axlsx::Fill], @serialize_as="fills">, @tableStyles=#<Axlsx::TableStyles:0xc12ae3 @locked_at=0, @defaultTableStyle="TableStyleMedium9", @list=[], @serialize_as=nil, @allowed_types=[Axlsx::TableStyle], @defaultPivotStyle="PivotStyleLight16">, @cellStyles=#<Axlsx::SimpleTypedList:0x869739 @locked_at=1, @list=[#<Axlsx::CellStyle:0x14fa707 @builtinId=0, @name="Normal", @xfId=0>], @allowed_types=[Axlsx::CellStyle], @serialize_as="cellStyles">, @cellStyleXfs=#<Axlsx::SimpleTypedList:0x1c88831 @locked_at=1, @list=[#<Axlsx::Xf:0x18b628 @fontId=0, @fillId=0, @numFmtId=0, @borderId=0>], @allowed_types=[Axlsx::Xf], @serialize_as="cellStyleXfs">, @fonts=#<Axlsx::SimpleTypedList:0x1a9dbac @locked_at=1, @list=[#<Axlsx::Font:0x9e31f0 @name="Arial", @family=1, @sz=11>], @allowed_types=[Axlsx::Font], @serialize_as="fonts">, @numFmts=#<Axlsx::SimpleTypedList:0x275a2e @locked_at=2, @list=[#<Axlsx::NumFmt:0x15f0269 @numFmtId=100, @formatCode="yyyy/mm/dd">, #<Axlsx::NumFmt:0x976434 @numFmtId=101, @formatCode="yyyy/mm/dd hh:mm:ss">], @allowed_types=[Axlsx::NumFmt], @serialize_as="numFmts">, @cellXfs=#<Axlsx::SimpleTypedList:0x2fbdd3 @locked_at=3, @list=[#<Axlsx::Xf:0x15a95c3 @fontId=0, @fillId=0, @xfId=0, @numFmtId=0, @borderId=0>, #<Axlsx::Xf:0x1e54ad @fontId=0, @fillId=0, @xfId=0, @numFmtId=0, @borderId=1>, #<Axlsx::Xf:0x1492cfe @fontId=0, @applyNumberFormat=1, @fillId=0, @xfId=0, @numFmtId=14, @borderId=0>], @allowed_types=[Axlsx::Xf], @serialize_as="cellXfs">, @borders=#<Axlsx::SimpleTypedList:0xe489c6 @locked_at=2, @list=[#<Axlsx::Border:0x4e2837 @prs=#<Axlsx::SimpleTypedList:0x1326f8f @locked_at=nil, @list=[], @allowed_types=[Axlsx::BorderPr], @serialize_as=nil>>, #<Axlsx::Border:0x1fa6c48 @prs=#<Axlsx::SimpleTypedList:0x1daed02 @locked_at=nil, @list=[#<Axlsx::BorderPr:0x7fafa4 @style=:thin, @color=#<Axlsx::Color:0xe43d5e @rgb="FF000000">, @name=:left>, #<Axlsx::BorderPr:0x7909a7 @style=:thin, @color=#<Axlsx::Color:0x68e679 @rgb="FF000000">, @name=:right>, #<Axlsx::BorderPr:0xf3360c @style=:thin, @color=#<Axlsx::Color:0x1615685 @rgb="FF000000">, @name=:top>, #<Axlsx::BorderPr:0x1775e79 @style=:thin, @color=#<Axlsx::Color:0x1db9b4f @rgb="FF000000">, @name=:bottom>], @allowed_types=[Axlsx::BorderPr], @serialize_as=nil>>], @allowed_types=[Axlsx::Border], @serialize_as="borders">, @dxfs=#<Axlsx::SimpleTypedList:0x68d91 @locked_at=0, @list=[], @allowed_types=[Axlsx::Xf], @serialize_as="dxfs">>, @use_autowidth=true, @drawings=#<Axlsx::SimpleTypedList:0x73c8a9 @locked_at=nil, @list=[], @allowed_types=[Axlsx::Drawing], @serialize_as=nil>>
>> wb.add_worksheet(:name => "Basic Worksheet") do |sheet|
?>   sheet.add_row ["First Column", "Second", "Third"]
>>   sheet.add_row [1, 2, 3]
>> end
LoadError: load error: RMagick2 -- java.lang.UnsatisfiedLinkError: /usr/lib/jruby/lib/ruby/gems/1.8/gems/rmagick-2.13.1/lib/RMagick2.so: undefined symbol: rb_frame_last_func
from org/jruby/RubyKernel.java:1033:in `require'
from /usr/lib/jruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:36:in `require'
from /usr/lib/jruby/lib/ruby/gems/1.8/gems/activesupport-2.3.12/lib/active_support/dependencies.rb:182:in `require'
from /usr/lib/jruby/lib/ruby/gems/1.8/gems/activesupport-2.3.12/lib/active_support/dependencies.rb:547:in `new_constants_in'
from /usr/lib/jruby/lib/ruby/gems/1.8/gems/activesupport-2.3.12/lib/active_support/dependencies.rb:546:in `new_constants_in'
from /usr/lib/jruby/lib/ruby/gems/1.8/gems/activesupport-2.3.12/lib/active_support/dependencies.rb:182:in `require'
from /usr/lib/jruby/lib/ruby/gems/1.8/gems/rmagick-2.13.1/lib/RMagick.rb:11:in `(root)'
from org/jruby/RubyKernel.java:1033:in `require'
from /usr/lib/jruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:36:in `require'
from /usr/lib/jruby/lib/ruby/gems/1.8/gems/activesupport-2.3.12/lib/active_support/dependencies.rb:182:in `require'
from /usr/lib/jruby/lib/ruby/gems/1.8/gems/activesupport-2.3.12/lib/active_support/dependencies.rb:547:in `new_constants_in'
from /usr/lib/jruby/lib/ruby/gems/1.8/gems/activesupport-2.3.12/lib/active_support/dependencies.rb:546:in `new_constants_in'
from /usr/lib/jruby/lib/ruby/gems/1.8/gems/activesupport-2.3.12/lib/active_support/dependencies.rb:182:in `require'
from /usr/lib/jruby/lib/ruby/gems/1.8/gems/rmagick-2.13.1/lib/RMagick.rb:109:in `initialize'
from /usr/lib/jruby/lib/ruby/gems/1.8/gems/axlsx-1.1.0/lib/axlsx/workbook/workbook.rb:156:in `add_worksheet'
from (irb):5:in `evaluate'
from org/jruby/RubyKernel.java:1083:in `eval'
from /usr/lib/jruby/lib/ruby/1.8/irb.rb:158:in `eval_input'
from /usr/lib/jruby/lib/ruby/1.8/irb.rb:271:in `signal_status'
from /usr/lib/jruby/lib/ruby/1.8/irb.rb:155:in `eval_input'
from org/jruby/RubyKernel.java:1410:in `loop'
from org/jruby/RubyKernel.java:1183:in `catch'
from /usr/lib/jruby/lib/ruby/1.8/irb.rb:154:in `eval_input'
from /usr/lib/jruby/lib/ruby/1.8/irb.rb:71:in `start'
from org/jruby/RubyKernel.java:1183:in `catch'
from /usr/lib/jruby/lib/ruby/1.8/irb.rb:70:in `start'
from /usr/lib/jruby/bin/jirb:13:in `(root)'>> 
randym commented 12 years ago

moved to: https://github.com/randym/axlsx/issues/78