randym / axlsx

xlsx generation with charts, images, automated column width, customizable styles and full schema validation. Axlsx excels at helping you generate beautiful Office Open XML Spreadsheet documents without having to understand the entire ECMA specification. Check out the README for some examples of how easy it is. Best of all, you can validate your xlsx file before serialization so you know for sure that anything generated is going to load on your client's machine.
MIT License
2.62k stars 695 forks source link

Memory optimization (~30% less) #633

Open korun opened 5 years ago

korun commented 5 years ago

Hi, this is tiny optimization without complete refactoring or something.

The main idea: decrease count of allocated objects (mostly strings) by some ruby tricks and freezing.

Modern ruby versions allow not only just prevent strings from mutation with freeze, but also allocate it only once:

# ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-linux]
2.2.2 :001 > 'asdads332'.freeze.object_id
# => 18294380 
2.2.2 :002 > 'asdads332'.freeze.object_id
# => 18294380 
2.2.2 :003 > 'asdads332'.freeze.object_id
# => 18294380 

If we need to support old ruby versions -- I suggest to consider it in a separate request.

Some measurements here: https://gist.github.com/korun/1f22ee8505122f2f307fc83dee3b9974


Refs #276

gdimitris commented 5 years ago

That looks very promising! Any memory relief is highly appreciated since this gem uses a lot of memory and causes memory issues when generating large excel files. @randym is it possible to check for any regressions and create a new release ? There is a pending memory issue since January 2014 https://github.com/randym/axlsx/issues/276

Thanks !

gdimitris commented 5 years ago

@korun do you perhaps have any more tips/hints to reduce memory usage ? I am getting a lot of memory quota exceeded hits on Heroku on my project and face a lot of issues

korun commented 5 years ago

@gdimitris, I think it can be implemented some "stream-mode", so axlsx will can write prepared data direct in temp file row-by-row. But it forbid some exists features, such as modifying row height, or styles, etc. after sheet.add_row was called. In this mode, all preparations must be made before row insertion.

gdimitris commented 5 years ago

@korun I am a bit confused, so you mean that streaming the rows would break existing styles in cells ? Or that they have to be applied differently ?

korun commented 5 years ago

@gdimitris, for example, now we can do something like:

cells = sheet.add_row(
    ['Total', "Count: #{count}", *Array.new(6), "=SUM(I#{i}:I#{i + count})", nil, nil],
    style: row_table_cell_style(sheet)
)
cells[0].style = row_text_center_table_style(sheet)
cells[8].style = row_currency_style(sheet)

In "stream-mode" we have to do all preparations before row flush to file:

# flush to file enabled
cells = sheet.add_row(
    ['Total', "Count: #{count}", *Array.new(6), "=SUM(I#{i}:I#{i + count})", nil, nil],
    style: [
        row_text_center_table_style(sheet),
        *Array.new(7, row_table_cell_style(sheet)]),
        row_currency_style(sheet)
    ]
)
# at this line, all is good

# But, if we call:
cells[0].style = row_text_center_table_style(sheet)
# it raise some error, like "Cannot access to flushed row in stream-mode"
gdimitris commented 5 years ago

@korun Ok got it, thanks! Do you believe implementing "streaming mode" is something that requires a lot of effort in terms of development ? If you are interested, we can collaborate and implement "streaming mode" together, since I do not see any recent activity from the creator of the gem. My guess is that a lot of people have memory issues with the gem and this would help a lot. Let me know if you are interested :D I believe we can add it as a feature so it does not break existing functionality.

klyonrad commented 5 years ago

@korun did you notice the "movement" that got started at https://github.com/caxlsx/caxlsx ? Can you submit the PR there as well, that could spike some big interest there :)

gdimitris commented 5 years ago

@klyonrad korun doesn't seem active but I can do this I think later in the day. It seems a very promising fix.

gdimitris commented 5 years ago

Turns out it was easier than I thought 😄