Closed zacheryph closed 1 year ago
This also appears to be new to 2.6.2. I do not see this happening in 2.5.3 so I have rolled back our update.
How many records are on that sheet?
50K.
With this particular spreadsheet,
with Creek 2.5.3 memory usage (watching docker stats
) never exceeds ~ 450-500M
but with Creek 2.6.2 memory usage peaks at 1.1-1.2GB
We started seeing OOM killer issues in containerized workloads after upgrading to creek-2.6.2
as well and were able to alleviate the issue by downgrading to 2.5.3
. Thanks for the report @zacheryph!
I believe the issue was introduced as part of #101 in this commit: 494ed0505930ef2a70c26204bf8231228b6aa963
Specifically, it's repeatedly building "#{prefix}row"
, "#{prefix}c"
, "#{prefix}v"
, and "#{prefix}t"
strings for every node that is read. I think that instead of doing that, it needs to construct row_selector
, cell_selector
, value_selector
, and text_selector
strings only when prefix
is first set to a non-''
value (with those new variables defaulting to the non-prefixed names like 'row'
).
While we're micro-optimizing, I think we probably don't want [value_selector, text_selector].include?(node.name)
either, since that's constructing an array on every iteration. Directly checking node.name == value_selector || node.name == text_selector
would be more performant.
I am noticing excessive temporary memory allocation loading a worksheet. I cannot share said spreadsheet currently. This particular spreadsheet is 5.8MB, single sheet, 50K records. I cannot see anything obvious within
rows
causing this but it just seems a bit excessive for a 6MB file ;)Reason I bring this up is it's killing kubernetes pods for us for memory usage when running. I am attempting to bump the limits to get around it.
Anything I can do to help I'm glad to.
Note, memory usage is from Docker stats
Base Rails Console Memory: 168.3MiB Load Book (
book = Creek::Book.new("report.xlsx")
): 305.8MiB Force a GC (GC.start
): 210.6MiB Lookup First Row (book.sheets.first.rows.first
): 1.247GiB Force a GC (GC.start
): 222.9MiB