Closed senhalil closed 2 years ago
As a demonstration, this commit https://github.com/senhalil/xsv/commit/d2c10c1dea6a227b74e6c23b50c7b7f42b987b87 makes symbols default without an argument.
I suspect that making symbols default wouldn't be the right thing since symbols are never released and this might be an issue for long running processes that parse xlsx documents. But for short running scripts/workers symbols would decrease memory allocation and speed up comparison.
@martijn I feel like it shouldn't be me who adds this argument (since I don't have the time to consider up/downs, nor the visibility of the user-base) but I thought the commit might simplify the life of the person who will actually implement the argument.
As a demonstration, this commit senhalil@d2c10c1 makes symbols default without an argument.
I suspect that making symbols default wouldn't be the right thing since symbols are never released and this might be an issue for long running processes that parse xlsx documents. But for short running scripts/workers symbols would decrease memory allocation and speed up comparison.
JFI, symbols are garbage collected since Ruby 2.2 - https://bugs.ruby-lang.org/issues/9634
I should have read the comments of the answer that I shared :facepalm: (I will edit the answer with your link cheers @texpert ) So then there is no downside... Let me know if you want the commit and I will create a PR.
I can see why symbolized keys would be useful or nicer in some Ruby code so I am fine with merging a proposal to make this an option (maybe as a parameter to symbolize_keys!
?).
There is no gain in allocations, though. Observe:
irb(main):001:0> x = Xsv::Workbook.open("test/files/10k-sheet.xlsx")
=> #<Xsv::Workbook:960>
irb(main):002:1* sheet = x.sheets[0]
=> #<Xsv::Sheet:980>
irb(main):004:0> sheet.parse_headers!
=> #<Xsv::Sheet:980>
irb(main):005:0> row0 = sheet[0]
=> {"A1"=>"A2", "This sheet has 10.000 rows"=>"This sheet has 10.000 rows", 1=>2, #<Date: 1992-10-04 ((2448900j,0s,0n),+0s,2299...
irb(main):006:0> row1 = sheet[1]
=> {"A1"=>"A3", "This sheet has 10.000 rows"=>"This sheet has 10.000 rows", 1=>3, #<Date: 1992-10-04 ((2448900j,0s,0n),+0s,2299...
irb(main):007:0> row0.keys.first.object_id
=> 1020
irb(main):008:0> row1.keys.first.object_id
=> 1020
For every row, the key objects are shared since we dup the @empty_row
for each new row.
Thanks @martijn, you are right! I tried with memory_profiler
and symbol and string both allocates and retains the same amount of memory and type of objects.
Total allocated: 252195568 bytes (860308 objects)
Total retained: 124963 bytes (1967 objects)
allocated memory by gem
-----------------------------------
205028515 xsv-1.0.1
47166973 rubyzip-2.3.0
80 other
It would fit well with other hash returning functions like JSON.parse and would be of use.
I am not sure but I suspect there would be less memory allocation since currently, each row allocates a new string key for the header labels in the
:hash
mode.Similar to https://github.com/martijn/xsv/issues/24 this could be done in two ways (or both? :thinking:) with a
symbolize_names: true
argument i) toopen
function and/or ii) toparse_headers!
function.