ycphs / openxlsx

openxlsx - a fast way to read and write complex xslx files
https://ycphs.github.io/openxlsx/
Other
225 stars 75 forks source link

Add `pair_rc()` to improve row/column pair lookups #468

Closed StevenHibble-Concurrence closed 1 month ago

StevenHibble-Concurrence commented 5 months ago

Closes #467.

I considered Szudzik's Elegant Pairing Function, but it quickly exceeds .Machine$integer.max. Instead, we create a double vector by taking the row position as the whole number and the column position as the decimal number. This assumes that float precision is better than 1e-7 and that the spreadsheet won't have more than 10M-1 columns.

(1, 1)     ->   1.0000001
(999, 999) -> 999.0000999
StevenHibble-Concurrence commented 5 months ago

I forgot to add some metrics. I took timing and total memory allocation using bench::mark(). Peak RAM usage is from peakRAM::peakRAM().

<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns="http://www.w3.org/TR/REC-html40">

workbook size | metric | paste() | pair_rc() | change -- | -- | -- | -- | -- medium | time (seconds) | 7.2 | 6.4 | -11% medium | mem_alloc (MB) | 399.8 | 355.7 | -11% medium | peak_mem (MB) | 283.7 | 263.3 | -7% large | time (seconds) | 46.4 | 33.1 | -29% large | mem_alloc (MB) | 3696.6 | 3184.6 | -14% large | peak_mem (MB) | 449.5 | 395.2 | -12%

JanMarvin commented 1 month ago

Thanks!