Closed rfiorella closed 1 year ago
@rfiorella Thanks for the heads up about this! This is very timely for me, a couple of thoughts:
Part of what's tricky with stackEddy() is that it involves joining as well as stacking, but I can definitely take a look at where it might be made more efficient. I'll keep you posted.
Thanks @cklunch!
Not sure if this is useful or new information, but I did some memory profiling on a workflow where I noticed this issue, and it seems like the base::merge call is a good target here.
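A minimal sketch of that kind of memory profiling, on synthetic data rather than the actual NEONiso workflow (column names here are illustrative, and `Rprofmem()` only works if R was compiled with memory profiling enabled):

```r
# Two synthetic tables sharing a timestamp key, standing in for the
# stacked eddy-covariance tables joined inside the real workflow.
n <- 2e5
a <- data.frame(timeBgn = seq_len(n), flux = rnorm(n))
b <- data.frame(timeBgn = seq_len(n), conc = rnorm(n))

# Rprofmem() errors on builds without --enable-memory-profiling,
# so guard it and still run the merge either way.
mem_prof_ok <- tryCatch({ Rprofmem("merge_mem.out"); TRUE },
                        error = function(e) FALSE)

m <- base::merge(a, b, by = "timeBgn", all = TRUE)

if (mem_prof_ok) {
  Rprofmem(NULL)  # stop logging
  # Each line of merge_mem.out records one allocation and its call stack;
  # large repeated allocations inside merge() are the hotspot candidates.
  print(head(readLines("merge_mem.out")))
}
```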
This issue will be resolved with the release of neonUtilities 2.3.0, though a few changes will need to be made to NEONiso functions to take advantage of the new capabilities (e.g., adding use_fasttime arguments).
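A hedged sketch of what that change might look like on the NEONiso side: a wrapper that exposes a `use_fasttime` argument and forwards it to neonUtilities::stackEddy. The argument name on the neonUtilities side (`useFasttime`) and the wrapper name are assumptions here, not confirmed signatures; check the 2.3.0 release notes. The `stack_fun` parameter exists only so the forwarding can be tested without NEON data.

```r
# Hypothetical NEONiso-style wrapper; names are illustrative.
ingest_data <- function(inname, use_fasttime = FALSE,
                        stack_fun = neonUtilities::stackEddy) {
  # Forward the flag through to the stacking call (argument name assumed).
  stack_fun(inname, level = "dp04", useFasttime = use_fasttime)
}
```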
Files that need to be updated:
Thanks @cklunch for working on stackEddy performance with me!
Performance using neonUtilities::stackEddy really suffers (at least for me!) when it is called repeatedly on large datasets, as in NEONiso.
I suspect this is due to robust error handling or difficulties in preallocating arrays in stackEddy. That robustness may not be necessary for the specific, well-defined use case in NEONiso if there are faster alternatives. It may also be a fundamental limitation of h5read, which is slow.
Two potential paths forward:
1) replace stackEddy with a faster (more dangerous?) method
2) find ways to optimize stackEddy when it is used on really large datasets
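A minimal sketch of what option 1 could look like: a narrow `match()`-based join for the well-defined NEONiso case (a unique timestamp key), skipping the generality of base::merge (duplicate keys, sorting, all.x/all.y handling) that makes it slow. The function name and key column are illustrative, not anything from stackEddy.

```r
# Hypothetical replacement join: assumes the key column is unique in `b`.
# "More dangerous" in the sense that it does no validation merge() would do.
fast_join <- function(a, b, key = "timeBgn") {
  idx <- match(a[[key]], b[[key]])   # position of each of a's keys in b
  cbind(a, b[idx, setdiff(names(b), key), drop = FALSE])
}
```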
Any suggestions @cklunch @ddurden @ndurden @cflorian?