Open TAdeJong opened 4 months ago
OK, this is more complicated than I thought; This version breaks does not work for some output I have. WIP.
This now handles empty columns, however, there might be more cases to consider that I do not know of.
Thanks @TAdeJong, this sounds beneficial. @amcz do you have any initial thoughts?
I ran into another edge case and fixed that. By peeling out some of the logic out of the inner loop I got another factor of ~2 speedup. Looking at a line profile, a lot of time is still spend in xarray merging and pandas indexing. I suspect another order of magnitude could be won by pre-allocating xr.DataArray's and indexing the underlying arrays directly while reading records, but that would require a more major rewrite.
Thanks! the reader could use some improvements and I appreciate this work on it. I will have time to review it and pull into hysplit development branch in sometime in beginning or mid August.
For my use case of reading relatively large hysplit concentration grids, I found
readfile
to be much slower than the hysplit simulation itself. Looking at the code, 93% of the time was spend on iterative calls toxr.merge
. I had to debug a bit, because it seems that, at least for my output of hysplit v5.2.2, the species and levels were flipped in order. Building a list of lists and calling xr.merge and xr.concat yielded a significant speedup of roughly 15x, very worthwhile for me.Tests are passing, but tests are not actually testing this part of the code.
BTW: I think further speed up would be possible by lifting the conversion to a pandas dataframe out of the innermost function.