Open neishm opened 5 years ago
Would you mind making a pull request for this? Thanks!
Pulled into rpnpy_2.1-fstfast-branch, will need some more time to review this one before merging into main dev branch.
I would consider splitting this one into 3 files:
- vectorize: could be used elsewhere and would probably be best by itself
- "hack-ish" functions, relying on librmn internal structure knowledge... this one I would not import in "utils.all"
- other functions
In any case, thanks for these optimizations.
I agree some of these should not go into "utils.all". Splitting them sounds like a good idea.
I have some optimized routines from the fstd2nc tool which I've found helpful when scanning through many (hundreds) of files on a routine basis, where even a small overhead can add up to a noticeable delay in the script. Maybe some of them might be useful within python-rpn as utility functions? Below is a brief description of them.
all_params
The all_params method extracts all the record parameters at once, and returns a vectorized dictionary of the result. It scrapes the information directly out of the librmn data structures, and avoids the overhead of repeatedly calling fstprm. Example:
This can be combined with pandas to get a convenient table of parameters:
Having the parameters in this pandas.DataFrame structure provides a more powerful tool for analysing the data. For instance, the DataFrame's pivot method could be used to quickly organize the records into multidimensional time/level structures. The all_params method is also about 20x faster than looping over fstprm:
80ms isn't much, but it adds up if you're scanning over hundreds (or thousands) of files.
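As a rough sketch of the pandas step described above: a dictionary of equal-length arrays converts directly into a DataFrame with one row per record. The dictionary below is a made-up stand-in for the kind of structure all_params is described as returning. The key names are borrowed from fstprm and every value is illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for a "vectorized dictionary" of record parameters:
# one array per parameter, one element per record.
# Key names follow fstprm; the values are NOT taken from a real file.
params = {
    "nomvar": np.array(["UU", "UU", "VV", "VV", "TT"]),
    "ip1": np.array([100, 200, 100, 200, 100]),  # placeholder integers, not real encoded ip1 values
    "ni": np.array([200, 200, 200, 200, 200]),
    "nj": np.array([100, 100, 100, 100, 100]),
}

# One row per record, one column per parameter.
table = pd.DataFrame(params)

# Quick summary: number of records per variable name.
counts = table["nomvar"].value_counts()
```

Once the parameters are in this form, the usual pandas selection and grouping tools apply without any further calls into librmn.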
maybeFST
The maybeFST function is a more compact version of isFST which avoids any librmn calls such as c_wkoffit. That library function can incur some overhead since it's testing for many different formats, not just FST.
When combined together, the maybeFST and all_params functions can allow a user to very quickly scan over many hundreds of files and get a snapshot of all the records inside.
stamp2datetime
The stamp2datetime function converts an array of RPN date stamps into datetime objects. Useful in conjunction with all_params to get date information quickly, e.g.:
decode_ip1
The decode_ip1 function quickly decodes the levels from an array of ip1 values. For example:
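Separately from the usage itself, here is a sketch of one generic way to make a per-value decoder cheap over an array with many repeated codes: decode each distinct value once and broadcast the result back. This is an assumption about a reasonable approach, not a description of how decode_ip1 is actually implemented. The names decode_unique and toy_decode are hypothetical, and toy_decode is a placeholder rule, not the real ip1 encoding.

```python
import numpy as np

def decode_unique(codes, decode_one):
    """Apply a scalar decoder once per distinct code, then map back to every element."""
    codes = np.asarray(codes)
    # unique_codes holds each distinct code once; inverse maps every
    # original position to its index in unique_codes.
    unique_codes, inverse = np.unique(codes, return_inverse=True)
    decoded = np.array([decode_one(int(c)) for c in unique_codes])
    return decoded[inverse]

def toy_decode(code):
    # Placeholder decoder for illustration only (NOT the ip1 encoding).
    return code / 10.0

levels = decode_unique([8500, 10000, 8500, 5000, 10000], toy_decode)
```

With thousands of records but only a few dozen distinct levels, the scalar decoder runs a few dozen times instead of thousands.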
Example
For completeness, here's an example using the information from all the above steps to get multidimensional structures for a field. First, get the list of variables:
Pick UU:
Organize the records by date/level:
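The three steps above could look roughly like the following in pandas. The table here is built by hand with made-up values. The column names date, level and key are assumptions for illustration (decoded date, decoded level, and a record handle) and are not confirmed output columns of the functions described above.

```python
import pandas as pd

# Hand-built illustrative record table; all values are made up.
records = pd.DataFrame({
    "nomvar": ["UU", "UU", "UU", "UU", "VV", "VV"],
    "date": pd.to_datetime([
        "2020-01-01 00:00", "2020-01-01 00:00",
        "2020-01-01 06:00", "2020-01-01 06:00",
        "2020-01-01 00:00", "2020-01-01 06:00",
    ]),
    "level": [1000.0, 850.0, 1000.0, 850.0, 1000.0, 1000.0],
    "key": [1, 2, 3, 4, 5, 6],  # hypothetical record handles
})

# Step 1: list of variables present.
variables = sorted(records["nomvar"].unique())

# Step 2: pick UU.
uu = records[records["nomvar"] == "UU"]

# Step 3: organize the UU records by date (rows) and level (columns).
grid = uu.pivot(index="date", columns="level", values="key")
```

Each cell of grid then identifies the record for one date/level combination, so the data for a whole time/level block can be read in a predictable order.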