rs-station / reciprocalspaceship

Tools for exploring reciprocal space
https://rs-station.github.io/reciprocalspaceship/
MIT License

Is numeric na #259

Closed kmdalton closed 2 weeks ago

kmdalton commented 2 weeks ago

This PR speeds up dataset creation for large arrays. While working on a new stream file parser, I realized that converting numpy float32 and int32 arrays to MTZDtypes could be very time-consuming for large arrays. After some digging, I found that the culprit was a Cython function provided by pandas, pandas._libs.missing.is_numeric_na, which makes a mask for missing values in rs.

When the input is an int32 or float32 numpy array, this is wholly unnecessary, and it is much faster to use np.isnan to accomplish the same task. This PR just wraps is_numeric_na and adds some control flow to do that. It falls back to the Cython version whenever the input is not an int32 or float32 ndarray. This is a very conservative choice, and more cases could probably be added to the control flow down the line.
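
A rough sketch of the control flow described above, not the actual code in this PR. To keep it self-contained, the fallback here is a hypothetical pure-Python stand-in (`_generic_is_na`) rather than the pandas Cython routine, and the name `_FAST_DTYPES` is an assumption made for illustration.

```python
import numpy as np

# Dtypes that take the fast path (assumed name, for illustration only).
_FAST_DTYPES = (np.dtype("float32"), np.dtype("int32"))


def _generic_is_na(values):
    """Hypothetical stand-in for the generic fallback.

    Checks each element for None or NaN. The PR falls back to the
    pandas Cython function instead; this is only a placeholder.
    """
    flat = np.asarray(values, dtype=object).ravel()
    # NaN is the only value that is not equal to itself.
    mask = np.array([v is None or v != v for v in flat], dtype=bool)
    return mask.reshape(np.shape(values))


def is_numeric_na(values):
    """Return a boolean mask that is True where values are missing."""
    # Fast path: int32/float32 ndarrays go straight to np.isnan.
    if isinstance(values, np.ndarray) and values.dtype in _FAST_DTYPES:
        return np.isnan(values)
    # Everything else takes the slower generic route.
    return _generic_is_na(values)
```

For int32 input, np.isnan simply returns an all-False mask, since integer arrays cannot hold NaN.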

kmdalton commented 2 weeks ago

This should not be merged until after #258.

kmdalton commented 2 weeks ago

Okay, I cleaned up this PR. It should be good now. I will merge after the CI runs.