This PR speeds up dataset creation for large arrays. While working on a new stream file parser, I realized that converting numpy float32 and int32 arrays to MTZDtypes could be very time-consuming when the arrays are large. After some digging, I found that the culprit was a Cython function provided by pandas, pandas._libs.missing.is_numeric_na, which makes a mask for missing values in rs.
When the input is an int32 or float32 numpy array, this is wholly unnecessary, and it is much faster to use np.isnan to accomplish the same task. This PR just wraps is_numeric_na and adds some control flow to do that, falling back to the Cython version whenever the input is not an int32 or float32 ndarray. This is a very conservative choice, and more cases could probably be included in the control flow down the line.
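A minimal sketch of the control flow described above, not the actual diff. The names fast_is_numeric_na and slow_mask are hypothetical, and slow_mask is a pure-Python stand-in for pandas._libs.missing.is_numeric_na so that the example runs with numpy alone:

```python
import numpy as np

# Dtypes that take the fast path (the conservative choice described above).
_FAST_DTYPES = (np.dtype("float32"), np.dtype("int32"))


def slow_mask(values):
    # Hypothetical stand-in for pandas._libs.missing.is_numeric_na:
    # flags None and NaN element by element.
    return np.array([v is None or v != v for v in values], dtype=bool)


def fast_is_numeric_na(values, fallback=slow_mask):
    # Hypothetical wrapper: use np.isnan for float32/int32 ndarrays,
    # otherwise defer to the general-purpose fallback.
    if isinstance(values, np.ndarray) and values.dtype in _FAST_DTYPES:
        return np.isnan(values)
    return fallback(values)


floats = np.array([1.0, np.nan, 3.0], dtype=np.float32)
ints = np.array([1, 2, 3], dtype=np.int32)
objects = np.array([1.0, None, np.nan], dtype=object)

float_mask = fast_is_numeric_na(floats)    # fast path
int_mask = fast_is_numeric_na(ints)        # fast path, never any NaN
object_mask = fast_is_numeric_na(objects)  # fallback path
```

In the real change the fallback would presumably be the pandas function itself; passing it as an argument here only keeps the sketch independent of a private pandas import.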