rs-station / reciprocalspaceship

Tools for exploring reciprocal space
https://rs-station.github.io/reciprocalspaceship/
MIT License

`DataSet.to_numpy()` should use numpy dtypes whenever possible #182

Closed JBGreisman closed 2 years ago

JBGreisman commented 2 years ago

When a pandas DataFrame contains ExtensionDtypes, DataFrame.to_numpy() defaults to returning an array with object dtype. This is suboptimal for MTZ data, which by construction must be representable as float32, and in some cases as int32.

This PR wraps the pandas call in DataSet.to_numpy(), which inspects the existing data to decide whether a more sensible default dtype (either float32 or int32) can be used. This should help avoid cases where data is unnecessarily cast to an object array, which can lead to unexpected behavior downstream.
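The idea can be sketched with plain pandas. This is only an illustration, not the code merged in this PR: `to_numpy_sensible` is a hypothetical helper, and the pandas nullable dtypes `Float32` and `Int32` stand in for the MTZ dtypes that reciprocalspaceship defines. The rule assumed here is int32 when every column is an integer type with no missing values, float32 when every column is numeric, and the pandas default otherwise.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_bool_dtype, is_integer_dtype, is_numeric_dtype


def to_numpy_sensible(df, dtype=None, copy=False):
    """Hypothetical wrapper around DataFrame.to_numpy() (illustration only)."""
    # An explicit dtype from the caller always wins.
    if dtype is not None:
        return df.to_numpy(dtype=dtype, copy=copy)

    has_missing = bool(df.isna().to_numpy().any())
    all_int = all(is_integer_dtype(t) for t in df.dtypes)
    all_numeric = all(
        is_numeric_dtype(t) and not is_bool_dtype(t) for t in df.dtypes
    )

    # Integer columns without missing values fit in an int32 array.
    if all_int and not has_missing:
        return df.to_numpy(dtype=np.int32, copy=copy)
    # Any other purely numeric data becomes float32, with NaN for missing values.
    if all_numeric:
        return df.to_numpy(dtype=np.float32, na_value=np.nan, copy=copy)
    # Non-numeric columns: defer to the pandas default.
    return df.to_numpy(copy=copy)


# Stand-in data: nullable pandas dtypes instead of the library's MTZ dtypes.
mixed = pd.DataFrame(
    {
        "F": pd.array([1.5, None], dtype="Float32"),
        "N": pd.array([1, 2], dtype="Int32"),
    }
)
indices = pd.DataFrame(
    {
        "H": pd.array([1, 2], dtype="Int32"),
        "K": pd.array([3, 4], dtype="Int32"),
    }
)

mixed_array = to_numpy_sensible(mixed)
index_array = to_numpy_sensible(indices)
print(mixed_array.dtype, index_array.dtype)
```

Passing `dtype` explicitly bypasses the inference, so callers who do want an object array can still request one.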

codecov-commenter commented 2 years ago

Codecov Report

Merging #182 (7c0264f) into main (4222ffc) will increase coverage by 0.01%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #182      +/-   ##
==========================================
+ Coverage   98.36%   98.37%   +0.01%     
==========================================
  Files          45       45              
  Lines        1772     1783      +11     
==========================================
+ Hits         1743     1754      +11     
  Misses         29       29              
| Flag | Coverage Δ |
|---|---|
| unittests | 98.37% <100.00%> (+0.01%) :arrow_up: |


| Impacted Files | Coverage Δ |
|---|---|
| reciprocalspaceship/dataset.py | 98.20% <100.00%> (+0.04%) :arrow_up: |
