rs-station / reciprocalspaceship

Tools for exploring reciprocal space
https://rs-station.github.io/reciprocalspaceship/
MIT License
28 stars 12 forks source link

Add `DataSet.select_mtzdtype()` to subset `DataSet` by MTZ column type #150

Closed JBGreisman closed 2 years ago

JBGreisman commented 2 years ago

This PR adds a new function to DataSet, select_mtzdtype():

In [1]: rs.DataSet.select_mtzdtype?
Signature: rs.DataSet.select_mtzdtype(self, dtype)
Docstring:
Return the subset of the DataSet’s columns that are of the given dtype.

Parameters
----------
dtype : str or instance of MTZDtype class
    Single-letter MTZ code, name, or MTZDtype class to return

Returns
-------
DataSet
    Subset of the DataSet with columns matching the requested dtype. If
    no columns of the requested dtype are found, an empty DataSet is
    returned.

Raises
------
ValueError
    If `dtype` is not a string nor a MTZDtype class
File:      ~/Documents/Hekstra_Lab/github/reciprocalspaceship/reciprocalspaceship/dataset.py
Type:      function

This PR fixes #104, making it easier to subset a DataSet to columns of a desired MTZ column type.

codecov-commenter commented 2 years ago

Codecov Report

Merging #150 (7889081) into main (f7d2a2c) will decrease coverage by 0.00%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #150      +/-   ##
==========================================
- Coverage   98.46%   98.46%   -0.01%     
==========================================
  Files          43       43              
  Lines        1696     1690       -6     
==========================================
- Hits         1670     1664       -6     
  Misses         26       26              
Flag Coverage Δ
unittests 98.46% <100.00%> (-0.01%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
reciprocalspaceship/dataset.py 98.04% <100.00%> (-0.03%) :arrow_down:
reciprocalspaceship/concat.py 100.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update f7d2a2c...7889081. Read the comment docs.

JBGreisman commented 2 years ago

Small bug:

In [1]: mtz.head()
Out[1]: 
            BATCH         I      SIGI  PARTIAL
H   K   L                                     
19  -11 2    1424 2066.5464 32.449394    False
3   18  -1    328 2354.8054 29.814323    False
-17 21  17    576  20.76507 6.5568223    False
21  26  -7     70 23.690462 6.6323786    False
-9  17  11    798 459.89502 17.939333    False

In [2]: mtz.select_mtzdtype("B") # Select BatchDtype
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~/Documents/Hekstra_Lab/github/reciprocalspaceship/reciprocalspaceship/commandline/mtzdump.py in <module>
----> 1 mtz.select_mtzdtype("B") # Select BatchDtype

~/Documents/Hekstra_Lab/github/reciprocalspaceship/reciprocalspaceship/dataset.py in select_mtzdtype(self, dtype)
    565             # One-letter code
    566             if len(dtype) == 1:
--> 567                 return self[[k for k in self if self[k].dtype.mtztype == dtype]]
    568             else:
    569                 return self[[k for k in self if self[k].dtype.name == dtype]]

~/Documents/Hekstra_Lab/github/reciprocalspaceship/reciprocalspaceship/dataset.py in <listcomp>(.0)
    565             # One-letter code
    566             if len(dtype) == 1:
--> 567                 return self[[k for k in self if self[k].dtype.mtztype == dtype]]
    568             else:
    569                 return self[[k for k in self if self[k].dtype.name == dtype]]

AttributeError: 'numpy.dtype[bool_]' object has no attribute 'mtztype'

This error occurs because only the custom MTZDtypes have the mtztype attribute, so this line fails for non-custom dtypes, such as the numpy bool dtype here.

# One-letter code
if len(dtype) == 1:
    return self[[k for k in self if self[k].dtype.mtztype == dtype]]

This can be fixed by first checking whether the mtztype attribute exists before checking its value:

# One-letter code
if len(dtype) == 1:
    return self[[k for k in self if hasattr(self[k].dtype, "mtztype") and self[k].dtype.mtztype == dtype]]
JBGreisman commented 2 years ago

I updated the test to parametrize calls to reset_index(). This allows for HKLIndexDtypes to be caught by the test. I agree this makes the test more thorough.

Once CI passes, I'll merge this PR in.