rs-station / reciprocalspaceship

Tools for exploring reciprocal space
https://rs-station.github.io/reciprocalspaceship/
MIT License
28 stars 12 forks source link

Ensure consistent column ordering in `DataSet.unstack_anomalous()` #99

Closed JBGreisman closed 2 years ago

JBGreisman commented 2 years ago

DataSet.unstack_anomalous() would sometimes produce new two-column anomalous DataSets in which the newly created anomalous columns were in different orders. This would occur because internally the function used a set to keep track of the new columns. Since the plus-Friedel columns were suffixed with (+) and the minus-Friedel with (-), this could produce different orders to the columns in the set.

This bug made it so that round-trip operations of DataSet.unstack_anomalous() followed by DataSet.stack_anomalous() sometimes produced a ValueError (#97 ), depending on the Python version being used.

This PR updates DataSet.unstack_anomalous() to internally use a dict for keeping track of the relevant columns. As of Python 3.7+ this preserves the order of input keys.

codecov-commenter commented 2 years ago

Codecov Report

Merging #99 (6a6120b) into main (02f95bc) will not change coverage. The diff coverage is 85.71%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main      #99   +/-   ##
=======================================
  Coverage   98.97%   98.97%           
=======================================
  Files          41       41           
  Lines        1561     1561           
=======================================
  Hits         1545     1545           
  Misses         16       16           
Flag Coverage Δ
unittests 98.97% <85.71%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
reciprocalspaceship/dataset.py 99.01% <85.71%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 02f95bc...6a6120b. Read the comment docs.

kmdalton commented 2 years ago

I checked this branch and it appears to fix the issues i was having with careless output. :+1:

dennisbrookner commented 2 years ago

Related to this (and hopefully not compounding the issue) would it make sense for DataSet.stack_anomalous() to take something like plus_suffix and minus_suffix arguments, as alternatives for plus_labels and minus_labels?

Definitely not critical, but would a) occasionally save some typing and b) be internally consistent with the way the defaults work (without breaking previous code), e.g. plus_suffix="(+)" / minus_suffix="(-)"

(This also seems like an easy enough change that I could try to tackle it myself, with blessing?)

kmdalton commented 2 years ago

@dennisbrookner 's suggestion is more consistent with pandas. I think I am in favor of it.

JBGreisman commented 2 years ago

in the future, let's do this sort of discussion in a separate Issue rather than on a closed PR.

I'm fine with adding something like this. Given that unstack_anomalous() takes suffixes=("(+)", "(-)") as an argument, I think it makes sense to support alternate suffixes in stack_anomalous(). I think it also makes sense to generalize the search criteria to columns containing "+" or "-" in stack_anomalous() which would inherently cover either case.

dennisbrookner commented 2 years ago

Noted re: locations of conversations, I'll make a new Issue and link to here. Glad this seems to make sense!