rs-station / reciprocalspaceship

Tools for exploring reciprocal space
https://rs-station.github.io/reciprocalspaceship/
MIT License

`stack_anomalous` inside `groupby` breaks `as_index=False` #192

Open kmdalton opened 1 year ago

kmdalton commented 1 year ago

When called inside a `groupby.apply` context, `stack_anomalous` overrides the `as_index=False` setting and adds the grouping key to the index of the returned dataset as a level named `None`:

import numpy as np
import reciprocalspaceship as rs

cell = [34., 45., 98., 90., 90., 90.]
spacegroup = 19
dmin = 4.
repeats = 10  # the output below (index values 0-9, 24470 rows before stacking) corresponds to 10 repeats

h,k,l = rs.utils.generate_reciprocal_asu(cell, spacegroup, dmin, anomalous=True).T

ds = None
for i in range(repeats):
    _ds = rs.DataSet({
        "H" : h,
        "K" : k,
        "L" : l,
        "I" : np.random.random(len(h)),
        "SIGI" : np.random.random(len(h)),
    }, cell=cell, spacegroup=spacegroup, merged=True).infer_mtz_dtypes()
    _ds['repeat'] = i
    if ds is not None:
        ds = rs.concat((ds, _ds))
    else:
        ds = _ds

ds = ds.set_index(['H', 'K', 'L'])

print(f"Before: {ds.index}")

# Somehow calling `stack_anomalous` overrides `as_index=False`
result = ds.groupby('repeat', as_index=False).apply(lambda x: x.stack_anomalous())

print(f"After: {result.index}")

which gives the following output:

Before: MultiIndex([(-8, -3, -5),
            (-8, -3, -4),
            (-8, -3, -3),
            (-8, -3, -2),
            (-8, -3, -1),
            (-8, -2, -7),
            (-8, -2, -6),
            (-8, -2, -5),
            (-8, -2, -4),
            (-8, -2, -3),
            ...
            ( 8,  2,  4),
            ( 8,  2,  5),
            ( 8,  2,  6),
            ( 8,  2,  7),
            ( 8,  3,  0),
            ( 8,  3,  1),
            ( 8,  3,  2),
            ( 8,  3,  3),
            ( 8,  3,  4),
            ( 8,  3,  5)],
           names=['H', 'K', 'L'], length=24470)
After: MultiIndex([(0, -8, -3, -5),
            (0, -8, -3, -4),
            (0, -8, -3, -3),
            (0, -8, -3, -2),
            (0, -8, -3, -1),
            (0, -8, -2, -7),
            (0, -8, -2, -6),
            (0, -8, -2, -5),
            (0, -8, -2, -4),
            (0, -8, -2, -3),
            ...
            (9, -8, -2, -3),
            (9, -8, -2, -4),
            (9, -8, -2, -5),
            (9, -8, -2, -6),
            (9, -8, -2, -7),
            (9, -8, -3, -1),
            (9, -8, -3, -2),
            (9, -8, -3, -3),
            (9, -8, -3, -4),
            (9, -8, -3, -5)],
           names=[None, 'H', 'K', 'L'], length=44610)

The repeat column still persists in the result dataset.

kmdalton commented 1 year ago

Is this a bug? Maybe this is just what pandas does when `groupby.apply` returns a DataFrame of a different length?
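One way to check that question is a pandas-only sketch with no reciprocalspaceship involved. It assumes the extra level comes from how pandas concatenates `groupby.apply` results when the applied function returns a frame whose index differs from the group's. `double_rows` is a hypothetical stand-in for `stack_anomalous`, and none of this has been verified against the library itself.

```python
import pandas as pd

# Toy frame: two "repeats", indexed by a single Miller-like index H
df = pd.DataFrame(
    {"repeat": [0, 0, 1, 1], "I": [1.0, 2.0, 3.0, 4.0]},
    index=pd.Index([10, 11, 10, 11], name="H"),
)


def double_rows(group):
    # Hypothetical stand-in for stack_anomalous: returns twice as many rows,
    # so the result is no longer indexed like the input group
    flipped = group.copy()
    flipped.index = pd.Index(-group.index.to_numpy(), name=group.index.name)
    return pd.concat([group, flipped])


# Default group_keys=True: pandas labels each group's block with a key level.
# (Recent pandas versions may emit a warning about grouping columns in apply.)
with_keys = df.groupby("repeat", as_index=False).apply(double_rows)
print(with_keys.index.names)

# group_keys=False: the per-group results are concatenated as-is
without_keys = df.groupby("repeat", as_index=False, group_keys=False).apply(double_rows)
print(without_keys.index.names)
```

If the same thing happens here, the behavior would come from pandas rather than from `stack_anomalous`, and passing `group_keys=False` to `groupby` (or calling `droplevel(0)` on the result) may be enough to keep the original `H`, `K`, `L` index.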