rs-station / reciprocalspaceship

Tools for exploring reciprocal space
https://rs-station.github.io/reciprocalspaceship/
MIT License

groupby apply drops cell and spacegroup #191

Open kmdalton opened 1 year ago

kmdalton commented 1 year ago

Installing rs from PyPI with the following commands:

conda create -n rs python=3.10
conda activate rs
pip install --upgrade pip
pip install reciprocalspaceship

This gives me rs 0.10.3 and pandas 1.4.4.

import numpy as np
import reciprocalspaceship as rs

cell = [34., 45., 98., 90., 90., 90.]
spacegroup = 19
dmin = 4.
repeats = 1

h,k,l = rs.utils.generate_reciprocal_asu(cell, spacegroup, dmin, anomalous=True).T

ds = None
for i in range(repeats):
    _ds = rs.DataSet({
        "H" : h,
        "K" : k,
        "L" : l,
        "I" : np.random.random(len(h)),
        "SIGI" : np.random.random(len(h)),
    }, cell=cell, spacegroup=spacegroup, merged=True).infer_mtz_dtypes()
    _ds['repeat'] = i
    if ds is not None:
        ds = rs.concat((ds, _ds))
    else:
        ds = _ds

ds = ds.set_index(['H', 'K', 'L'])

print(f"Before: {ds.spacegroup} ; {ds.cell}")

# This line will drop the spacegroup & cell
ds = ds.groupby('repeat', as_index=False).apply(lambda x: x)

print(f"After: {ds.spacegroup} ; {ds.cell}")

Output:

$ python test.py
Before: <gemmi.SpaceGroup("P 21 21 21")> ; <gemmi.UnitCell(34, 45, 98, 90, 90, 90)>
After: None ; None

JBGreisman commented 1 year ago

This feels like an issue with groupby-apply in pandas: it's possible that the subclass's _metadata fields (which carry attributes such as cell and spacegroup) aren't being propagated to the result. The built-in aggregation methods (groupby().mean(), etc.) preserve them as expected. It will take a bit of work to write a minimal pandas-only example confirming that the problem happens there (and not because of something in rs).
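A pandas-only check along those lines might look like the sketch below. MetaFrame and its cell attribute are hypothetical stand-ins for rs.DataSet and its metadata; the subclass follows the standard pandas subclassing pattern (_metadata plus _constructor). It prints whether cell survives copy(), groupby().mean(), and groupby().apply(). Behavior may differ between pandas versions, so no particular output is claimed here.

```python
import warnings

import numpy as np
import pandas as pd


class MetaFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass carrying one custom attribute, like rs.DataSet's cell."""

    # Attributes listed in _metadata are copied by pandas' __finalize__ hook
    _metadata = ["cell"]

    @property
    def _constructor(self):
        # Make pandas operations build MetaFrame objects rather than plain DataFrames
        return MetaFrame


df = MetaFrame({"repeat": [0, 0, 1, 1], "I": np.arange(4.0)})
df.cell = (34.0, 45.0, 98.0, 90.0, 90.0, 90.0)

with warnings.catch_warnings():
    # Newer pandas warns that apply() sees the grouping column; irrelevant here
    warnings.simplefilter("ignore")
    results = {
        "copy": df.copy(),
        "groupby().mean()": df.groupby("repeat").mean(),
        "groupby().apply()": df.groupby("repeat", as_index=False).apply(lambda x: x),
    }

for name, result in results.items():
    # getattr with a default avoids AttributeError when the attribute was dropped
    print(f"{name:>20}: {type(result).__name__}, cell={getattr(result, 'cell', None)}")
```

If the groupby().apply() line reports cell=None while copy() keeps it, that would point at pandas rather than rs.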

kmdalton commented 1 year ago

Makes sense. I did not try very hard to minimize this example; I just wanted to document it before I forgot.
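Until the root cause is pinned down, one possible workaround is to copy the metadata back onto the result by hand. The sketch below does this for a hypothetical pandas subclass (MetaFrame, with a made-up helper apply_keeping_metadata) using pandas' semi-private __finalize__ hook, which copies every attribute named in _metadata. Whether the same pattern works unchanged for rs.DataSet is an assumption that has not been checked here.

```python
import warnings

import pandas as pd


class MetaFrame(pd.DataFrame):
    """Hypothetical subclass standing in for rs.DataSet; 'cell' mimics its metadata."""

    _metadata = ["cell"]

    @property
    def _constructor(self):
        return MetaFrame


def apply_keeping_metadata(source, by, func, **groupby_kwargs):
    """Hypothetical helper: run source.groupby(by).apply(func), then restore source's _metadata."""
    with warnings.catch_warnings():
        # Newer pandas warns about the grouping column being passed to func
        warnings.simplefilter("ignore")
        result = source.groupby(by, **groupby_kwargs).apply(func)
    # Re-wrap in the source's class in case apply returned a plain DataFrame,
    # then let __finalize__ copy every attribute listed in _metadata
    return type(source)(result).__finalize__(source)


df = MetaFrame({"repeat": [0, 0, 1, 1], "I": [0.1, 0.2, 0.3, 0.4]})
df.cell = (34.0, 45.0, 98.0, 90.0, 90.0, 90.0)

out = apply_keeping_metadata(df, "repeat", lambda x: x, as_index=False)
print(type(out).__name__, out.cell)
```

Re-wrapping before calling __finalize__ matters because __finalize__ only copies names present in both objects' _metadata, so a plain DataFrame result would silently receive nothing.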