Closed. minhuanli closed this issue 1 year ago.
I like the concept. I agree that `cif` format can and should be supported, because it is already handled by `gemmi`. Regarding multi-dataset formats, a similar issue crops up with `mtz` files. Unmerged MTZs are currently handled using just a BATCH identifier, but technically the MTZ format supports things like individual unit cell parameters for different batches (particularly useful with serial data).
We do not currently handle this to the full extent that we could, and full support would likely require a new class, as you said. I'm not yet sure how to do that cleanly while keeping the feel of pandas.
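Purely as an illustration of the per-batch metadata idea, here is a minimal sketch in plain Python (no pandas) of one possible layout: reflection rows keep only a BATCH id, as they do now, and the unit cell for each batch lives in a separate lookup table. `BatchedReflections` and its methods are hypothetical names for this sketch, not part of `rs`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Unit cell as (a, b, c, alpha, beta, gamma) in Angstroms and degrees.
Cell = Tuple[float, float, float, float, float, float]


@dataclass
class BatchedReflections:
    """Hypothetical layout: reflection rows tagged with a BATCH id, plus a
    per-batch table of unit cells instead of one cell for the whole file."""

    hkl: List[Tuple[int, int, int]]
    batch: List[int]
    intensity: List[float]
    batch_cells: Dict[int, Cell] = field(default_factory=dict)

    def cell_for_row(self, i: int) -> Cell:
        # Each reflection takes the cell of the batch (image) it came from.
        return self.batch_cells[self.batch[i]]

    def select_batch(self, b: int) -> "BatchedReflections":
        # Keep only the rows of batch b, along with that batch's cell.
        rows = [i for i, bi in enumerate(self.batch) if bi == b]
        return BatchedReflections(
            hkl=[self.hkl[i] for i in rows],
            batch=[b] * len(rows),
            intensity=[self.intensity[i] for i in rows],
            batch_cells={b: self.batch_cells[b]},
        )
```

In pandas terms, the rows would presumably map onto a DataFrame with a BATCH column and the cell table onto a small side table keyed by BATCH; keeping the two in sync through filtering, merging and concatenation is the part that makes a clean pandas feel tricky.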
I think we can easily support `cif` just as @minhuanli demonstrated. Honestly, writing the test is the hardest part of that PR.
The multi-dataset object is a tricky one. There's probably a decent enough way to implement it, but it is not immediately obvious to me.
`cif` support was added in #217.
Making a record here in case we forget about it in the future.
Yesterday I had a discussion with Kevin (@kmdalton) about how to read structure factor data in `cif` format. It turns out it can be done with the `cif` parser from `gemmi` (although this is not very well documented there). Mostly Kevin's wisdom:
The above can easily be organized into a function like `rs.read_cif()`. One catch is that a `cif` file may contain multiple datasets, so, to be general, we have to think of the return value as a representation of multiple datasets. Maybe a generator, `rs.read_cif(cif: str) -> generator`? Or, going a step further, we could implement a new class like `DataSetCollection` with methods to deal with multiple datasets. There are a lot of decisions to make here.
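To compare the two options side by side, here is a rough sketch using only the standard library. `iter_cif_datasets` stands in for the generator idea (one data block at a time, parsed lazily), and `DataSetCollection` is a hypothetical eager container built on top of it. Both names are assumptions for illustration, not the `rs` API, and the tiny `_refln` loop reader is a deliberately naive stand-in for `gemmi`'s parser: it assumes each block starts on a line beginning with `data_`, and it ignores quoting rules and multi-line text fields. In `rs`, each block would presumably become a `DataSet` rather than a dict of string columns.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Columns = Dict[str, List[str]]  # column name -> values (as strings)


def _refln_columns(lines: List[str]) -> Columns:
    """Toy reader for the `_refln.` loop of one data block (not gemmi)."""
    tags: List[str] = []
    refln_tags: List[str] = []
    values: List[str] = []
    state = "seek"  # seek -> tags -> values
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        if state == "seek":
            if line == "loop_":
                state, tags = "tags", []
        elif state == "tags":
            if line.startswith("_"):
                tags.append(line)
            elif tags and all(t.startswith("_refln.") for t in tags):
                state, refln_tags = "values", tags
                values.extend(line.split())
            else:
                state = "seek"  # rows of some other loop: skip them
        else:  # state == "values"
            if line.startswith("_") or line == "loop_":
                break  # the refln loop has ended
            values.extend(line.split())
    names = [t[len("_refln."):] for t in refln_tags]
    if names and len(values) % len(names):
        raise ValueError("ragged _refln loop")
    # Values are stored row by row, so column i is every len(names)-th value.
    return {n: values[i::len(names)] for i, n in enumerate(names)}


def iter_cif_datasets(text: str) -> Iterator[Tuple[str, Columns]]:
    """Generator option: lazily yield (block_name, columns) per data block."""
    name: Optional[str] = None
    body: List[str] = []
    for line in text.splitlines():
        if line.strip().startswith("data_"):
            if name is not None:
                yield name, _refln_columns(body)
            name, body = line.strip()[len("data_"):], []
        elif name is not None:
            body.append(line)
    if name is not None:
        yield name, _refln_columns(body)


class DataSetCollection:
    """Collection option: parse every block up front, index by block name."""

    def __init__(self, text: str):
        # Note: duplicate block names would overwrite each other here.
        self._datasets: Dict[str, Columns] = dict(iter_cif_datasets(text))

    def names(self) -> List[str]:
        return list(self._datasets)

    def __getitem__(self, name: str) -> Columns:
        return self._datasets[name]

    def __len__(self) -> int:
        return len(self._datasets)
```

The trade-off in brief: a generator keeps memory use low and lets callers stop after the first block, but it can only be iterated once; a collection pays for the full parse up front but allows lookup by block name and repeated iteration, which is where dataset-level methods could live.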