rcsb / mmtf-java

The Java implementation of the MMTF API, decoder and encoder.
http://mmtf.rcsb.org/
Apache License 2.0

`ncsOperatorList` read as doubles - mmtf spec says should be floats. #53

Open zacharyrs opened 3 years ago

zacharyrs commented 3 years ago

When unpacking an mmtf file, this implementation expects doubles for the transformation matrices. The specification defines the float type as 32-bit, and says this field is populated with floats. I'm not sure whether this should be changed - I suspect it might break parsing of existing mmtf files, so maybe it needs to accept both types?
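To illustrate what "accept both types" could look like at the MessagePack level, here is a minimal sketch in Python using only the standard library. It relies on the MessagePack format markers `0xca` (float 32) and `0xcb` (float 64), both followed by a big-endian IEEE 754 value. The function name `read_msgpack_float` is hypothetical and is not part of mmtf-java or mmtf-python.

```python
import struct

FLOAT32_MARKER = 0xCA  # MessagePack "float 32": marker + 4 bytes, big-endian
FLOAT64_MARKER = 0xCB  # MessagePack "float 64": marker + 8 bytes, big-endian


def read_msgpack_float(buf, offset=0):
    """Decode one MessagePack float of either width.

    Hypothetical helper for illustration only.
    Returns (value, offset_of_next_item).
    """
    marker = buf[offset]
    if marker == FLOAT32_MARKER:
        (value,) = struct.unpack_from(">f", buf, offset + 1)
        return value, offset + 5
    if marker == FLOAT64_MARKER:
        (value,) = struct.unpack_from(">d", buf, offset + 1)
        return value, offset + 9
    raise ValueError(f"not a MessagePack float marker: {marker:#04x}")
```

A reader built this way would return the same Python float for a value that is exactly representable in both widths, regardless of how the writer encoded it.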

zacharyrs commented 3 years ago

Unfortunately, this breaks cross-compatibility with mmtf-python, which by default dumps everything as 64-bit floats (doubles). The msgpack-python implementation doesn't support packing one particular field (the transforms) as 64-bit floats and everything else as 32-bit floats - see here.

zacharyrs commented 3 years ago

I have a partial workaround: making mmtf-python follow the same decisions as here (all 32-bit except the transforms list) - https://github.com/rcsb/mmtf-python/issues/50.
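The idea of "all 32-bit except the transforms list" can be sketched by choosing the float width per list when writing MessagePack bytes by hand. This is only an illustration with the standard library, not the actual mmtf-python change; `pack_float`, `pack_float_array`, and the sample values are hypothetical.

```python
import struct


def pack_float(value, double=False):
    """Encode one float as MessagePack float 64 (0xcb) or float 32 (0xca)."""
    if double:
        return b"\xcb" + struct.pack(">d", value)
    return b"\xca" + struct.pack(">f", value)


def pack_float_array(values, double=False):
    """Encode a flat list of floats as a MessagePack array of one width."""
    n = len(values)
    if n <= 15:
        header = bytes([0x90 | n])               # fixarray
    elif n <= 0xFFFF:
        header = b"\xdc" + struct.pack(">H", n)  # array 16
    else:
        raise ValueError("array too long for this sketch")
    return header + b"".join(pack_float(v, double) for v in values)


# A 4x4 identity transform (16 values), written as doubles.
identity = [1.0, 0.0, 0.0, 0.0,
            0.0, 1.0, 0.0, 0.0,
            0.0, 0.0, 1.0, 0.0,
            0.0, 0.0, 0.0, 1.0]
transform_bytes = pack_float_array(identity, double=True)

# Any other float list (illustrative values), written as 32-bit floats.
other_floats_bytes = pack_float_array([10.0, 20.0, 30.0, 90.0, 90.0, 90.0])
```

The point of the design is that the width is decided by the caller for each field, which a single global packer setting cannot express.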

josemduarte commented 3 years ago

Good catch, @zacharyrs! Thanks for the detailed report.

Changing the RCSB mmtf files is doable but, as you say, may cause quite some trouble. I like your Python workaround as a solution. However, to be consistent, the spec would have to officially acknowledge that `ncsOperatorList` uses doubles, right?

One important note: MMTF is now in minimal maintenance mode. The preferred compressed format for PDB data is BinaryCIF.

zacharyrs commented 3 years ago

Thanks @josemduarte!

Yes, the Python solution basically just means both implementations violate the specification in the same way. It avoids the hassle of breaking things.

I didn't realise mmtf had been dropped to maintenance... I assume BinaryCIF follows the CIF spec, it's just encoded?

I recall CIF not caring about bond information, which was what I liked about mmtf - I guess I'll have to read into it more.

josemduarte commented 3 years ago

> I assume BinaryCIF follows the CIF spec, it's just encoded?

Yes, that's correct.

> I recall CIF not caring about bond information, which was what I liked about mmtf - I guess I'll have to read into it more.

Bond information is available, but indirectly, via the chemical component dictionary.

zacharyrs commented 3 years ago

> Bond information is available but indirectly via the chemical component dictionary

Is that guaranteed for all molecules or is it optional?

josemduarte commented 3 years ago

The chemical component dictionary contains all intra-residue bond information, but it is not embedded within the structure BCIF files. We will consider offering the whole chemical component dictionary as one BCIF bundle, which should make it more convenient to use.