rcsb / mmtf

The specification of the MMTF format for biological structures
http://mmtf.rcsb.org/
44 stars 17 forks source link

Clarify if bondAtomList and bondOrderList are optional for groups too #33

Closed gtauriello closed 6 years ago

gtauriello commented 6 years ago

It would be great to clarify in the specs if it is acceptable for an MMTF file to have groupType objects (see https://github.com/rcsb/mmtf/blob/master/spec.md#grouplist) with missing bondAtomList and bondOrderList fields.

The reason why this needs clarification is that the bondAtomList and bondOrderList fields exist also on a global level (see https://github.com/rcsb/mmtf/blob/master/spec.md#bondatomlist) where they are optional (you can either have neither of them, only bondAtomList or both).

The discussion on this was started in rcsb/mmtf-cpp#9 and rcsb/mmtf-cpp#10.

arose commented 6 years ago

The key should be there, but the list maybe empty, e.g. for single atom groups like Cl-. Generally, bonds are expected to be part of an mmtf file.

ping @pwrose

gtauriello commented 6 years ago

Ok. In that case the decoder/encoder in rcsb/mmtf-cpp behaves correctly as we will always write the list even if empty and always expect one to be there when reading. Note though that the global lists could be missing completely though (as in this test case in this repo: https://github.com/rcsb/mmtf/blob/master/test-suite/mmtf/empty-all0.mmtf).

For now, we will also assume that bondOrderList (as described in https://github.com/rcsb/mmtf/blob/master/spec.md#bondorderlist) can be an empty list even if bondAtomList is non-empty. Please let us know if this is a bad assumption...

danpf commented 6 years ago

This is sort of a continuation from https://github.com/rcsb/mmtf-cpp/pull/10

@pwrose attached is a script + an example of some things with the bond orders messed around with in various ways that I consider to be acceptable according to the spec. run it with python3 remove_bond_stuff.py 1PEF.mmtf

I like having bond and bond order information, but the spec says

bondOrderList

Optional field If it exists bondAtomList must also be present. However bondAtomList may exist without bondOrderList.

which implies that you can have a bond but not a bond Order. I think the way the spec is written is somewhat ambiguous regarding all bond information.

I had assumed including bond information in general was optional, but from the spec you could maybe also assume that including bond information is required and that the bondAtomList field was optional only in the sense that we might have a structure with no bonds (say a cloud of Cl- atoms. bond_order_stuff.zip

gtauriello commented 6 years ago

Just to clarify the behavior of the decoder/encoder in rcsb/mmtf-cpp for the cases in the zip-package: it can deal with empty lists (i.e. the empty_*.mmtf files), but not with missing entries in the groups (i.e. the del_*.mmtf files). The latter can easily be changed btw by treating those missing fields like empty lists, but so far we tried to be somewhat strict to disallow badly formatted mmtf files according to specs. And the decoder won't try to automagically add bonds. That's for the decoder-user to figure out...

gtauriello commented 6 years ago

The PR #35 resolved this issue so this can be closed.