rcsb / mmtf-cpp

The pure C++ implementation of the MMTF API, decoder and encoder.
MIT License
21 stars 24 forks source link

Danpf/fix bond order consistency #10

Closed danpf closed 6 years ago

danpf commented 6 years ago

We shouldn't require that you have to have both bondOrder and bondAtomList. At least according to the spec.

This should fix that and I added some tests to make sure it doesn't happen again

Also added a little better reporting in a few places where is consistent fails

gtauriello commented 6 years ago

Thanks for catching this. There is one thing though that I am unsure about. I had assumed that all fields of groupType objects (https://github.com/rcsb/mmtf/blob/master/spec.md#grouplist) were required. As a result, the encoder will always write those fields and the decoder throws an error if any of them is missing. In the example files, groups with missing bonds just had empty lists there and that works fine. I guess that's ok, but I opened an issue in the mmtf repo to clarify...

Again: this is only about whether or not it is ok for fields within a group to be missing from the MMTF file. I think that your proposed changes for consistency checks are ok.

btw: do you know of any example MMTF file, where a group has a non-empty bondAtomList and an empty bondOrderList? Just wondering if this was spotted somewhere...

danpf commented 6 years ago

Thanks for your updates @gtauriello

I had assumed that all fields of groupType objects ... were required

This is definitely something that's not really clear from the documentation. I sort of assumed that the optional stuff carried over from the StructureData class but I probably shouldn't have.

btw: do you know of any example MMTF file, where a group has a non-empty bondAtomList and an empty bondOrderList? Just wondering if this was spotted somewhere...

No this was an application specific problem I ran into. Haven't had any problems with real mmtf files, but reworking old scripts to fit the spec

I once saw that schrodinger has their own mmtf implementation and they have scripts from pdb->mmtf which obviously ignore bonds. I don't know if that counts as evidence against the necessity of the group data though.

gtauriello commented 6 years ago

Ok. For now I would say that we keep it like this and just allow for those groupType sub fields to contain empty lists as you did. Dependent on what comes out in mmtf repo issue, we can adapt the encoder/decoder to even drop the empty lists and allow for them to be missing.

pwrose commented 6 years ago

From a practical point of view, I'm wondering how the currently implemented decoders behave if bondOrder and bondAtomList are missing. Dan, if you have an example file, can you send it to me. I'd like to test the Java and Python decoders.

The reason we've added bonds is so that software does not have to infer bonds, which is an error-prone and slow process. The software we are building of top of MMTF generally requires these data to be present.

On Fri, Jun 29, 2018 at 8:18 AM, Gerardo Tauriello <notifications@github.com

wrote:

Ok. For now I would say that we keep it like this and just allow for those groupType sub fields to contain empty lists as you did. Dependent on what comes out in mmtf repo issue, we can adapt the encoder/decoder to even drop the empty lists and allow for them to be missing.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/rcsb/mmtf-cpp/pull/10#issuecomment-401386305, or mute the thread https://github.com/notifications/unsubscribe-auth/ADuwEDI7qH8OhZjf5N89OrZ6BY34iRDQks5uBkVHgaJpZM4U8Xrx .

gtauriello commented 6 years ago

Since this is more general than the C++ implementation, it probably makes sense to continue the discussion in the mmtf repo: rcsb/mmtf#33