rcsb / mmtf

The specification of the MMTF format for biological structures
http://mmtf.rcsb.org/

Bond order and aromatics/resonance #34

Closed danpf closed 5 years ago

danpf commented 6 years ago

Was there ever a discussion about how we (applications) should set the bond order for aromatic or resonance bonds? Did the idea of a 5th bond type (aromatic/resonance) ever come up?

I ask because Rosetta stores bond information as single, double, triple, or aromatic/resonance, which makes sense (at least to me). I would assume all the aromatic bonds in phenylalanine would be considered equal, but currently I have to decide between bond orders 1 and 2 for each of them.

speleo3 commented 6 years ago

Very good question, made me realize that we're doing it wrong in PyMOL. PyMOL uses order 4 for aromatic and currently writes that to MMTF, though according to the spec that would be a quadruple bond.

pwrose commented 6 years ago

We restricted bond order to be single, ..., quadruple bonds, since aromatic or resonant bond types are not uniquely defined (e.g., some heterocycles), and not supported or interpreted differently by various chemoinformatics packages. Similarly, we currently do not support any dative bond types due to a lack of consensus on how to represent these types of bonds.

I'm not sure if there's a need to support quadruple bonds. It is not currently used, so in principle, we could redefine type 4 as aromatic/resonant bond type.

Pros

Cons

I'd be open to change the definition of bond type 4 in a new version of MMTF. What do others think?


danpf commented 6 years ago

just my 2c:

mmtf_code definition
0 aromatic
1 single bond
2 double bond
3 triple bond
etc etc

This way we would never be stumped by crazy metal bonds, and most of everything else wouldn't have to change.
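A minimal sketch of this numbering in Python. The names `AROMATIC_CODE` and `proposed_bond_code` are hypothetical and exist in neither the MMTF spec nor any library; the snippet only illustrates the proposal in this comment (0 for aromatic, otherwise the integer bond order itself), not what the spec currently defines.

```python
AROMATIC_CODE = 0  # hypothetical: code 0 as proposed in this comment, not the MMTF spec


def proposed_bond_code(order: int, aromatic: bool = False) -> int:
    """Return the code under this proposal: 0 for aromatic, else the order itself."""
    if aromatic:
        return AROMATIC_CODE
    if order < 1:
        raise ValueError("non-aromatic bonds need an integer order >= 1")
    return order


# Six ring bonds as an application with a single aromatic bond type might hold them:
# (nominal order, is_aromatic) pairs.
ring_bonds = [(1, True)] * 6
codes = [proposed_bond_code(order, aromatic) for order, aromatic in ring_bonds]
```

Because higher orders map to themselves, no extra code has to be reserved if an unusual high-order bond ever shows up.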

gtauriello commented 6 years ago

I'm not a chemist, so I leave the reasoning to you guys, but from a practical perspective the 0 option proposed by @danpf makes most sense to me (unless "4" is generally accepted as an aromatic bond for many software packages). For the ambiguity of bond definitions, I suppose this is unavoidable even if we don't introduce a special bond order (at least as far as I understood the inherent ambiguity of how to alternate bond orders in the mentioned cases).

speleo3 commented 6 years ago

my 2c:

Not having an aromatic bond in the spec means that an application which has them will either

  1. be forced to kekulize a molecule prior to MMTF export (e.g. PyMOL currently isn't able to do that)
  2. lose information by exporting aromatic bonds as something else (e.g. single bonds)
  3. export a non-standard MMTF file
  4. not support MMTF export

I think options 2.-4. should really be avoided. Option 1. is preferred, but might not be available.

I've never heard of quadruple bonds before I read it in the MMTF spec.

Both MOL2 and SDF have an aromatic bond type.

The Schrodinger modeling suite also has zero-order bonds for metal coordination. Supporting them with MMTF would be nice, but not having them is easier to handle (just skip them for MMTF export) than not having aromatic bonds.

abradle commented 6 years ago

I can see Thomas's point.

But I agree with Peter that choosing one aromaticity perception method causes problems.

If we specify aromatic bonds using Aromaticity Perception Method A (e.g. RDKit), software that uses Aromaticity Perception Method B may not be able to parse some molecules.

Perhaps safer would be to add an "Aromatic Flag List". This would be a list of booleans, the same length as the bond order list, coupled with a description of the method used. That way we can transparently support multiple aromaticity perception methods.
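A rough sketch of what such a flag list could look like next to the existing per-group bond data. `GroupBonds` and its field names are hypothetical: the first two fields loosely mirror the per-group bond atom and bond order lists, while `aromatic_flag_list` and `aromaticity_method` are only the additions suggested in this comment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GroupBonds:
    """Hypothetical per-group bond record; not a structure defined by the MMTF spec."""

    bond_atom_list: List[int]       # flattened atom-index pairs, two entries per bond
    bond_order_list: List[int]      # kekulized orders, one entry per bond
    aromatic_flag_list: List[bool]  # proposed flag list, one entry per bond
    aromaticity_method: str         # proposed description of the perception method

    def __post_init__(self) -> None:
        n_bonds = len(self.bond_order_list)
        if len(self.bond_atom_list) != 2 * n_bonds:
            raise ValueError("bond_atom_list needs two atom indices per bond")
        if len(self.aromatic_flag_list) != n_bonds:
            raise ValueError("aromatic_flag_list must match bond_order_list in length")


# A six-membered aromatic ring: alternating double/single orders, every bond flagged.
ring = GroupBonds(
    bond_atom_list=[0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 0],
    bond_order_list=[2, 1, 2, 1, 2, 1],
    aromatic_flag_list=[True] * 6,
    aromaticity_method="RDKit",
)

# A reader that ignores the flags still sees a valid kekulized structure.
orders_only = ring.bond_order_list
```

The point of keeping the flags separate is that the bond orders stay interpretable on their own, whichever perception method produced the flags.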

Best wishes,

Anthony


arose commented 6 years ago

Perhaps safer would be to add an "Aromatic Flag List".

+1, that is also how it is handled in the mmCIF dictionary, see http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_chem_comp_bond.pdbx_aromatic_flag.html

gtauriello commented 6 years ago

One thing to consider maybe. There are two concepts here:

  1. What the file format can do. Using mmCIF as example: there is the aromatic flag mentioned by @arose but actually the value_order field (http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_chem_comp_bond.value_order.html) is a controlled dictionary which also has an "aromatic bond" option (funnily enough that's also used in the PHE example in http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Categories/chem_comp_bond.html)
  2. How the file format is used for PDB entries (i.e. for the MMTF files provided at mmtf.rcsb.org). Using the mmCIF example again: the policy in the Chemical Component dictionary seems to be to use the aromatic flag and only sing/doub/trip as possible value_order values. That's a great choice but mmCIF doesn't enforce this choice onto every other mmCIF file producer.

My point being: if we add an aromatic flag, it means that every MMTF file producer needs to write both the "kekulized" form and (!) the aromatic form (or at least I don't see how you could do just the aromatic form unless bond order is extended). Not sure if that makes things more user-friendly. So it might be better to support both as the mmCIF format does...

danpf commented 6 years ago

Maybe this is slightly off topic, but since Thomas mentioned 0 order bonds...

From my reading, most programs can read SMILES strings that don't use the lowercase aromatic notation, so they should be able to handle kekulized representations.

So I think that having an aromatic flag would be ideal because it would ensure compatibility w/ other programs + direct 1:1 visualization from molecular modeling program to molecular viewer.

My only ideas were:

  1. if we stick to a binary 'aliphatic vs aromatic' flag, we should just store a map of aromatic indexes that map to bondOrder instead of storing a bunch of 0's that serve no purpose.

  2. if we wanted to stick to a scheme where every bond has a bondDescriptor, we could instead use a system like

    • 1=aliphatic bond
    • 2=aromatic bond,
    • 3=hydrogen/non covalent bond
    • which could also be expanded to 4=ionic or 5=dipole etc. I think this is a little out of the scope of the RCSB/PDB protein writer (maybe not, I don't know how difficult this would be for experimental data), but could actually be really useful for mining the outputs from modeling programs that report that kind of data.
    • Obviously using anything >2 would be optional, but would again enhance the 1:1 modeling -> viewer experience.

Perhaps this is a topic for another issue but: was there ever a discussion for noncovalent bonds?

the only other thing to add would be a key that mentions the type of aromatic notation that is used: valid aromaticity types I could find :

~Final note:~ ~from a old version of the mmtf DB. (not sure when I downloaded it)~ ~there are 1232847295 bonds.~ ~so adding a 8bit field for every bond would add:~ ~1232847295 bonds * 8 bits/bond / 8e6 bits/megabyte = 1233 megabytes added... which is quite a lot... (please double check that math is right?)~ Sorry this was totally wrong.

speleo3 commented 6 years ago

we should just store a map of aromatic indexes that map to bondOrder instead of storing a bunch of 0's that serve no purpose.

How about run-length encoding (e.g. strategy 7)?

@danpf your bondDescriptor sounds like a combination of _chem_comp_bond.value_order and _pdbx_struct_link.type.

Regarding aromatic flag list vs. aromatic bond type: In case we add the flag list but not the aromatic bond type, can we then at least add an unknown or undefined bond order? I'm still thinking of the use case where an application doesn't know the kekulized form and wants to export MMTF. It would then export a bond with aromaticFlag=True and bondOrder=unknown.

abradle commented 6 years ago

I’d agree with Thomas. I was thinking that could be bond order 0.

I think that calculation is a massive overestimate, since we have dictionary encoding for bond information. And yes, I’d use run-length encoding, so that in most cases it’s two extra ints (0 and the number of bonds) per group.

As suggested, I would directly support flavours (e.g. RDKit), with an unknown flavour too, 0 being not aromatic.
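To make the "two extra ints" point concrete, here is a small run-length encoding sketch over a per-bond flag list. The helpers `rle_encode` and `rle_decode` are illustrative names, not functions from any MMTF library, and the flat value/run-length pair layout is an assumption about how the flags would be stored.

```python
from typing import List


def rle_encode(values: List[int]) -> List[int]:
    """Encode a list as flat [value, run_length, value, run_length, ...] pairs."""
    encoded: List[int] = []
    for value in values:
        if encoded and encoded[-2] == value:
            encoded[-1] += 1            # extend the current run
        else:
            encoded.extend([value, 1])  # start a new run
    return encoded


def rle_decode(encoded: List[int]) -> List[int]:
    """Invert rle_encode."""
    decoded: List[int] = []
    for i in range(0, len(encoded), 2):
        decoded.extend([encoded[i]] * encoded[i + 1])
    return decoded


# A group with 10 bonds and no aromatic ones collapses to two ints: 0 and the bond count.
no_aromatic = rle_encode([0] * 10)

# A hypothetical group with 5 non-aromatic bonds followed by 6 aromatic ones.
mixed_flags = [0] * 5 + [1] * 6
mixed = rle_encode(mixed_flags)
```

Here `no_aromatic` is `[0, 10]` and `mixed` is `[0, 5, 1, 6]`, so the cost depends on the number of runs rather than the number of bonds.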


pwrose commented 6 years ago

The idea behind MMTF was that it is somewhat like Java: it will run anywhere. For MMTF that means one algorithm is capable of processing the bond information, without resorting to specialized code to handle specific implementations.

Since some tools cannot provide a kekulized form, we need a workaround for those cases.

So how about the following proposal based on the discussion above:

This leads to 3 possible representations of an aromatic bond:

  1. bond order = 0 , aromatic flag = 1 (kekulized form is unavailable)
  2. bond order = 1 | 2 (alternating single/double bonds), aromatic flag = 1 (if available)
  3. bond order = 1 | 2 (alternating single/double bonds), aromatic flag = 0 (aromaticity is undefined)
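The three cases can be told apart from the two per-bond values alone. A minimal sketch, assuming a hypothetical helper `aromatic_representation` (not part of any MMTF library) that returns the number of the matching representation in the list above, or `None` for combinations the list does not cover:

```python
from typing import Optional


def aromatic_representation(bond_order: int, aromatic_flag: int) -> Optional[int]:
    """Return 1, 2 or 3 for the representations listed above, else None."""
    if bond_order == 0 and aromatic_flag == 1:
        return 1  # aromatic, kekulized form unavailable
    if bond_order in (1, 2) and aromatic_flag == 1:
        return 2  # aromatic, with alternating single/double orders
    if bond_order in (1, 2) and aromatic_flag == 0:
        return 3  # kekulized orders only, aromaticity undefined
    return None   # e.g. a triple bond; not one of the three listed cases


# A ring exported by a tool that cannot kekulize: order 0, flag 1 on every bond.
unkekulized = [aromatic_representation(0, 1) for _ in range(6)]

# The same ring in kekulized form with flags set.
kekulized = [aromatic_representation(order, 1) for order in (2, 1, 2, 1, 2, 1)]
```

A reader written this way needs no tool-specific branches, which matches the "one algorithm" goal stated at the start of this comment.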

Representation 2 is used in the wwPDB Chemical Component Dictionary (see for example toluene: http://files.rcsb.org/ligands/view/MBN.cif)

As Anthony suggested, by applying run-length encoding to the aromatic flag field, the overhead would be insignificant.

If we make such a change, we may break backward compatibility, and need to increment the version to 2.0. That would also be a good time to implement the other improvements we talked about.

danpf commented 5 years ago

Resolved by #35