rcsb / mmtf

The specification of the MMTF format for biological structures
http://mmtf.rcsb.org/

Outdated atom names for some groups #47

Closed: padix-key closed this issue 3 years ago

padix-key commented 3 years ago

I found a case where the atom names in the atomNameList of a groupList entry are not consistent with the atom names in the corresponding mmCIF file.

PDB ID: 1igy
URL: https://mmtf.rcsb.org/v1.0/full/1igy
group: NDG
atom name: O (mmtf) or O5 (cif)

Excerpt from 1igy.cif:

HETATM 12351 C C1   . NDG E 3 .   ? 4.403   26.019  46.445  1.00 100.00 ? 2   NDG E C1   1 
HETATM 12352 C C2   . NDG E 3 .   ? 5.903   25.741  46.546  1.00 100.00 ? 2   NDG E C2   1 
HETATM 12353 C C3   . NDG E 3 .   ? 6.391   24.795  45.433  1.00 100.00 ? 2   NDG E C3   1 
HETATM 12354 C C4   . NDG E 3 .   ? 5.491   23.550  45.387  1.00 100.00 ? 2   NDG E C4   1 
HETATM 12355 C C5   . NDG E 3 .   ? 4.033   23.972  45.150  1.00 100.00 ? 2   NDG E C5   1 
HETATM 12356 C C6   . NDG E 3 .   ? 3.079   22.786  45.157  1.00 100.00 ? 2   NDG E C6   1 
HETATM 12357 C C7   . NDG E 3 .   ? 7.832   27.086  47.123  1.00 100.00 ? 2   NDG E C7   1 
HETATM 12358 C C8   . NDG E 3 .   ? 8.500   28.453  47.046  1.00 100.00 ? 2   NDG E C8   1 
HETATM 12359 O O5   . NDG E 3 .   ? 3.561   24.873  46.189  1.00 100.00 ? 2   NDG E O5   1 
HETATM 12360 O O3   . NDG E 3 .   ? 7.728   24.409  45.716  1.00 100.00 ? 2   NDG E O3   1 
HETATM 12361 O O4   . NDG E 3 .   ? 5.894   22.643  44.323  1.00 100.00 ? 2   NDG E O4   1 
HETATM 12362 O O6   . NDG E 3 .   ? 3.425   21.830  44.166  1.00 100.00 ? 2   NDG E O6   1 
HETATM 12363 O O7   . NDG E 3 .   ? 8.380   26.132  47.689  1.00 100.00 ? 2   NDG E O7   1 
HETATM 12364 N N2   . NDG E 3 .   ? 6.644   26.992  46.530  1.00 100.00 ? 2   NDG E N2   1 
HETATM 12365 H H1   . NDG E 3 .   ? 4.022   26.121  47.470  1.00 15.00  ? 2   NDG E H1   1 
HETATM 12366 H H2   . NDG E 3 .   ? 6.104   25.207  47.485  1.00 15.00  ? 2   NDG E H2   1 
HETATM 12367 H H3   . NDG E 3 .   ? 6.346   25.319  44.468  1.00 15.00  ? 2   NDG E H3   1 
HETATM 12368 H H4   . NDG E 3 .   ? 5.562   23.033  46.354  1.00 15.00  ? 2   NDG E H4   1 
HETATM 12369 H H5   . NDG E 3 .   ? 3.962   24.497  44.187  1.00 15.00  ? 2   NDG E H5   1 
HETATM 12370 H H61  . NDG E 3 .   ? 2.062   23.157  44.966  1.00 15.00  ? 2   NDG E H61  1 
HETATM 12371 H H62  . NDG E 3 .   ? 3.119   22.313  46.148  1.00 15.00  ? 2   NDG E H62  1 
HETATM 12372 H H81  . NDG E 3 .   ? 7.730   29.238  47.041  1.00 15.00  ? 2   NDG E H81  1 
HETATM 12373 H H82  . NDG E 3 .   ? 9.087   28.531  46.120  1.00 15.00  ? 2   NDG E H82  1 
HETATM 12374 H H83  . NDG E 3 .   ? 9.148   28.602  47.921  1.00 15.00  ? 2   NDG E H83  1 
HETATM 12375 H HO3  . NDG E 3 .   ? 7.755   23.952  46.560  1.00 15.00  ? 2   NDG E HO3  1 
HETATM 12376 H HO6  . NDG E 3 .   ? 4.377   21.829  44.043  1.00 15.00  ? 2   NDG E HO6  1 
HETATM 12377 H HN2  . NDG E 3 .   ? 6.265   27.774  46.078  1.00 15.00  ? 2   NDG E HN2  1
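For reference, the atom names can be pulled out of rows like these with a few lines of Python. This is only a minimal sketch: the excerpt does not include the `loop_` header, so the column positions (atom name in the 4th field, residue name in the 6th) are an assumption based on the usual `atom_site` layout in PDB mmCIF files, and the helper name `atom_names` is hypothetical. A plain whitespace split is also not a general mmCIF parser (it breaks on quoted values containing spaces).

```python
# Three rows copied from the 1igy.cif excerpt above.
rows = """\
HETATM 12359 O O5   . NDG E 3 .   ? 3.561   24.873  46.189  1.00 100.00 ? 2   NDG E O5   1
HETATM 12364 N N2   . NDG E 3 .   ? 6.644   26.992  46.530  1.00 100.00 ? 2   NDG E N2   1
HETATM 12370 H H61  . NDG E 3 .   ? 2.062   23.157  44.966  1.00 15.00  ? 2   NDG E H61  1
"""


def atom_names(text, comp_id):
    """Collect atom names for one residue name from atom_site rows.

    Assumed column layout (not shown in the excerpt):
    field 3 = label_atom_id, field 5 = label_comp_id.
    """
    names = []
    for line in text.splitlines():
        fields = line.split()
        is_atom_row = len(fields) > 5 and fields[0] in ("ATOM", "HETATM")
        if is_atom_row and fields[5] == comp_id:
            names.append(fields[3])
    return names


print(atom_names(rows, "NDG"))
```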

Excerpt from the NDG entry in the 1igy.mmtf groupList:

'groupName': 'NDG'
'atomNameList': ['C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'C7', 'C8', 'O', 'O3', 'O4', 'O6', 'O7', 'N2', 'H1', 'H2', 'H3', 'H4', 'H5', 'H6C1', 'H6C2', 'H8C1', 'H8C2', 'H8C3', 'HB', 'H6', 'HA']
'elementList': ['C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'O', 'O', 'O', 'O', 'O', 'N', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H']
'bondOrderList': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
'bondAtomList': [1, 0, 2, 1, 3, 2, 4, 3, 5, 4, 7, 6, 8, 0, 8, 4, 9, 2, 10, 3, 11, 5, 12, 6, 13, 1, 13, 6, 14, 0, 15, 1, 16, 2, 17, 3, 18, 4, 19, 5, 20, 5, 21, 7, 22, 7, 23, 7, 24, 9, 25, 11, 26, 13]
'formalChargeList': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
'singleLetterCode': '?'
'chemCompType': 'D-SACCHARIDE'
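
The mismatch can be checked position by position. The sketch below is illustrative only: both name lists are copied from the two excerpts above, the helper `name_mismatches` is hypothetical, and it assumes that both files list the atoms of the group in the same order (the element columns agree, which supports that assumption).

```python
# Atom names of the NDG group, copied from the mmCIF rows above.
cif_names = [
    "C1", "C2", "C3", "C4", "C5", "C6", "C7", "C8",
    "O5", "O3", "O4", "O6", "O7", "N2",
    "H1", "H2", "H3", "H4", "H5",
    "H61", "H62", "H81", "H82", "H83", "HO3", "HO6", "HN2",
]

# atomNameList of the NDG group, copied from the MMTF excerpt above.
mmtf_names = [
    "C1", "C2", "C3", "C4", "C5", "C6", "C7", "C8",
    "O", "O3", "O4", "O6", "O7", "N2",
    "H1", "H2", "H3", "H4", "H5",
    "H6C1", "H6C2", "H8C1", "H8C2", "H8C3", "HB", "H6", "HA",
]


def name_mismatches(cif, mmtf):
    """Return (index, cif_name, mmtf_name) for every position that differs.

    Assumes both lists describe the same atoms in the same order.
    """
    if len(cif) != len(mmtf):
        raise ValueError("atom counts differ, cannot compare by position")
    return [(i, c, m) for i, (c, m) in enumerate(zip(cif, mmtf)) if c != m]


mismatches = name_mismatches(cif_names, mmtf_names)
for index, cif_name, mmtf_name in mismatches:
    print(f"atom {index}: cif={cif_name!r} mmtf={mmtf_name!r}")
```

Besides O vs. O5, this comparison also flags the hydrogen names (for example H6C1 vs. H61), which differ between the two files as well.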

speleo3 commented 3 years ago

This might affect all entries from the Carbohydrate Remediation: https://www.wwpdb.org/documentation/carbohydrate-remediation

josemduarte commented 3 years ago

Indeed, this is due to the carbohydrate remediation. The MMTF files have not yet been updated since the remediation.

We will try to update them over the next few weeks. However, please note that MMTF will not support any carbohydrate features and will treat carbohydrate entities as non-polymeric.

Please consider looking at the Binary CIF format, which is up to date with the carbohydrate remediation and offers compression similar to MMTF.

josemduarte commented 3 years ago

@padix-key New MMTF files are available now. The problem should be gone. Let us know if you see any issues.

padix-key commented 3 years ago

For 1igy it works as expected now. Thank you.