mwang87 / SMART_NMR

Repository to help develop SMART
https://smart.ucsd.edu/
MIT License

Fixed DB (07012020) #142

ChungmaruQ closed this issue 4 years ago

ChungmaruQ commented 4 years ago

Download link: https://www.dropbox.com/s/6sewljlmmhgc8pc/DB_07012020_SM2.1%28100K%29.json?dl=0

Database Format

The database is a JSON file containing a list of records. Each record has the following fields.

  1. Compound_name - Compound Name
  2. Embeddings - 180-dimensional embedding
  3. SMILES - SMILES Structure
  4. MW - exact mass
  5. From - the source database of the record (e.g. Jeol)
  6. ID - unique identifier that serves as a pseudo-accession for the entry. IDs can be integers or UUIDs, but each must be unique and must not be NULL.
  7. JEOL_link - external link to the NMR data for compounds derived from the JEOL DB. These links have the form https://www.j-resonance.com/en/nmrdb/data/xxx.
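The constraints in the field list (unique, non-null IDs and 180-dimensional embeddings) can be checked before a new database file is deployed. The following is only a minimal sketch: the function name `validate_records` and the constant names are hypothetical, and it assumes that `JEOL_link` may be absent on records that do not come from the JEOL DB.

```python
from __future__ import annotations

from typing import Any

EMBEDDING_DIM = 180  # taken from the field list above
# Assumption: JEOL_link is optional, so it is not listed here.
# ID is checked separately below because it must also be unique and non-null.
REQUIRED_FIELDS = ("Compound_name", "Embeddings", "SMILES", "MW", "From")


def validate_records(records: list[dict[str, Any]]) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems: list[str] = []
    seen_ids: set[Any] = set()
    for index, record in enumerate(records):
        # Every record should carry the fields named in the format description.
        for field in REQUIRED_FIELDS:
            if field not in record:
                problems.append(f"record {index}: missing field {field!r}")

        # IDs must be present, non-null, and unique across the whole file.
        record_id = record.get("ID")
        if record_id is None:
            problems.append(f"record {index}: ID is missing or null")
        elif record_id in seen_ids:
            problems.append(f"record {index}: duplicate ID {record_id!r}")
        else:
            seen_ids.add(record_id)

        # Embeddings, when present, must have exactly EMBEDDING_DIM values.
        embedding = record.get("Embeddings")
        if embedding is not None and len(embedding) != EMBEDDING_DIM:
            problems.append(
                f"record {index}: expected {EMBEDDING_DIM} embedding values, "
                f"got {len(embedding)}"
            )
    return problems
```

Calling `validate_records(records)` on the parsed list and printing the result gives a quick pass/fail report for a candidate file.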

[{'Compound_name': 'micrococcin P1',
  'Embeddings': [0.1537381113, 0.3115234971, -1.3087806702, ..., -0.2351712883],
  'SMILES': 'CC=C(NC(=O)c1csc(-c2csc(-c3ccc4c(n3)-c3csc(n3)C(C(C)O)NC(=O)c3csc(n3)C(C(C)C)NC(=O)c3csc(n3)C(=CC)NC(=O)C(C(C)O)NC(=O)c3csc-4n3)n2)n1)C(=O)NCC(C)O',
  'MW': 1143.2,
  'From': 'Jeol',
  'ID': 'v2.1_0',
  'JEOL_link': 'https://www.j-resonance.com/en/nmrdb/data/1'},
 {'Compound_name': 'chelerythrine',
  'Embeddings': [0.1537381113, 0.3115234971, -1.3087806702, ..., -0.2351712883],
  'SMILES': 'COc1ccc2c(c[n+](C)c3c4cc5c(cc4ccc23)OCO5)c1OC',
  'MW': 348.1,
  'From': 'Jeol',
  'ID': 'v2.1_1',
  'JEOL_link': 'https://www.j-resonance.com/en/nmrdb/data/2'},
 . . . ]
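For reference, a file in this format could be read and indexed by `ID` roughly as follows. This is a sketch under assumptions, not the loader SMART itself uses: `load_database` is a hypothetical helper, and it assumes the downloaded file is valid JSON with a list at the top level (the sample above is shown with single quotes, but the file is described as JSON).

```python
import json
from pathlib import Path


def load_database(path):
    """Read the JSON database and return a dict mapping each ID to its record."""
    with Path(path).open(encoding="utf-8") as handle:
        records = json.load(handle)
    # The format description says the file is a list of records.
    if not isinstance(records, list):
        raise ValueError("expected the top-level JSON value to be a list")
    # If two records share an ID, the later one silently wins here,
    # so uniqueness should be checked before relying on this index.
    return {record["ID"]: record for record in records}
```

With the file from the download link saved locally (the local filename is an assumption), `load_database("DB_07012020_SM2.1(100K).json")["v2.1_0"]["Compound_name"]` would look up the first record shown above.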