The database is a JSON file containing a list of records. Each record includes the following headers.
Compound_name - Compound Name
Embeddings - 180-dimensional embedding (list of floats)
SMILES - SMILES Structure
MW - exact mass
From - source database the entry was taken from
ID - unique identifier that serves as a pseudo-accession for each entry. IDs can be integers or UUIDs, but they must be unique per entry and must not be NULL.
Download link: https://www.dropbox.com/s/6sewljlmmhgc8pc/DB_07012020_SM2.1%28100K%29.json?dl=0
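As a rough sketch of how the field requirements above could be checked before using the database, the Python snippet below looks for missing headers, null or duplicate IDs, and embeddings that are not 180 values long. The function name validate_records and the wording of the messages are illustrative assumptions and not part of any official tooling for this database.

```python
# Headers listed in the field description above.
REQUIRED_FIELDS = ("Compound_name", "Embeddings", "SMILES", "MW", "From", "ID")
EMBEDDING_DIM = 180  # embedding length stated in the field description


def validate_records(records):
    """Return a list of problem descriptions; an empty list means no issues were found.

    Hypothetical helper for illustration only.
    """
    problems = []
    seen_ids = set()
    for index, record in enumerate(records):
        # Report every required header that is absent from this record.
        for field in REQUIRED_FIELDS:
            if field not in record:
                problems.append(f"record {index}: missing header {field}")

        # IDs must be present, non-null, and unique across the whole list.
        if "ID" in record:
            record_id = record["ID"]
            if record_id is None:
                problems.append(f"record {index}: ID is null")
            elif record_id in seen_ids:
                problems.append(f"record {index}: duplicate ID {record_id!r}")
            else:
                seen_ids.add(record_id)

        # Embeddings, when present, should hold exactly EMBEDDING_DIM values.
        embedding = record.get("Embeddings")
        if embedding is not None and len(embedding) != EMBEDDING_DIM:
            problems.append(
                f"record {index}: embedding has {len(embedding)} values, "
                f"expected {EMBEDDING_DIM}"
            )
    return problems
```

Calling validate_records on the parsed list returns one message per problem, so an empty result suggests the entries satisfy the constraints described above. The check does not verify that SMILES strings or MW values are chemically sensible.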
Database Format
Example records, with the embeddings truncated for brevity. These entries also carry a JEOL_link field that links to the source record:
[
  {
    "Compound_name": "micrococcin P1",
    "Embeddings": [0.1537381113, 0.3115234971, -1.3087806702, ..., -0.2351712883],
    "SMILES": "CC=C(NC(=O)c1csc(-c2csc(-c3ccc4c(n3)-c3csc(n3)C(C(C)O)NC(=O)c3csc(n3)C(C(C)C)NC(=O)c3csc(n3)C(=CC)NC(=O)C(C(C)O)NC(=O)c3csc-4n3)n2)n1)C(=O)NCC(C)O",
    "MW": 1143.2,
    "From": "Jeol",
    "ID": "v2.1_0",
    "JEOL_link": "https://www.j-resonance.com/en/nmrdb/data/1"
  },
  {
    "Compound_name": "chelerythrine",
    "Embeddings": [0.1537381113, 0.3115234971, -1.3087806702, ..., -0.2351712883],
    "SMILES": "COc1ccc2c(cn+c3c4cc5c(cc4ccc23)OCO5)c1OC",
    "MW": 348.1,
    "From": "Jeol",
    "ID": "v2.1_1",
    "JEOL_link": "https://www.j-resonance.com/en/nmrdb/data/2"
  },
  ...
]
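To illustrate how a file in this layout might be consumed, here is a minimal Python sketch that loads the list of records, indexes it by ID, and selects entries by MW. The helper names (load_database, index_by_id, filter_by_mass) and the default tolerance of 0.5 are assumptions made for this example, not part of the database itself.

```python
import json


def load_database(path):
    """Load the JSON database file and return its list of records."""
    with open(path, encoding="utf-8") as handle:
        records = json.load(handle)
    if not isinstance(records, list):
        raise ValueError("expected the top-level JSON value to be a list of records")
    return records


def index_by_id(records):
    """Map each record's ID to the record itself for direct lookup."""
    return {record["ID"]: record for record in records}


def filter_by_mass(records, target_mw, tolerance=0.5):
    """Return the records whose MW lies within +/- tolerance of target_mw."""
    return [r for r in records if abs(r["MW"] - target_mw) <= tolerance]
```

With a locally saved copy of the download, load_database would be given that file's path; the resulting list can then be passed to index_by_id for lookups such as the v2.1_0 entry shown above, or to filter_by_mass to narrow candidates by mass.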