different result about dataset query

topazape / LSTM_Chem

Implementation of the paper - Generative Recurrent Networks for De Novo Drug Design.

The Unlicense


Following your code, I tried to rebuild the dataset from ChEMBL22, the same source the paper says was used. However, my result is different.

I ran the following query:

SELECT DISTINCT canonical_smiles
FROM compound_structures
WHERE molregno IN (
    SELECT DISTINCT molregno
    FROM activities
    WHERE standard_type IN ("Kd", "Ki", "Kb", "IC50", "EC50")
      AND standard_units = "nM"
);

The result is 802,320 rows.
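A minimal sketch of running the same query from Python with the standard-library sqlite3 module, assuming the ChEMBL release has been loaded as a SQLite database. The tables here are a hypothetical toy stand-in that has only the columns the query touches, and the rows are made up to show which records the filters keep. The string literals use single quotes, which is the standard SQL form. SQLite accepts the double-quoted "nM" only as a legacy fallback, and stricter engines such as PostgreSQL treat double-quoted tokens as identifiers.

```python
import sqlite3

# Hypothetical toy tables mimicking only the ChEMBL columns used by the query.
# These rows are invented for illustration; they are not ChEMBL data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE compound_structures (molregno INTEGER PRIMARY KEY, canonical_smiles TEXT);
CREATE TABLE activities (molregno INTEGER, standard_type TEXT, standard_units TEXT);
INSERT INTO compound_structures VALUES
    (1, 'CCO'), (2, 'c1ccccc1'), (3, 'CC(=O)O'), (4, 'CCN');
INSERT INTO activities (molregno, standard_type, standard_units) VALUES
    (1, 'IC50', 'nM'), (1, 'Ki', 'nM'),  -- one molecule, two qualifying rows
    (2, 'EC50', 'nM'),
    (3, 'IC50', 'uM'),                   -- wrong unit, excluded
    (4, 'Potency', 'nM');                -- type not in the list, excluded
""")

# Same query as above, with single-quoted string literals.
QUERY = """
SELECT DISTINCT canonical_smiles
FROM compound_structures
WHERE molregno IN (
    SELECT DISTINCT molregno
    FROM activities
    WHERE standard_type IN ('Kd', 'Ki', 'Kb', 'IC50', 'EC50')
      AND standard_units = 'nM'
)
"""

# DISTINCT collapses molecule 1's two activities into a single SMILES row.
smiles = sorted(row[0] for row in conn.execute(QUERY).fetchall())
print(len(smiles), smiles)
```

Against a real ChEMBL22 SQLite file, you would pass that file's path to sqlite3.connect and skip the toy executescript call. The query itself stays the same.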

The author describes the training data as a "dataset of 677,044 SMILES strings with annotated nanomolar activities (Kd/i/B, IC/EC50) from ChEMBL22".

So I used ChEMBL22. I translated "nanomolar" into standard_units = "nM" and "activities (Kd/i/B, IC/EC50)" into standard_type IN ("Kd", "Ki", "Kb", "IC50", "EC50").
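One way to see which part of that translation drives the count is to make the mapping explicit, build the query from it with placeholders, and break the result down by standard_type. This is an illustrative sketch. PAPER_TO_FILTERS and where_clause() are hypothetical helpers, not part of LSTM_Chem, and the toy rows are invented. When one molecule has several activity types, the per-type counts overlap, so they need not add up to the total.

```python
import sqlite3

# The issue's translation of the paper's wording into ChEMBL filters.
# Hypothetical helper structure for illustration, not LSTM_Chem code.
PAPER_TO_FILTERS = {
    "nanomolar": ("standard_units", ["nM"]),
    "activities (Kd/i/B, IC/EC50)": ("standard_type", ["Kd", "Ki", "Kb", "IC50", "EC50"]),
}


def where_clause(filters, alias=""):
    """Return (sql_fragment, params): one `column IN (?, ...)` clause per paper term."""
    clauses, params = [], []
    for column, values in filters.values():
        placeholders = ", ".join("?" for _ in values)
        clauses.append(f"{alias}{column} IN ({placeholders})")
        params.extend(values)
    return " AND ".join(clauses), params


# Hypothetical toy tables; replace with a connection to a real ChEMBL SQLite file.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE compound_structures (molregno INTEGER PRIMARY KEY, canonical_smiles TEXT);
CREATE TABLE activities (molregno INTEGER, standard_type TEXT, standard_units TEXT);
INSERT INTO compound_structures VALUES (1, 'CCO'), (2, 'c1ccccc1'), (3, 'CC(=O)O');
INSERT INTO activities VALUES
    (1, 'IC50', 'nM'), (1, 'Ki', 'nM'), (2, 'Kd', 'nM'), (3, 'IC50', 'uM');
""")

# Total distinct SMILES, counted the same way as the issue's query (rows returned).
where, params = where_clause(PAPER_TO_FILTERS)
total = len(conn.execute(
    "SELECT DISTINCT canonical_smiles FROM compound_structures "
    f"WHERE molregno IN (SELECT molregno FROM activities WHERE {where})",
    params,
).fetchall())

# Distinct SMILES per activity type under the same filters (groups can overlap).
where_a, params_a = where_clause(PAPER_TO_FILTERS, alias="a.")
by_type = dict(conn.execute(
    "SELECT a.standard_type, COUNT(DISTINCT cs.canonical_smiles) "
    "FROM activities AS a JOIN compound_structures AS cs ON cs.molregno = a.molregno "
    f"WHERE {where_a} GROUP BY a.standard_type",
    params_a,
).fetchall())

print(total, by_type)
```

Each paper term maps to its own entry, so a single filter can be changed or removed to see how the totals shift. The rest of the query stays untouched.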

My query returns 802,320 SMILES instead of 677,044. What did I miss?


different result about dataset query #10