yangkevin2 / coronavirus_data


Chemprop SARS models working? #6

Closed Sali-m closed 4 years ago

Sali-m commented 4 years ago

Hi! The SMILES listed under predictions are sorted from highest to lowest probability of activity. However, when I run the first three SMILES (lines 2-4) of AID1706_model_broad_repurposing_library_preds.csv through Chemprop with the "SARS" model, the returned values are (1) 0.4181, (2) 0.4514, and (3) 0.4549, while lines 4500 and 5000 (ranked 4499 and 4999) give (4499) 0.1093 and (4999) 0.3064. The returned values are not monotonically decreasing with rank, so there appears to be no consistent relationship between the ranking and the predicted activity values. The same is true of AID1706_balanced_model_broad_repurposing_library_preds.csv with the "SARS - balanced" model.
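For reference, here is a minimal sketch of the ordering check described above. It uses only the five rank/value pairs quoted in this comment; the helper names `is_sorted_descending` and `count_inversions` are hypothetical and are not part of Chemprop.

```python
def is_sorted_descending(values):
    """Return True if every value is >= the value that follows it."""
    return all(a >= b for a, b in zip(values, values[1:]))


def count_inversions(values):
    """Count adjacent pairs where the lower-ranked item has the higher value."""
    return sum(1 for a, b in zip(values, values[1:]) if a < b)


# (rank in the CSV, value returned by the website), copied from this comment.
website_scores = [
    (1, 0.4181),
    (2, 0.4514),
    (3, 0.4549),
    (4499, 0.1093),
    (4999, 0.3064),
]

# Order the returned values by their rank in the predictions file.
scores_in_rank_order = [score for _, score in sorted(website_scores)]

print(is_sorted_descending(scores_in_rank_order))  # False
print(count_inversions(scores_in_rank_order))      # 3
```

If the website returned the same values as the CSV, the first call would print `True` and the second `0`.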

The predictions also disagree with the experimental data on which the model was trained. For example, Chemprop gives 0.4531 for this compound, which has the highest score (100) and inhibition (80.36%) against the SARS 3CLPro as reported in AID1706, and 0.47603 for this one, which has a score of zero and an inhibition of -6.48%.

The same applies to the top SARS-CoV-2 antiviral molecules in the Broad drug repurposing hub listed in arXiv:2005.03004: Chemprop returns an activity of 0.2707 for the molecule with the highest activity (0.955; first line of Table 4 on page 8). Am I missing something?

yangkevin2 commented 4 years ago

Thanks for bringing this to our attention. It looks like we didn't enable one of the required model flags in the website inference code. It should be fixed now; please let us know if you encounter any further issues.