question about reproducing the result

huawen-poppy commented 1 year ago

Hello! Thanks for the exciting paper! Recently, I tried to reproduce the results of Figure 2 and Supplementary Table 5. For Figure 2, I followed the tutorial (chemCPA_Figure_2_rdkit.ipynb) in the notebook directory, but the model hash IDs given there for the RDKit models are not listed in the provided finetuning_num_genes.json file. Instead, I used the RDKit model hash IDs '27b401db1845eea26c102fb614df9c33' (pretrained) and '51b81b77079c1060aedb0ee2259008ca' (non-pretrained), which are available in the JSON file, to produce Figure 2. However, my results differ from those in the paper. Could you please provide the models that were used for Figure 2?

For Supplementary Table 5, I don't know how the mean r2 was calculated. Is it the r2 score averaged over the predictions for all drugs at all dosages, or is it specific to a single dosage?
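To make the two readings concrete, here is a minimal sketch with made-up numbers. It is not the paper's evaluation code: the `conditions` dictionary, the drug names, and the `mean_r2` helper are hypothetical, and the r2 is computed in plain Python rather than with whatever library the authors used.

```python
from statistics import mean

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mu = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mu) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical toy data: (drug, dose) -> (true expression, predicted expression)
conditions = {
    ("drugA", 0.1): ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),
    ("drugA", 10.0): ([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]),
    ("drugB", 10.0): ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),
}

def mean_r2(conditions, dose=None):
    """Average per-condition r2 over all doses, or over a single dose if given."""
    scores = [
        r2_score(y_true, y_pred)
        for (_drug, d), (y_true, y_pred) in conditions.items()
        if dose is None or d == dose
    ]
    return mean(scores)

r2_all_doses = mean_r2(conditions)            # reading 1: every drug-dose pair
r2_one_dose = mean_r2(conditions, dose=10.0)  # reading 2: one dosage only
```

The two readings give different numbers even on this toy data, which is why I would like to know which one the table reports.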

MxMstrmn commented 1 year ago

Hi @huawen-poppy,

Thanks for reaching out! I hope you like our work. Checking the notebook file chemCPA_Figure_2_rdkit.py, I assume that these are the correct hashes:

# * RDKit:
#      * fine-tuned:      `'c824e42f7ce751cf9a8ed26f0d9e0af7'`
#      * non-pretrained: `'59bdaefb1c1adfaf2976e3fdf62afa21'`

You can access the models used for the paper figures here: https://drive.google.com/file/d/1y2j73xNioavBpWugmEdvyflpjNmpUD1j/view?usp=share_link

I hope that solves your issue.
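For anyone who wants to check whether a given hash appears in a JSON file such as finetuning_num_genes.json, here is a minimal sketch. The layout of that file is not described in this thread, so the helper simply searches every key and value recursively. The name `contains_hash` and the `example` data are hypothetical stand-ins, not part of the repository.

```python
import json

def contains_hash(node, target):
    """Return True if `target` occurs as a key or value anywhere in `node`."""
    if isinstance(node, dict):
        return any(
            key == target or contains_hash(value, target)
            for key, value in node.items()
        )
    if isinstance(node, (list, tuple)):
        return any(contains_hash(item, target) for item in node)
    return node == target

# Hypothetical stand-in for the file contents; the real structure may differ.
example = json.loads(
    '{"rdkit": {"pretrained": ["27b401db1845eea26c102fb614df9c33"]}}'
)

found = contains_hash(example, "27b401db1845eea26c102fb614df9c33")
missing = contains_hash(example, "c824e42f7ce751cf9a8ed26f0d9e0af7")

# With the real file (path is an assumption), one would load it first:
# with open("finetuning_num_genes.json") as f:
#     data = json.load(f)
# contains_hash(data, "c824e42f7ce751cf9a8ed26f0d9e0af7")
```

Because the search does not rely on any particular nesting, it should work regardless of how the file groups hashes by model or dataset.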

jasperhyp commented 11 months ago

Hi @huawen-poppy, did you set up seml in order to reproduce the results? I am trying to run the model without seml, but it is proving to be a bit challenging. If you didn't use it either, I would greatly appreciate some pointers on how to do that! Thanks!

theislab / chemCPA

question about reproducing the result #114