differences between papers and implementations

snap-stanford / GEARS

GEARS is a geometric deep learning model that predicts outcomes of novel multi-gene perturbations

MIT License

Hi, thanks for sharing this great study!

I'm currently running your code on the Norman data, and I have two questions.

1. The performance is better than in the paper. In the paper, "pearson correlation" is between 0.5 and 0.6, but when I ran the code myself, "test_de_pearson" was about 0.84. The performance you present in "demo/model_tutorial.ipynb" also looks to be about 0.83 based on seen0, seen1. May I know why there is a difference?
2. The paper says that the Norman data comes from K562 cells, but the data downloaded from the link in the code is A549. Can you tell me why there is a difference?

Please let me know if my question is weird or if you have any additional comments. Thank you very much in advance.

Thanks for your question.

pearson_de is not the metric we report in the paper. We report pearson_delta which looks at the Pearson correlation between the predicted change in expression and the true change in expression.

On the other hand, pearson_de looks at the Pearson correlation between the predicted absolute post-perturbation gene expression and the true absolute post-perturbation expression value, only limited to the most differentially expressed genes. More information on metrics in the paper can be found in Supplementary Table 1.
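To make the distinction concrete, here is a minimal sketch of the two metrics on synthetic data. It is not the GEARS evaluation code: the function names, the synthetic arrays, and the choice of `n_de = 20` top differentially expressed genes are all assumptions made for illustration. It only shows why a correlation on absolute expression can come out much higher than a correlation on the change relative to control, since baseline expression is shared between prediction and truth.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

def pearson_delta(pred, truth, ctrl):
    """Correlate predicted and true CHANGES relative to control, over all genes."""
    return pearson(pred - ctrl, truth - ctrl)

def pearson_de(pred, truth, de_idx):
    """Correlate predicted and true ABSOLUTE expression, restricted to DE genes."""
    return pearson(pred[de_idx], truth[de_idx])

# Synthetic example (all values are made up, not taken from the Norman data).
rng = np.random.default_rng(0)
n_genes = 2000
n_de = 20  # hypothetical number of "most differentially expressed" genes

ctrl = rng.gamma(shape=2.0, scale=2.0, size=n_genes)        # mean control expression
true_delta = rng.normal(0.0, 0.3, size=n_genes)             # true perturbation effect
pred_delta = true_delta + rng.normal(0.0, 0.3, size=n_genes)  # imperfect prediction

truth = ctrl + true_delta
pred = ctrl + pred_delta

# Pick the genes with the largest true absolute change as the "DE genes".
de_idx = np.argsort(-np.abs(true_delta))[:n_de]

score_delta = pearson_delta(pred, truth, ctrl)
score_de = pearson_de(pred, truth, de_idx)

print(f"pearson_delta (change, all genes): {score_delta:.3f}")
print(f"pearson_de (absolute, DE genes):   {score_de:.3f}")
```

In this toy setup the absolute-expression score is dominated by the baseline `ctrl` term that both vectors share, so it is expected to be noticeably higher than the delta score even though the underlying prediction is the same.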

As for the cell type: GEARS currently doesn't make any use of cell_type information, so this column is set arbitrarily and does not impact model training in any way.

differences between papers and implementations #61