snap-stanford / GEARS

GEARS is a geometric deep learning model that predicts outcomes of novel multi-gene perturbations
MIT License

differences between papers and implementations #61

Closed: sheunbaek closed this issue 4 months ago

sheunbaek commented 5 months ago

Hi, thanks for sharing this great study!

I'm currently running your code on the Norman dataset, and I have two questions.

  1. The performance I get is better than what the paper reports. In the paper, the Pearson correlation is between 0.5 and 0.6, but when I ran the code, test_de_pearson was about 0.84. The performance shown in "demo/model_tutorial.ipynb" also looks like about 0.83 for the seen0 and seen1 subgroups. May I know why there is a difference?

  2. The paper says that the Norman data consists of K562 cells, but the data downloaded from the link in the code is labeled A549. Can you tell me why there is a difference?

Please let me know if my question is weird or if you have any additional comments. Thank you very much in advance.

yhr91 commented 4 months ago

Thanks for your questions.

  1. pearson_de is not the metric we report in the paper. We report pearson_delta, which measures the Pearson correlation between the predicted change in expression and the true change in expression, relative to the unperturbed control.

On the other hand, pearson_de measures the Pearson correlation between the predicted absolute post-perturbation expression and the true absolute post-perturbation expression, restricted to the most differentially expressed genes. More information on the metrics can be found in Supplementary Table 1 of the paper.
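To make the difference between the two metrics concrete, here is a minimal NumPy sketch on synthetic data. The helpers pearson_de and pearson_delta are hypothetical re-implementations written from the descriptions above, not the GEARS API. Using the 20 genes with the largest true change as the DE set, and computing pearson_delta over all genes, are assumptions made for illustration only.

```python
import numpy as np


def pearson(a: np.ndarray, b: np.ndarray) -> float:
    """Plain Pearson correlation between two 1-D vectors."""
    return float(np.corrcoef(a, b)[0, 1])


def pearson_de(pred, truth, de_idx):
    """Hypothetical helper: correlation of absolute post-perturbation
    expression, restricted to the DE genes in de_idx."""
    return pearson(pred[de_idx], truth[de_idx])


def pearson_delta(pred, truth, ctrl, idx=None):
    """Hypothetical helper: correlation of predicted vs. true change
    relative to control. By default all genes are used (an assumption)."""
    if idx is None:
        idx = np.arange(len(ctrl))
    return pearson(pred[idx] - ctrl[idx], truth[idx] - ctrl[idx])


# Synthetic toy data (not GEARS data): baseline expression varies a lot
# across genes, while the true perturbation effect is comparatively small.
rng = np.random.default_rng(0)
n_genes = 1000
ctrl = rng.uniform(0.0, 5.0, n_genes)        # mean control expression per gene
true_effect = rng.normal(0.0, 0.3, n_genes)  # true perturbation-induced change
truth = ctrl + true_effect                   # true post-perturbation expression

# A naive "predictor" that returns control expression plus noise unrelated
# to the real effect, i.e. it has learned nothing about the perturbation.
pred = ctrl + rng.normal(0.0, 0.3, n_genes)

# Assumed stand-in for the DE gene set: the 20 genes with the largest true change.
de_idx = np.argsort(-np.abs(true_effect))[:20]

r_de = pearson_de(pred, truth, de_idx)
r_delta = pearson_delta(pred, truth, ctrl)
print(f"pearson_de    = {r_de:.2f}")     # high: driven by baseline expression
print(f"pearson_delta = {r_delta:.2f}")  # near zero: no real effect captured
```

The naive predictor scores well on pearson_de because absolute expression is dominated by each gene's baseline level, which it simply copies from control, while pearson_delta stays near zero because it ignores the baseline and only checks whether the predicted change tracks the true change. This illustrates why a test_de_pearson of about 0.84 cannot be compared directly with the pearson_delta values reported in the paper.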

  2. GEARS currently doesn't use cell_type information, so this column is set arbitrarily and does not affect model training in any way.