snap-stanford / GEARS

GEARS is a geometric deep learning model that predicts outcomes of novel multi-gene perturbations
MIT License

Questions about reproduction of Figure2f and Figure2cd #58

Open yu3jun opened 8 months ago

yu3jun commented 8 months ago

Hi, thank you for publishing such an excellent paper! I'm trying to reproduce the results of Figure 2f and Figure 2cd.

In https://github.com/yhr91/GEARS_misc/blob/main/paper/archive/fig2f.ipynb, pd.DataFrame(out) has 80 rows × 4 columns, and each combination of method and category has several different Top 20 DE MSE values.

I wonder how they were computed. Using the settings you advised in https://github.com/yhr91/GEARS_misc/blob/main/paper/reproduce_preprint_results.ipynb, I could only get close results for a few combinations of method and category.

How do we get multiple different results for the same method and category (e.g., GEARS and 2/2 seen)? And should we use the mean No-perturb Top 20 DE MSE to compute the other methods' Normalized MSE of Top 20 DE Genes? I would appreciate it very much if you could share some of the parameters needed to reproduce the results. Thanks a lot!
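To make the normalization question concrete, here is a minimal sketch of one possible reading: divide each row's Top 20 DE MSE by the mean No-perturb MSE within the same category. The field names (`method`, `category`, `mse`), the helper `normalize_by_no_perturb`, and the numbers are all hypothetical, and whether fig2f.ipynb actually normalizes this way is exactly what is being asked.

```python
from collections import defaultdict
from statistics import mean


def normalize_by_no_perturb(rows, baseline="No-perturb"):
    """Divide each row's MSE by the mean baseline MSE of its category.

    Assumption: this is only one guess at how "Normalized MSE" could be
    defined; it is not confirmed by the notebook or the authors.
    """
    # Collect baseline MSE values per category.
    baseline_values = defaultdict(list)
    for row in rows:
        if row["method"] == baseline:
            baseline_values[row["category"]].append(row["mse"])

    # Mean baseline MSE per category.
    baseline_mean = {cat: mean(vals) for cat, vals in baseline_values.items()}

    # Attach the normalized value to a copy of every row.
    return [
        {**row, "normalized_mse": row["mse"] / baseline_mean[row["category"]]}
        for row in rows
    ]


# Made-up values for illustration only (not taken from the paper).
rows = [
    {"method": "No-perturb", "category": "2/2 seen", "mse": 0.5},
    {"method": "No-perturb", "category": "2/2 seen", "mse": 0.3},
    {"method": "GEARS", "category": "2/2 seen", "mse": 0.2},
    {"method": "GEARS", "category": "2/2 seen", "mse": 0.1},
]
result = normalize_by_no_perturb(rows)
```

Under this reading, several rows per method and category (for example, one per run or split) would each keep their own normalized value, while the denominator is shared within a category.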


ZyuanZhang commented 1 day ago

Hello @yu3jun. While reproducing Fig. 2cd and 2h, I found that some files were missing.

In fig2cd.ipynb:

df_all_datasets['replogle2022_rpe1'] = pd.read_csv('/dfs/user/kexinh/perturb_GNN/pertnet/replogle_rpe1_gw_filtered_hvg_frac.csv')
df_all_datasets['replogle2022_k562'] = pd.read_csv('/dfs/user/kexinh/perturb_GNN/pertnet/replogle_k562_essential_filtered_hvg_frac.csv')

In fig2h.ipynb:

p_vals = np.load('p_values_norman_filter_0.01_gears.npy',allow_pickle=True).item()
jaccards = np.load('jaccards_norman_filter_0.01_gears.npy',allow_pickle=True).item()

I couldn't find these files in this repository. Since you reproduced Figure 2f and 2cd, how did you get them? Thank you!