snap-stanford / GEARS

GEARS is a geometric deep learning model that predicts outcomes of novel multi-gene perturbations
MIT License

Question about how to generate condition colname? #63

Closed yanwuanxin closed 4 months ago

yanwuanxin commented 5 months ago

Hi Yusuf et al.

Assuming my adata has 3000 cells and I want to perturb two genes (geneA and geneB; I know it is not ideal to train a model with so few genes), is the following code reasonable?

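import numpy as np

# Build one condition label per cell: 1000 geneA, 1000 geneB, 1000 control
tem = []
tem.extend(list(np.repeat("geneA+ctrl", 1000)))
tem.extend(list(np.repeat("geneB+ctrl", 1000)))
tem.extend(list(np.repeat("ctrl", 1000)))
# The label list must have exactly one entry per row of adata.obs
adata.obs = adata.obs.assign(condition=tem)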

Of the 3000 cells, 1000 are set as ctrl and the remaining 2000 are split evenly between geneA and geneB. Is it okay to do this?
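For reference, the labelling described above can be sketched with the standard library alone. The helper name `make_condition_labels` and the per-condition counts are hypothetical, and the sketch assumes the cells in `adata` are already ordered so that each block of rows really corresponds to that perturbation; labels assigned by position are only meaningful under that assumption.

```python
from collections import Counter


def make_condition_labels(counts):
    """Expand {condition: n_cells} into a flat list of per-cell labels.

    Labels are emitted block by block, in the insertion order of `counts`.
    """
    labels = []
    for condition, n_cells in counts.items():
        labels.extend([condition] * n_cells)
    return labels


# Hypothetical layout from the question: 1000 cells per condition
labels = make_condition_labels(
    {"geneA+ctrl": 1000, "geneB+ctrl": 1000, "ctrl": 1000}
)

# The list length must equal the number of cells (adata.n_obs) before it is
# attached to the AnnData object, e.g.:
#   adata.obs["condition"] = labels
print(len(labels), Counter(labels))
```

Keeping the counts in one dictionary makes it easy to check that they sum to the number of cells before assigning the column.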

yhr91 commented 4 months ago

Syntactically this is fine, but it may not produce a strong model.

yanwuanxin commented 4 months ago

Setting other factors aside, can I use the code above to generate the condition column under this assumption?

yanwuanxin commented 4 months ago

Your data:

[image]

My data:

[image]

In one of your datasets, the numbers of cells per perturbation and for 'ctrl' in adata.obs.condition seem to follow no pattern at all. In my own adata.obs.condition, which I generated with the following code, the numbers of cells per perturbation and for 'ctrl' are almost equal:

[image]

My questions:

  1. Can the above code be used to generate the condition column?
  2. Does the distribution of cell numbers across perturbations affect the results? If so, how should that distribution be determined?

Looking forward to your reply.
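To compare the distribution of cells per condition between two datasets, as in the second question, one option is to tabulate the condition column. This is only a sketch: the `obs` DataFrame below is a stand-in built for illustration, and with a real AnnData object `adata.obs` would be used in its place.

```python
import pandas as pd

# Stand-in for adata.obs (illustrative data, not taken from any real dataset)
obs = pd.DataFrame(
    {"condition": ["geneA+ctrl"] * 1000 + ["geneB+ctrl"] * 1000 + ["ctrl"] * 1000}
)

# Number of cells observed for each condition label
cells_per_condition = obs["condition"].value_counts()

# Fraction of all cells that are unperturbed controls
ctrl_fraction = (obs["condition"] == "ctrl").mean()

print(cells_per_condition)
print(f"control fraction: {ctrl_fraction:.2f}")
```

Running the same two lines on both datasets shows directly how balanced or unbalanced each one is.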