chrsunwil opened 6 years ago
Now seems like a good time to start performing some evaluations. One (potentially) semi-quick evaluation that will help us decide on next steps comes to mind:
x samples in your testing set that are wild-type (y = 0) for TP53 (Entrez ID: 7157). (For our purposes, let's say x = 100, but this can be modified.)
(… 7157 equal to 1 in these samples)
… 7157 (TP53) wild-type vs. control.
I think this procedure will show how well the shared latent space captures the biology shared between the two domains. It would be useful to write the scripts so that the same procedure can be run with genes other than 7157.
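The selection and induction steps could look roughly like this. It is a minimal sketch that assumes the mutation labels sit in a pandas DataFrame with samples as rows, Entrez IDs (as strings) as columns, and 0/1 values (0 = wild-type). The function name `select_and_induce` and that layout are hypothetical. Gene and x are parameters, so the same code works for genes other than 7157.

```python
import numpy as np
import pandas as pd


def select_and_induce(status_df, gene="7157", x=100, from_value=0, to_value=1, seed=0):
    """Select up to `x` samples whose status for `gene` equals `from_value`, and
    return their IDs plus a copy of `status_df` with those samples set to `to_value`.

    Assumed (hypothetical) layout: rows are samples, columns are Entrez gene IDs
    as strings, values are 0 (wild-type) or 1 (mutated). Defaults follow the
    thread: TP53 (7157), x = 100, wild-type (0) -> mutated (1).
    """
    candidates = status_df.index[status_df[gene] == from_value].to_numpy()
    rng = np.random.default_rng(seed)  # fixed seed so the sample draw is reproducible
    chosen = rng.choice(candidates, size=min(x, len(candidates)), replace=False)
    induced = status_df.copy()  # leave the original labels untouched
    induced.loc[chosen, gene] = to_value
    return list(chosen), induced
```

Passing `induced` through the model to get the induced expression profiles depends on the model's interface, so that part is not shown.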
Another note: let's run this procedure with x training samples as well. It may also be good to induce TP53 wild-type status (go from 7157 = 1 to 7157 = 0) and repeat the procedure.
I think it will be important to code the analysis so that it works for any input gene, in either direction.
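One way to keep the analysis gene- and direction-agnostic is to treat gene, direction, and data split as parameters and enumerate every combination. This is a sketch under that assumption; `experiment_grid` and the constant names are hypothetical.

```python
from itertools import product

# Hypothetical defaults: TP53 only, both directions, both data splits.
GENES = ["7157"]               # Entrez IDs (7157 = TP53); extend with any other gene
DIRECTIONS = [(0, 1), (1, 0)]  # (from, to): 0 -> 1 induces a mutation, 1 -> 0 induces wild-type
SPLITS = ["test", "train"]


def experiment_grid(genes=GENES, directions=DIRECTIONS, splits=SPLITS, x=100):
    """Yield one configuration dict per (gene, direction, split) combination."""
    for gene, (from_value, to_value), split in product(genes, directions, splits):
        yield {"gene": gene, "from_value": from_value, "to_value": to_value,
               "split": split, "x": x}
```

Each configuration would then drive one run of the whole procedure (select, induce, run differential expression, plot), for example through a hypothetical `run_procedure(**config)`.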
7. Run a global differential expression analysis (same as the above script)
in all samples with 7157 (TP53) wildtype vs. control.
Is control just all of the examples?
8. Create a scatterplot where the points are genes and the x axis is true
observed differential expression and the y axis is induced differential
expression - and output the Pearson correlation.
So the y axis is the result from step 6 and the x axis is the result from step 7?
Is control just all of the examples?
Yeah, let's do that. This may help to visualize.
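With control defined as all samples, the step 7 comparison could be sketched per gene with a Welch t-test. This is a stand-in for the R script mentioned in step 7, not a port of it. It assumes `expr_df` is samples × genes and `status_df` is samples × Entrez IDs with matching sample IDs; both names and `observed_de` are hypothetical. Because the control group contains the wild-type samples, the two groups overlap, so the p-values are not strictly valid; the t-statistic is mainly useful as a per-gene score for the step 8 scatterplot.

```python
import numpy as np
import pandas as pd
from scipy import stats


def observed_de(expr_df, status_df, gene="7157", group_value=0):
    """Per-gene Welch t-test: samples with status == group_value vs. all samples."""
    # Align labels to the expression rows, then build a boolean group mask.
    mask = (status_df.loc[expr_df.index, gene] == group_value).to_numpy()
    values = expr_df.to_numpy(dtype=float)
    group = values[mask]
    control = values  # control = all samples, as agreed in this thread
    result = stats.ttest_ind(group, control, axis=0, equal_var=False)
    return pd.DataFrame({"t": result.statistic, "p": result.pvalue}, index=expr_df.columns)
```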
So the y axis is the result from step 6 and the x axis is the result from step 7?
Yes
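For step 8, a minimal matplotlib/scipy sketch, assuming both differential expression vectors are already in the same gene order (x: observed, step 7; y: induced, step 6). The function name and output filename are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats


def plot_de_agreement(observed, induced, out_path="de_scatter.png"):
    """Scatter per-gene DE scores and return the Pearson correlation and its p-value."""
    observed = np.asarray(observed, dtype=float)
    induced = np.asarray(induced, dtype=float)
    r, p = stats.pearsonr(observed, induced)
    fig, ax = plt.subplots(figsize=(5, 5))
    ax.scatter(observed, induced, s=8, alpha=0.5)  # one point per gene
    ax.set_xlabel("Observed differential expression")
    ax.set_ylabel("Induced differential expression")
    ax.set_title(f"Pearson r = {r:.3f}")
    fig.savefig(out_path, dpi=150, bbox_inches="tight")
    plt.close(fig)
    return r, p
```

If the scores are pandas Series indexed by gene, aligning them first (e.g. with `Series.align(..., join="inner")`) avoids silently pairing different genes.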
Do you have any suggestions for a Python equivalent to limma's lmFit in this line:
fit <- lmFit(t(rnaseq_df[, 2:ncol(rnaseq_df)]), ras_design)
I was looking at https://lmfit.github.io/lmfit-py/model.html
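limma's lmFit fits an ordinary linear model to every gene with a shared design matrix; lmfit-py, by contrast, is a general non-linear curve-fitting library and not a limma port. A minimal numpy/scipy sketch that fits all genes at once is below. It omits limma's empirical-Bayes variance moderation (eBayes), so its t-statistics are plain OLS ones, and the function name is hypothetical. Expression is taken as samples × genes, so the transpose in the R call is not needed.

```python
import numpy as np
from scipy import stats


def fit_per_gene_ols(design, expr):
    """OLS for every gene at once, in the spirit of limma::lmFit (no eBayes).

    design: (n_samples, n_coef) design matrix, including an intercept column.
    expr:   (n_samples, n_genes) expression matrix (samples as rows, unlike limma).
    Returns coefficients, standard errors, t-statistics and two-sided p-values,
    each of shape (n_coef, n_genes).
    """
    X = np.asarray(design, dtype=float)
    Y = np.asarray(expr, dtype=float)
    n, k = X.shape
    df_resid = n - k
    coef, _, rank, _ = np.linalg.lstsq(X, Y, rcond=None)  # solves all genes together
    if rank < k:
        raise ValueError("design matrix is rank deficient")
    resid = Y - X @ coef
    sigma2 = (resid ** 2).sum(axis=0) / df_resid       # per-gene residual variance
    xtx_inv_diag = np.diag(np.linalg.inv(X.T @ X))     # (n_coef,)
    se = np.sqrt(np.outer(xtx_inv_diag, sigma2))       # (n_coef, n_genes)
    t = coef / se
    p = 2 * stats.t.sf(np.abs(t), df_resid)
    return coef, se, t, p
```

With a design of an intercept column plus a 0/1 group column (analogous to ras_design), row 1 of each output is the group effect per gene. statsmodels' OLS run gene by gene would give the same estimates, only more slowly.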
I suppose I'm still a little confused about this step:
Identify the differentially expressed genes between the two RNAseq values
How do I actually calculate the differentially expressed genes? I was having some trouble following your linked script.
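Once each gene has an effect estimate and a p-value (for example from a per-gene t-test or linear model), a common way to call differentially expressed genes is to control the false discovery rate with the Benjamini-Hochberg procedure and keep genes below a cutoff. A minimal numpy sketch; the function names and the 0.05 cutoff are illustrative assumptions. statsmodels' `multipletests(..., method="fdr_bh")` computes the same adjusted p-values.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)  # p_(i) * n / i
    # Enforce monotonicity: each q is the minimum over all larger ranks.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def call_de_genes(gene_ids, pvalues, effects, fdr=0.05):
    """Return (gene, effect, q) for genes with q < fdr, largest |effect| first."""
    q = benjamini_hochberg(pvalues)
    hits = [(g, e, qv) for g, e, qv in zip(gene_ids, effects, q) if qv < fdr]
    return sorted(hits, key=lambda h: abs(h[1]), reverse=True)
```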
I talked with Yoson, Nandita, and Casey, and I think I now know what I should do.
How to evaluate whether the model is learning the common biology between the two domains?