mixed sparse-dense operation

jayypaul commented 4 years ago

Error message when doing: plot_vs_ttest()

It seems my DE didn't work because these commands yield the following results:

print(testReal.pval[:10]) print(testReal.qval[:10]) print(testReal.summary().iloc[:10,:])

[ 0. 0. 0. 0. 0. nan 0. nan 0. nan] [ 0. 0. 0. 0. 0. nan 0. nan 0. nan] gene pval qval log2fc mean zero_mean grad \ 0 0610009B22Rik 0.0 0.0 -2.548860 0.077986 False 2.152200e-09
1 0610009E02Rik 0.0 0.0 -5.538207 0.003878 False 1.559083e-09
2 0610009L18Rik 0.0 0.0 -3.834557 0.023685 False 1.799609e-09
3 0610010F05Rik 0.0 0.0 -2.349601 0.099437 False 1.004665e-09
4 0610010K14Rik 0.0 0.0 -4.677474 0.009545 False 3.948096e-09
5 0610012D04Rik NaN NaN -297.776029 0.001076 False 2.132940e-03
6 0610012G03Rik 0.0 0.0 -1.521848 0.211496 False 3.416914e-09
7 0610025J13Rik NaN NaN -297.776029 0.000821 False 1.240152e-03
8 0610030E20Rik 0.0 0.0 -2.573128 0.089237 False 2.278429e-09
9 0610033M10Rik NaN NaN -297.776029 0.001416 False 1.198979e-03

ll

0 -5247.489606
1 -433.791309
2 -1980.965365
3 -6226.564384
4 -946.499919
5 -1321.282389
6 -11006.115975
7 -1355.008785
8 -5764.351236
9 -1822.326354

Not to mention the volcano plot looks atrocious. Not sure what the problem could be. Very early on I had to turn the dense matrices of my separate batches into sparse matrices so I can concatenate using union. Any suggestions?

jayypaul commented 4 years ago

davidsebfischer commented 4 years ago

Hi @jayypaul, this is hard to diagnose because it may be caused by very different issues. I would check the model:

Is the model (wald test formula) appropriate? You might be running into issues during testing if you overfit or underfit the data strongly. Common culprits include modelling of size factors and confounding.

jayypaul commented 4 years ago

Hey David,

Thanks for your response. This is what I used:

testReal = de.test.wald( data = a, formula_loc = "~ 1 + Time + Condition", # Including batch won't work. factor_loc_totest = ['Intercept', 'Condition[T.Naive]', 'Condition[T.PCL]', 'Condition[T.Saline]', 'Time[T.Week_6]'] )

I have 2 timepoints, 4 different treatments. Each treatment per timepoint so 8 batches. Not really sure how to use this formula honestly.

davidsebfischer commented 4 years ago

Hi @jayypaul, you might want to consult with a local statistician on this, as there are a lot of points to be cleared up.

Formula: You might want to include the interaction between time and condition.
Testing: You are testing all effects, that means any gene that is non-zero anywhere will be significant. Usually, you would want to only test one effect, eg. whether conditio has an effect. Almost never is the intercept included in the test!

jayypaul commented 4 years ago

Hey David,

Thanks for the insight. That makes sense. Another question if you don't mind, to clarify. So since I have 4 conditions (uninjured = naive, saline, 1 biomaterial ECM, another biomaterial PCL), and I wanted to look at the DE genes in ECM compared to the rest, how do I specify that coefficient since that seems to be the reference group to all others (as in above)? Likewise, for timepoint week 1 as the reference group to week 6?

jayypaul commented 4 years ago

Also, If I wanted to test the specific effects of PCL at timepoint 6 (DE genes compared to rest of data), would I test: "Time[T.Week_6]:Condition[T.PCL]" when I specify the formula: ~ 1 + Time + Condition + Time * Condition

theislab / diffxpy

mixed sparse-dense operation #157