nolanlab / citrus

Citrus Development Code
GNU General Public License v3.0

No "Feature False Discovery Rate" on model error rate graph #95

Closed ddelalca closed 7 years ago

ddelalca commented 8 years ago

Hello,

When I run citrus, I don't see a "Feature False Discovery Rate" line on the ModelErrorRate.pdf for either pamr or glmnet. I noticed that this line appears in all of the example ModelErrorRate.pdf plots; should I be seeing it?

rbruggner commented 8 years ago

There will not be an FDR line for the lasso/glmnet model. However, you should be seeing one for the nearest shrunken centroid/PAMR model. Were there any error messages during runtime?

ddelalca commented 8 years ago

Here is what I see

plot(results,outputDirectory)

Plotting Results for defaultCondition
Plotting results for pamr 
Plotting Error Rate
Plotting Stratifying Features
Plotting Stratifying Clusters
Plotting Clustering Hierarchy
Plotting results for glmnet
Plotting Error Rate
Plotting Stratifying Features
Plotting Stratifying Clusters
Plotting Clustering Hierarchy
Plotting results for sam
Plotting Error Rate
Plotting Stratifying Features
Plotting Stratifying Clusters
Plotting Clustering Hierarchy

> # ==================================================================================================
> # The following lines perform the same analys .... [TRUNCATED] 
There were 14 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In lognet(x, is.sparse, ix, jx, y, weights, offset, alpha,  ... :
  one multinomial or binomial class has fewer than 8  observations; dangerous ground
2: In lognet(x, is.sparse, ix, jx, y, weights, offset, alpha,  ... :
  one multinomial or binomial class has fewer than 8  observations; dangerous ground
3: In lognet(x, is.sparse, ix, jx, y, weights, offset, alpha,  ... :
  one multinomial or binomial class has fewer than 8  observations; dangerous ground
4: In loop_apply(n, do.ply) :
  Removed 1 rows containing non-finite values (stat_boxplot).
5: In loop_apply(n, do.ply) :
  Removed 1 rows containing non-finite values (stat_boxplot).
6: In loop_apply(n, do.ply) :
  Removed 1 rows containing non-finite values (stat_boxplot).
7: In loop_apply(n, do.ply) :
  Removed 1 rows containing non-finite values (stat_boxplot).
8: In loop_apply(n, do.ply) :
  Removed 3 rows containing non-finite values (stat_boxplot).
9: In loop_apply(n, do.ply) :
  Removed 1 rows containing non-finite values (stat_boxplot).
10: In loop_apply(n, do.ply) :
  Removed 1 rows containing non-finite values (stat_boxplot).
11: In loop_apply(n, do.ply) :
  Removed 3 rows containing non-finite values (stat_boxplot).
12: In loop_apply(n, do.ply) :
  Removed 1 rows containing non-finite values (stat_boxplot).
13: In loop_apply(n, do.ply) :
  Removed 1 rows containing non-finite values (stat_boxplot).
14: In loop_apply(n, do.ply) :
  Removed 3 rows containing non-finite values (stat_boxplot)

I just noticed that if I run it with nFold=1, then I see the blue line; when I change nFold to anything else, I don't. Another question: is there a standard number of cross-validation folds that you use?
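For reference, 5-fold and 10-fold cross-validation are common default choices in general practice, with leave-one-out sometimes used for very small cohorts; I can't say what citrus itself standardizes on. The basic k-fold error estimate being discussed can be sketched as follows. This is an illustrative Python example, not citrus's actual R implementation; `kfold_error` and `majority_scorer` are hypothetical names:

```python
import random

def kfold_error(labels, k, train_and_score, seed=0):
    """Split sample indices into k folds and average the held-out error."""
    idx = list(range(len(labels)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]          # round-robin split
    errors = []
    for fold in folds:
        train = [i for i in idx if i not in fold]  # train on the other k-1 folds
        errors.append(train_and_score(train, fold))
    return sum(errors) / k

def majority_scorer(labels):
    """Toy model: predict the majority training class; return held-out error."""
    def score(train, test):
        ones = sum(labels[i] for i in train)
        pred = 1 if ones * 2 >= len(train) else 0
        wrong = sum(1 for i in test if labels[i] != pred)
        return wrong / len(test)
    return score

# Two groups sized like the working dataset in this thread (10 vs 8 samples).
labels = [0] * 10 + [1] * 8
err = kfold_error(labels, k=5, train_and_score=majority_scorer(labels))
print(round(err, 3))
```

The key point: each sample is held out exactly once, and the reported error rate is the average over the k held-out folds, so nFold=1 skips this estimate entirely.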

ddelalca commented 8 years ago

Also, another question about how the cross-validation works: how does nFold validation work with a correlative model like SAM?

For example, for nFold=4, is the data broken into quarters, run through SAM, and then all of the features common to the 4 runs represented in the feature plot?
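The per-fold "intersect the selected features" scheme asked about above can be sketched as follows. This is only an illustration of that scheme in Python, not a claim about how citrus actually handles SAM; `per_fold_features` and `select_features` are hypothetical names, and the toy selector just thresholds group-mean differences:

```python
def per_fold_features(data, k, select):
    """Run a feature selector on each training split; keep features picked in every fold."""
    idx = list(range(len(data)))
    folds = [idx[i::k] for i in range(k)]           # round-robin split
    feature_sets = []
    for fold in folds:
        train = [data[i] for i in idx if i not in fold]
        feature_sets.append(set(select(train)))
    return set.intersection(*feature_sets)

def select_features(train, threshold=0.5):
    """Toy selector: indices of features whose group-mean difference exceeds threshold."""
    g0 = [x for x, y in train if y == 0]
    g1 = [x for x, y in train if y == 1]
    picked = []
    for j in range(len(train[0][0])):
        m0 = sum(x[j] for x in g0) / len(g0)
        m1 = sum(x[j] for x in g1) / len(g1)
        if abs(m0 - m1) > threshold:
            picked.append(j)
    return picked

# 8 samples; feature 0 separates the groups, feature 1 is constant noise.
data = [([1.0, 0.3], 1)] * 4 + [([0.0, 0.3], 0)] * 4
print(sorted(per_fold_features(data, 4, select_features)))  # → [0]
```

An alternative design, common in CV-based pipelines, is to use the folds only for error estimation and to report features from a single model fit to all of the data; which of the two citrus uses is exactly the question here.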

ddelalca commented 8 years ago

Sorry, one more question. I tried to do 10-fold cross-validation and it worked for one set of files, but not for another set. When it didn't work, I got this error:

Clustering 12000 events
Clustering 11000 events
Clustering 11000 events
Clustering 11000 events
Clustering 11000 events
Clustering 11000 events
Clustering 11000 events
Error in folds[[foldIndex]] : subscript out of bounds

For the set of files that worked: I took 770 cells from 18 total samples (13860 cells total). There were 10 samples in group A and 8 samples in group B.

For the set of files that didn't work: I took 1000 cells from 13 total samples (13000 cells total). There were 7 samples in group A and 6 samples in group B.

When I ran a smaller nFold with the files that didn't work, the error rate was very high. Could it be that the error rate is too high for it to do 10-fold?

rbruggner commented 7 years ago

Sorry for the slow reply here. I believe this issue has been fixed in https://github.com/nolanlab/citrus/commit/c5d1844592c414a17296680ed620c15752b80809.