Closed: ddelalca closed this issue 7 years ago
There will not be an FDR rate for the lasso/glmnet model. However, you should be seeing one for the nearest shrunken centroid/PAMR model. Were there any error messages during runtime?
On Apr 4, 2016, at 11:46 AM, ddelalca notifications@github.com wrote:
Hello,
When I run Citrus, I don't see a "Feature False Discovery Rate" line in the ModelErrorRate.pdf for either pamr or glmnet. I noticed that this line appears in all of the example ModelErrorRate.pdf plots; should I be seeing it?
— View it on GitHub: https://github.com/nolanlab/citrus/issues/95
Here is what I see
plot(results,outputDirectory)
Plotting Results for defaultCondition
Plotting results for pamr
Plotting Error Rate
Plotting Stratifying Features
Plotting Stratifying Clusters
Plotting Clustering Hierarchy
Plotting results for glmnet
Plotting Error Rate
Plotting Stratifying Features
Plotting Stratifying Clusters
Plotting Clustering Hierarchy
Plotting results for sam
Plotting Error Rate
Plotting Stratifying Features
Plotting Stratifying Clusters
Plotting Clustering Hierarchy
> # ==================================================================================================
> # The following lines perform the same analys .... [TRUNCATED]
There were 14 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, ... :
one multinomial or binomial class has fewer than 8 observations; dangerous ground
2: In lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, ... :
one multinomial or binomial class has fewer than 8 observations; dangerous ground
3: In lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, ... :
one multinomial or binomial class has fewer than 8 observations; dangerous ground
4: In loop_apply(n, do.ply) :
Removed 1 rows containing non-finite values (stat_boxplot).
5: In loop_apply(n, do.ply) :
Removed 1 rows containing non-finite values (stat_boxplot).
6: In loop_apply(n, do.ply) :
Removed 1 rows containing non-finite values (stat_boxplot).
7: In loop_apply(n, do.ply) :
Removed 1 rows containing non-finite values (stat_boxplot).
8: In loop_apply(n, do.ply) :
Removed 3 rows containing non-finite values (stat_boxplot).
9: In loop_apply(n, do.ply) :
Removed 1 rows containing non-finite values (stat_boxplot).
10: In loop_apply(n, do.ply) :
Removed 1 rows containing non-finite values (stat_boxplot).
11: In loop_apply(n, do.ply) :
Removed 3 rows containing non-finite values (stat_boxplot).
12: In loop_apply(n, do.ply) :
Removed 1 rows containing non-finite values (stat_boxplot).
13: In loop_apply(n, do.ply) :
Removed 1 rows containing non-finite values (stat_boxplot).
14: In loop_apply(n, do.ply) :
Removed 3 rows containing non-finite values (stat_boxplot)
I just noticed that if I run it with nFold=1, I see the blue line; when I change nFold to anything else, I don't. Another question: is there a standard number of cross-validation folds that you use?
Also, another question about how the cross-validation works: how does nFold validation work with a correlative model like SAM?
For example, with nFold=4, is the data broken into fourths, each run through SAM, and then the features common to all 4 runs shown in the feature plot?
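For reference, here is a generic sketch of how k-fold cross-validation estimates a model error rate. This is the textbook scheme, not Citrus's exact internal implementation (which may, for example, stratify folds by group); the model-fitting step is faked so the sketch stays self-contained, and all variable names are illustrative.

```r
# Generic k-fold cross-validation sketch (standard scheme, not
# Citrus's own code; the per-fold model fit is faked).
set.seed(1)
nSamples <- 18
nFold <- 4
labels <- factor(rep(c("A", "B"), times = c(10, 8)))

# Randomly assign each sample to one of nFold folds
foldId <- sample(rep(seq_len(nFold), length.out = nSamples))

errorRates <- sapply(seq_len(nFold), function(k) {
  trainIdx <- which(foldId != k)   # fit the model on k-1 folds
  testIdx  <- which(foldId == k)   # evaluate on the held-out fold
  # ... in a real run, a model fit on labels[trainIdx] would
  # predict the held-out samples; faked here with random labels ...
  predicted <- sample(levels(labels), length(testIdx), replace = TRUE)
  mean(predicted != labels[testIdx])
})

# The reported cross-validation error rate is the average over folds
cvError <- mean(errorRates)
```

Note that this scheme only makes sense for predictive models (glmnet, pamr); a purely correlative method like SAM has no prediction step to hold data out from, so how folds interact with it depends on Citrus's implementation.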
Sorry, one more question. I tried 10-fold cross-validation and it worked for one set of files, but didn't work for another set of files. When it didn't work, I got this error:
Clustering 12000 events
Clustering 11000 events
Clustering 11000 events
Clustering 11000 events
Clustering 11000 events
Clustering 11000 events
Clustering 11000 events
Error in folds[[foldIndex]] : subscript out of bounds
For the set of files that worked: I took 770 cells from 18 total samples (13860 cells total). There were 10 samples in group A and 8 samples in group B.
For the set of files that didn't work: I took 1000 cells from 13 total samples (13000 cells total). There were 7 samples in group A and 6 samples in group B.
When I did a smaller nFold with the files that didn't work, the error rate was super high. Could it be that the error rate is too high for it to do 10-fold?
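One plausible reading of the crash (an assumption, not a confirmed diagnosis): with only 6 samples in the smaller group, splitting per group into 10 folds cannot populate all ten folds, so indexing `folds[[foldIndex]]` past the last fold fails. A minimal illustration with hypothetical splitting code, not Citrus's internals:

```r
# Why "subscript out of bounds" can occur: 6 samples cannot fill
# 10 folds. Hypothetical splitting code, not Citrus's internals.
groupB <- paste0("B", 1:6)   # the failing dataset's smaller group
nFold <- 10

foldId <- seq_along(groupB)  # at most one group-B sample per fold
folds <- split(groupB, foldId)

length(folds)  # only 6 folds exist, not 10

# Indexing a fold beyond that fails exactly like the error above:
res <- try(folds[[nFold]], silent = TRUE)
inherits(res, "try-error")  # TRUE
```

If this is the cause, a high error rate would not itself trigger the crash; the fold count exceeding the available samples per group would.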
Sorry for the slow reply here. I believe this issue has been fixed in https://github.com/nolanlab/citrus/commit/c5d1844592c414a17296680ed620c15752b80809.