4. Observe that the code freezes with the output of `DEGanalysis` being:
The dataset is ready for differential expression analysis[1] "Cl2" "Cl1" "Cl3"
Number of comparisons: 6
Estimating sequencing depths...
Resampling to get new data matrices...
perm= 1
perm= 2
perm= 3
perm= 4
perm= 5
perm= 6
perm= 7
perm= 8
perm= 9
perm= 10
Number of thresholds chosen (all possible thresholds) = 1283
Getting all the cutoffs for the thresholds...
Getting number of false positives in the permutation...
'select()' returned 1:many mapping between keys and columns
Low-regulated genes in the Cl1 in Cl2 VS Cl1
'select()' returned 1:many mapping between keys and columns
Up-regulated genes in the Cl1 in Cl2 VS Cl1
Estimating sequencing depths...
Resampling to get new data matrices...
<kbd>Ctrl</kbd>+<kbd>C</kbd> doesn't quit the function, only killing R does the trick.
# Expected behavior
For a simpler set of parameters, for example `n_genes <- 100` and `K <- 2`, the code above ends with the following output:
Up-regulated genes in the Cl1 in Cl2 VS Cl1
Comparisons Target cluster Gene number File name Gene number File name
1 Cl2 VS Cl1 Cl1 477 Up-regulated-all_clustersCl1inCl2VSCl1.csv 941 Low-regulated-all_clustersCl1inCl2VSCl1.csv
2 Cl2 VS Cl1 Cl2 477 Low-regulated-all_clustersCl2inCl2VSCl1.csv 941 Up-regulated-all_clustersCl2inCl2VSCl1.csv
Moreover, the structure of `cdiff` is:
List of 2
$ : chr [1:1422, 1:2] "ENSG00000005022" "ENSG00000006327" "ENSG00000008394" "ENSG00000008517" ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:2] "DEGsE" "DEGsS"
$ :'data.frame': 2 obs. of 6 variables:
..$ Comparisons : chr [1:2] "Cl2 VS Cl1" "Cl2 VS Cl1"
..$ Target cluster: chr [1:2] "Cl1" "Cl2"
..$ Gene number : int [1:2] 477 477
..$ File name : chr [1:2] "Up-regulated-all_clustersCl1inCl2VSCl1.csv" "Low-regulated-all_clustersCl2inCl2VSCl1.csv"
..$ Gene number : int [1:2] 941 941
..$ File name : chr [1:2] "Low-regulated-all_clustersCl1inCl2VSCl1.csv" "Up-regulated-all_clustersCl2inCl2VSCl1.csv"
# Software metainformation
- Operating System: Ubuntu 18.04
- R version: 4.0.0
- DIscBIO version or commit number: 0.99.6
Fixed on caa84d9e48e6499ad41e51601aca9a2783b65ba8 (dev branch). Bug was caused by a Fortran-compiled function in the SAMR package. It was replaced by an equivalent (and much faster) R function. Closing.
Describe the bug
There are some circumstances that cause the
DEGanalysis
function to hang.To Reproduce
==============================================================================
Determining contants
==============================================================================
n_genes <- 500 K <- 3
==============================================================================
Subsetting and formatting datasets
==============================================================================
sc_dataframe <- pan_indrop_matrix_8000cells_18556genes[, seq_len(n_genes)] sc <- DISCBIO(sc_dataframe)
==============================================================================
Performing operations
==============================================================================
MIinExp <- mean(rowMeans(sc_dataframe, na.rm=TRUE)) MinNumber <- round(length(sc_dataframe[1, ]) / 10) sc <- Normalizedata( sc, mintotal=1000, minexpr=MIinExp, minnumber=MinNumber, maxexpr=Inf, downsample=FALSE, dsn=1, rseed=17000 ) sc <- FinalPreprocessing(sc, GeneFlitering="ExpF", export=FALSE, quiet=TRUE) sc <- Clustexp(sc, cln=K, quiet=TRUE) sc <- comptSNE(sc, rseed=15555, quiet=TRUE)
This is the part that freezes
cdiff <- DEGanalysis( sc, Clustering="K-means", K=K, fdr=0.10, name="all_clusters", export=FALSE, quiet=FALSE, plot=FALSE, nresamp=5, nperms=10 )
The dataset is ready for differential expression analysis[1] "Cl2" "Cl1" "Cl3" Number of comparisons: 6 Estimating sequencing depths... Resampling to get new data matrices... perm= 1 perm= 2 perm= 3 perm= 4 perm= 5 perm= 6 perm= 7 perm= 8 perm= 9 perm= 10 Number of thresholds chosen (all possible thresholds) = 1283 Getting all the cutoffs for the thresholds... Getting number of false positives in the permutation... 'select()' returned 1:many mapping between keys and columns Low-regulated genes in the Cl1 in Cl2 VS Cl1 'select()' returned 1:many mapping between keys and columns Up-regulated genes in the Cl1 in Cl2 VS Cl1 Estimating sequencing depths... Resampling to get new data matrices...
Up-regulated genes in the Cl1 in Cl2 VS Cl1 Comparisons Target cluster Gene number File name Gene number File name 1 Cl2 VS Cl1 Cl1 477 Up-regulated-all_clustersCl1inCl2VSCl1.csv 941 Low-regulated-all_clustersCl1inCl2VSCl1.csv 2 Cl2 VS Cl1 Cl2 477 Low-regulated-all_clustersCl2inCl2VSCl1.csv 941 Up-regulated-all_clustersCl2inCl2VSCl1.csv
List of 2 $ : chr [1:1422, 1:2] "ENSG00000005022" "ENSG00000006327" "ENSG00000008394" "ENSG00000008517" ... ..- attr(*, "dimnames")=List of 2 .. ..$ : NULL .. ..$ : chr [1:2] "DEGsE" "DEGsS" $ :'data.frame': 2 obs. of 6 variables: ..$ Comparisons : chr [1:2] "Cl2 VS Cl1" "Cl2 VS Cl1" ..$ Target cluster: chr [1:2] "Cl1" "Cl2" ..$ Gene number : int [1:2] 477 477 ..$ File name : chr [1:2] "Up-regulated-all_clustersCl1inCl2VSCl1.csv" "Low-regulated-all_clustersCl2inCl2VSCl1.csv" ..$ Gene number : int [1:2] 941 941 ..$ File name : chr [1:2] "Low-regulated-all_clustersCl1inCl2VSCl1.csv" "Up-regulated-all_clustersCl2inCl2VSCl1.csv"