A user-friendly R pipeline for biomarker discovery in single-cell transcriptomics
New vignette hangs with reduced dataset

Under DIscBIO version, the new vignette hangs with the reduced dataset.

What works

The following code works fine and is identical to the code currently in the vignette:

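library(knitr)     # provides opts_chunk
library(DIscBIO)   # provides DISCBIO() and the valuesG1msReduced example data
opts_chunk$set(fig.width=7, fig.height=7)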

DataSet <- valuesG1msReduced

sc <- DISCBIO(DataSet)    

sc<-NoiseFiltering(sc,percentile=0.9, CV=0.2)

####  Normalizing the reads without any further gene filtering
sc<-Normalizedata(sc, mintotal=1000, minexpr=0, minnumber=0, maxexpr=Inf, downsample=FALSE, dsn=1, rseed=17000) 

####  Additional gene filtering step based on gene expression

sc<-FinalPreprocessing(sc,GeneFlitering="NoiseF",export = TRUE) ### GeneFlitering can be either "NoiseF" or "ExpF"

OnlyExpressionFiltering <- FALSE   # assumed default; set to TRUE to filter genes by expression instead
if (OnlyExpressionFiltering==TRUE){
    MinExp<- mean(rowMeans(DataSet,na.rm=TRUE))
    MinNumber<- round(length(DataSet[1,])/3)    # To be expressed in at least one third of the cells.
    sc<-Normalizedata(sc, mintotal=1000, minexpr=MinExp, minnumber=MinNumber, maxexpr=Inf, downsample=FALSE, dsn=1, rseed=17000) #### In this case this function is used to filter out genes and cells.
    sc<-FinalPreprocessing(sc,GeneFlitering="ExpF",export = TRUE)
}

sc<- Clustexp(sc,cln=3,quiet=TRUE)    #### K-means clustering to get three clusters
plotGap(sc)        ### Plotting gap statistics

sc<- comptSNE(sc,rseed=15555,quiet = TRUE)
cat("\t","     Cell-ID"," Cluster Number","\n")

# Silhouette of k-means clusters
plotSilhouette(sc,K=3)       # K is the number of clusters

Jaccard(sc,Clustering="K-means", K=3, plot = TRUE)     # Jaccard of k-means clusters

############ Plotting K-means clusters
plotKmeansLabelstSNE(sc) # To plot the ID of the cells in each cluster
plotSymbolstSNE(sc,types=sub("(\\_\\d+)$","", names(sc@ndata))) # To plot the cells of each type with a different symbol

outlg<-round(length(sc@fdata[,1])/200)     # The cell will be considered as an outlier if it has a minimum of 0.5% of the number of filtered genes as outlier genes. 
Outliers<- FindOutliersKM(sc, K=3, outminc=5,outlg=outlg,probthr=.5*1e-3,thr=2**-(1:40),outdistquant=.75, plot = TRUE, quiet = FALSE)

# RemovingOutliers=TRUE                    # Removing the defined outlier cells based on K-means Clustering

    cat("Outlier cells were removed, now you need to start from the beginning")

sc<-KmeanOrder(sc,quiet = FALSE, export = TRUE)

KMclustheatmap(sc,hmethod="single", plot = TRUE) 

g='ENSG00000000003'                   #### Gene to plot: ENSG00000000003 (TSPAN6)

####### Differential expression analysis between two of the K-means clusters using an FDR of 0.1
cdiff <- DEGanalysis2clust(
  sc, Clustering="K-means", K=3, fdr=0.1, name="Name", export=TRUE, quiet=TRUE
)

#### To show the result table
head(cdiff[[1]])                  # The first component 
head(cdiff[[2]])                  # The second component 
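The two threshold rules used in this listing can be checked on toy numbers in base R. The sketch below assumes that the optional expression filter keeps a gene when it reaches at least MinExp in at least MinNumber cells; that is a reading of the comments above, not Normalizedata's actual code. The matrix `toy`, the vector `n_genes` and the name `passes` are hypothetical.

```r
# Toy genes x cells matrix (hypothetical values, not valuesG1msReduced)
toy <- matrix(c(0, 5, 7, 9, 2, 8,
                1, 0, 0, 0, 0, 1,
                4, 4, 4, 4, 4, 4),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("geneA", "geneB", "geneC"), paste0("cell", 1:6)))

# Thresholds defined exactly as in the vignette's optional branch
MinExp    <- mean(rowMeans(toy, na.rm = TRUE))  # mean of the per-gene means (57/18)
MinNumber <- round(length(toy[1, ]) / 3)        # one third of the 6 cells -> 2

# Assumed filter rule: expressed at >= MinExp in >= MinNumber cells
passes <- rowSums(toy >= MinExp) >= MinNumber
print(passes)

# Outlier-gene threshold: 0.5% of the filtered genes, i.e. n / 200, rounded
n_genes <- c(80, 1000, 2500, 10000)             # hypothetical filtered gene counts
outlg   <- round(n_genes / 200)
print(outlg)
```

Here geneB is dropped because it never reaches MinExp. Note that R's round() rounds halves to even, so 2500 genes give outlg = 12 rather than 13, and any filtered gene count below 100 gives outlg = 0.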

What doesn't work

The next line, however, hangs:

cdiff <- DEGanalysis(
  sc, Clustering="K-means", K=3, fdr=0.1, name="Name", export=TRUE
)

The last output lines before the freeze are these:

Number of thresholds chosen (all possible thresholds) = 115
Getting all the cutoffs for the thresholds...
Getting number of false positives in the permutation...
'select()' returned 1:many mapping between keys and columns
Up-regulated genes in the Cl2 in Cl1 VS Cl2
Estimating sequencing depths...
Resampling to get new data matrices...
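The log suggests that DEGanalysis works through the clusters pair by pair: the first comparison (Cl1 VS Cl2) finishes, and the freeze happens during the resampling step of the next one. Assuming one resampling-based test per unordered pair of clusters (an inference from the log, not from the DIscBIO source), the number of runs grows with K as choose(K, 2). The hypothetical helper `n_runs` below just counts them.

```r
# Assumption: one resampling-based test per unordered pair of clusters
n_runs <- function(K) choose(K, 2)               # hypothetical helper

pairs_K3 <- combn(paste0("Cl", 1:3), 2)          # the pairs to test when K = 3
print(apply(pairs_K3, 2, paste, collapse = " VS "))
print(sapply(2:4, n_runs))                       # runs needed for K = 2, 3, 4
```

Under this assumption, K=2 needs a single run, which is consistent with the observation that switching to cln=2 avoids the hang.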

So I think I found the answer in a script attached to an e-mail: using sc <- Clustexp(sc, cln=2), and 2 clusters from then on, works. In fact, that whole script works for the vignette (with minor adjustments), so I'll use it as the basis for the document.

SystemsBiologist commented:

Here we go:

DataSet <- valuesG1msReduced
sc <- DISCBIO(DataSet)
sc<-NoiseFiltering(sc,percentile=0.9, CV=0.2)

####  Normalizing the reads without any further gene filtering

sc<-Normalizedata(sc, mintotal=1000, minexpr=0, minnumber=0, maxexpr=Inf, downsample=FALSE, dsn=1, rseed=17000)

sc<-FinalPreprocessing(sc,GeneFlitering="NoiseF",export = TRUE)
sc<- Clustexp(sc,cln=2,quiet=TRUE) #### K-means clustering to get two clusters

cdiff <- DEGanalysis( sc, Clustering="K-means", K=2, fdr=0.1, name="Name", export=TRUE, quiet=FALSE )