ocbe-uio / DIscBIO

A user-friendly R pipeline for biomarker discovery in single-cell transcriptomics
Other
12 stars 5 forks source link

New vignette hangs with reduced dataset #3

Closed wleoncio closed 4 years ago

wleoncio commented 4 years ago

Summary

Under DIscBIO version 0.0.0.9004, the new vignette hangs with the reduced dataset.

What works

The following code works fine, and is identical to the one currently present on the vignette (minus commented-out code for brevity):

## ----options, echo=FALSE------------------------------------------------------
library(knitr)
opts_chunk$set(fig.width=7, fig.height=7)

## -----------------------------------------------------------------------------
library(DIscBIO)
library(enrichR)

## -----------------------------------------------------------------------------
DataSet <- valuesG1msReduced
head(DataSet)

## -----------------------------------------------------------------------------
sc <- DISCBIO(DataSet)    

## -----------------------------------------------------------------------------
sc<-NoiseFiltering(sc,percentile=0.9, CV=0.2)

####  Normalizing the reads without any further gene filtering
sc<-Normalizedata(sc, mintotal=1000, minexpr=0, minnumber=0, maxexpr=Inf, downsample=FALSE, dsn=1, rseed=17000) 

####  Additional gene filtering step based on gene expression

sc<-FinalPreprocessing(sc,GeneFlitering="NoiseF",export = TRUE) ### The GeneFiltering can be either "NoiseF" or"ExpF"

## -----------------------------------------------------------------------------
#OnlyExpressionFiltering=TRUE           
OnlyExpressionFiltering=FALSE         

if (OnlyExpressionFiltering==TRUE){
    MIínExp<- mean(rowMeans(DataSet,na.rm=TRUE))
    MIínExp
    MinNumber<- round(length(DataSet[1,])/3)    # To be expressed in at least one third of the cells.
    MinNumber
    sc<-Normalizedata(sc, mintotal=1000, minexpr=MIínExp, minnumber=MinNumber, maxexpr=Inf, downsample=FALSE, dsn=1, rseed=17000) #### In this case this function is used to filter out genes and cells.
    sc<-FinalPreprocessing(sc,GeneFlitering="ExpF",export = TRUE)  
}

## -----------------------------------------------------------------------------
sc<- Clustexp(sc,cln=3,quiet=TRUE)    #### K-means clustering to get three clusters
plotGap(sc)        ### Plotting gap statisticssc<- Clustexp(sc, clustnr=20,bootnr=50,metric="pearson",do.gap=T,SE.method="Tibs2001SEmax",SE.factor=.25,B.gap=50,cln=K,rseed=17000)

## -----------------------------------------------------------------------------
sc<- comptSNE(sc,rseed=15555,quiet = TRUE)
cat("\t","     Cell-ID"," Cluster Number","\n")
sc@cpart

## -----------------------------------------------------------------------------
# Silhouette of k-means clusters
par(mar=c(6,2,4,2))
plotSilhouette(sc,K=3)       # K is the number of clusters

## -----------------------------------------------------------------------------
Jaccard(sc,Clustering="K-means", K=3, plot = TRUE)     # Jaccard of k-means clusters

## -----------------------------------------------------------------------------
############ Plotting K-means clusters
plottSNE(sc)
plotKmeansLabelstSNE(sc) # To plot the the ID of the cells in eacj cluster
plotSymbolstSNE(sc,types=sub("(\\_\\d+)$","", names(sc@ndata))) # To plot the the ID of the cells in each cluster

## -----------------------------------------------------------------------------
outlg<-round(length(sc@fdata[,1])/200)     # The cell will be considered as an outlier if it has a minimum of 0.5% of the number of filtered genes as outlier genes. 
Outliers<- FindOutliersKM(sc, K=3, outminc=5,outlg=outlg,probthr=.5*1e-3,thr=2**-(1:40),outdistquant=.75, plot = TRUE, quiet = FALSE)

RemovingOutliers=FALSE     
# RemovingOutliers=TRUE                    # Removing the defined outlier cells based on K-means Clustering

if(RemovingOutliers==TRUE){
    names(Outliers)=NULL
    Outliers
    DataSet=DataSet[-Outliers]
    dim(DataSet)
    colnames(DataSet)
    cat("Outlier cells were removed, now you need to start from the beginning")
}

## -----------------------------------------------------------------------------
sc<-KmeanOrder(sc,quiet = FALSE, export = TRUE)
plotOrderKMtsne(sc)

## -----------------------------------------------------------------------------
KMclustheatmap(sc,hmethod="single", plot = TRUE) 

## -----------------------------------------------------------------------------
g='ENSG00000000003'                   #### Plotting the expression of  MT-RNR2
plotExptSNE(sc,g)

## ----degKM--------------------------------------------------------------------
####### differential expression analysis between cluster 1 and cluster 3 of the Model-Based clustering using FDR of 0.05
cdiff <- DEGanalysis2clust(
  sc, Clustering="K-means", K=3, fdr=0.1, name="Name", export=TRUE, quiet=TRUE
)

## -----------------------------------------------------------------------------
#### To show the result table
head(cdiff[[1]])                  # The first component 
head(cdiff[[2]])                  # The second component 

What doesn't work

The next line, however, hangs:

cdiff <- DEGanalysis(
  sc, Clustering="K-means", K=3, fdr=0.1, name="Name", export=TRUE,
  quiet=FALSE
)

The last output lines before the freeze are these:

Number of thresholds chosen (all possible thresholds) = 115
Getting all the cutoffs for the thresholds...
Getting number of false positives in the permutation...
'select()' returned 1:many mapping between keys and columns
Up-regulated genes in the Cl2 in Cl1 VS Cl2
Estimating sequencing depths...
Resampling to get new data matrices...
wleoncio commented 4 years ago

So think I got the answer from a script attached to an e-mail. Using sc <- Clustexp(sc, cln=2) and 2 clusters from then on works. As a matter of fact, that whole script works for the vignette (with minor adjustments), so I'll use it as a base for the document.

SystemsBiologist commented 4 years ago

Here we go:

DataSet <- valuesG1msReduced sc <- DISCBIO(DataSet)
sc<-NoiseFiltering(sc,percentile=0.9, CV=0.2)

Normalizing the reads without any further gene filtering

sc<-Normalizedata(sc, mintotal=1000, minexpr=0, minnumber=0, maxexpr=Inf, downsample=FALSE, dsn=1, rseed=17000)

sc<-FinalPreprocessing(sc,GeneFlitering="NoiseF",export = TRUE) sc<- Clustexp(sc,cln=2,quiet=TRUE) #### K-means clustering to get three clusters

cdiff <- DEGanalysis( sc, Clustering="K-means", K=2, fdr=0.1, name="Name", export=TRUE, quiet=FALSE )