In addition, the smallest library size in the raw data is 516, so I am not sure why the lower bound is set at 500.
> summary(colSums(Torre_dropseq))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    516    5630    6938   15038   11183  953156
Here, Torre_dropseq denotes the raw count matrix, with 32287 genes (rows) and 8640 cells (columns).
Cheers, Wenhao
Hi Wenhao,
Where did you obtain the data? It seems that the dataset you have is quite different:
> summary(colSums(df))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     74     932    1156    2593    1921  100681
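Here is the filtering that I performed:
# Keep genes with mean expression > 0.01 and cells with library size in [500, 20000];
# both conditions are evaluated on the unfiltered matrix df
df.filt <- df[which(rowMeans(df) > 0.01),
              which(colSums(df) >= 500 & colSums(df) <= 20000)]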
Mo
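To see which cells each library-size bound removes, a small report like the one below may help. It is a minimal sketch: the helper name `lib_size_report` and the toy matrix are hypothetical, and the default bounds are simply the 500 and 20000 used in the filter above. The bounds act on whatever matrix they are given: on the raw counts (minimum library size 516) a lower bound of 500 would remove no cells, whereas on the matrix summarised above (minimum 74) it would.

```r
# Hypothetical helper: count the cells removed by each library-size bound
lib_size_report <- function(counts, lib_min = 500, lib_max = 20000) {
  lib <- colSums(counts)  # library size = total count per cell (column)
  c(below_min = sum(lib < lib_min),
    above_max = sum(lib > lib_max),
    kept      = sum(lib >= lib_min & lib <= lib_max))
}

# Toy 2-gene x 4-cell matrix with library sizes 74, 600, 1500 and 25000
toy <- matrix(c(30, 300, 1000, 20000,
                44, 300,  500,  5000),
              nrow = 2, byrow = TRUE)
report <- lib_size_report(toy)
print(report)  # below_min = 1, above_max = 1, kept = 2
```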
Hi Mo,
I downloaded the Drop-seq data from GSE99330 (GSE99330_dropseq_counts.txt.gz).
I downloaded the smFISH data from Dropbox: https://www.dropbox.com/sh/g9c84n2torx7nuk/AABZei_vVpcfTUNL7buAp8z-a?dl=0
This information was found on the last page of the paper:
Rare Cell Detection by Single-Cell RNA Sequencing as Guided by Single-Molecule RNA FISH
Would you mind providing me with the Drop-seq and smFISH data that you used?
Thank you very much!
Best wishes,
Wenhao
From: Mo Huang notifications@github.com
Sent: 19 April 2018 23:34:36
To: mohuangx/SAVER
Cc: Tang, Wenhao; Author
Subject: Re: [mohuangx/SAVER] A query about the detailed procedure for filtering Drop seq data (#8)
After unzipping GSE99330_dropseq_counts.txt.gz, I ran the following code to obtain the raw data Torre_dropseq:
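library("data.table")
# Read the unzipped count table; the first column holds the gene names
Torre_dropseq <- fread(file = "/export131/home/wt215/Torre_2017/GSE99330_dropseq_counts.txt")
Torre_dropseq <- as.data.frame(Torre_dropseq)
# Move the gene names into the row names and drop that column
rownames(Torre_dropseq) <- Torre_dropseq[, 1]
Torre_dropseq <- Torre_dropseq[, -1]
# Relabel the cells as Cell1, Cell2, ...
colnames(Torre_dropseq) <- paste0("Cell", seq_len(ncol(Torre_dropseq)))
Torre_dropseq <- as.matrix(Torre_dropseq)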
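As an alternative to unzipping first, base R's `read.table()` can read a gzip-compressed text file directly, because the underlying `file()` connection detects gzip compression when opened for reading. The sketch below round-trips a small hypothetical genes-by-cells table through a temporary `.txt.gz` file. It assumes a tab-separated layout with gene names in the first column; I have not checked that GSE99330_dropseq_counts.txt.gz uses exactly this layout, so adjust `sep` if needed.

```r
# Write a tiny hypothetical genes x cells table to a gzip-compressed temp file
path <- tempfile(fileext = ".txt.gz")
toy <- matrix(c(1L, 0L, 3L, 2L, 0L, 5L), nrow = 2,
              dimnames = list(c("GeneA", "GeneB"), c("Cell1", "Cell2", "Cell3")))
# col.names = NA writes an empty first header field above the row names
write.table(toy, file = gzfile(path), sep = "\t", quote = FALSE, col.names = NA)

# Read the compressed file directly; the first column becomes the row names
counts <- as.matrix(read.table(path, header = TRUE, row.names = 1, sep = "\t"))
print(dim(counts))  # 2 genes x 3 cells
```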
Wenhao
Hi Wenhao,
For the Dropseq data, you can download 'GSE99330_dropseqUPM.txt.gz' and convert the UPM to counts as follows:
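# Read the UPM matrix (genes x cells); gene names are in the first column
df.upm <- as.matrix(read.table("GSE99330_dropseqUPM.txt", row.names = 1, header = TRUE))
# Divide each cell (column) by its smallest nonzero value, which then maps to 1
df <- sweep(df.upm, 2, apply(df.upm, 2, function(x) min(x[x != 0])), "/")
# Check that every rescaled entry is a whole number up to floating-point tolerance
is.wholenumber <- function(x, tol = .Machine$double.eps^0.5) abs(x - round(x)) < tol
> sum(!is.wholenumber(df))
[1] 0
df <- round(df)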
This scales each cell so that its smallest nonzero expression value maps to one. Every entry is then a whole number within tolerance, so we round the output.
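Why the rescaling recovers counts: if UPM multiplies every count in a cell by the same factor (one million divided by that cell's library size), then dividing by the smallest nonzero UPM returns the original counts exactly when that cell's smallest nonzero count is 1, and the whole-number check is what confirms this held for every cell. The sketch below demonstrates this on simulated Poisson counts. Both the simulation and the reading of UPM as counts scaled to sum to one million per cell are my assumptions, not taken from GSE99330. It also shows the check failing for a cell whose smallest nonzero count is 2.

```r
set.seed(1)
# Simulated genes x cells counts; force a count of 1 in every cell
counts <- matrix(rpois(200, lambda = 2), nrow = 20, ncol = 10)
counts[1, ] <- 1L

# Assumed UPM definition: each cell's counts scaled to sum to one million
upm <- sweep(counts, 2, colSums(counts), "/") * 1e6

# Conversion from the thread: divide each cell by its smallest nonzero value
is.wholenumber <- function(x, tol = .Machine$double.eps^0.5) abs(x - round(x)) < tol
rescaled <- sweep(upm, 2, apply(upm, 2, function(x) min(x[x != 0])), "/")
stopifnot(sum(!is.wholenumber(rescaled)) == 0)
recovered <- round(rescaled)
print(all(recovered == counts))  # TRUE

# If a cell's smallest nonzero count is 2, a count of 3 rescales to 1.5
bad <- c(2, 3, 0, 4) / sum(c(2, 3, 0, 4)) * 1e6
bad_rescaled <- bad / min(bad[bad != 0])
print(is.wholenumber(bad_rescaled))  # TRUE FALSE TRUE TRUE
```

In real data the second case would show up as a nonzero result from `sum(!is.wholenumber(df))`, so the check guards the assumption rather than merely tidying the output.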
The FISH data can be found here: https://www.dropbox.com/s/ia9x0iom6dwueix/fishSubset.txt?dl=0
In the future, please email me with any questions about the manuscript, as I do not consider this an issue with the software.
Hope that helps! Mo
Hi Mo,
I followed the filtering procedure described in your paper for the Torre case study: removing genes with mean expression below 0.1, and cells with library size below 500 or above 20000.
Did you filter genes and cells simultaneously, as in Data[-genes, -cells]? This procedure left me with 9177 of the 32287 genes, rather than 12241.
How can I reproduce your filtering procedure to obtain the same filtered dataset that you used in the paper?
Thank you very much!
Best wishes, Wenhao
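One way to see how the order of filtering changes the number of genes kept is to compare gene means taken over all cells (as in the `df[which(rowMeans(df) > 0.01), which(...)]` snippet earlier in the thread, where both conditions are evaluated on the unfiltered matrix) with gene means taken only over the cells that pass the library-size bounds. The sketch below does this on a hypothetical 3-gene by 4-cell matrix with toy thresholds; the helper `filter_counts` and its `sequential` argument are illustrative and not part of SAVER. The earlier snippet also uses a gene threshold of 0.01, while the message above quotes 0.1, so `gene_min_mean` is left as a parameter. Whether either difference accounts for the 9177 versus 12241 gap is not established here.

```r
# Hypothetical helper: filter cells by library size and genes by mean expression
filter_counts <- function(counts, gene_min_mean, lib_min, lib_max, sequential = FALSE) {
  lib <- colSums(counts)
  keep_cells <- lib >= lib_min & lib <= lib_max
  if (sequential) {
    # Gene means over the retained cells only
    gene_means <- rowMeans(counts[, keep_cells, drop = FALSE])
  } else {
    # Gene means over all cells, as when both indices are computed on df
    gene_means <- rowMeans(counts)
  }
  counts[gene_means > gene_min_mean, keep_cells, drop = FALSE]
}

# Toy data: Gene3 is expressed only in Cell4, whose library size (80) is too large
toy <- matrix(c(3, 4, 2,  0,
                2, 3, 4,  0,
                0, 0, 0, 80),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("Gene", 1:3), paste0("Cell", 1:4)))

simultaneous <- filter_counts(toy, gene_min_mean = 0.5, lib_min = 5, lib_max = 50)
sequential   <- filter_counts(toy, gene_min_mean = 0.5, lib_min = 5, lib_max = 50,
                              sequential = TRUE)
print(dim(simultaneous))  # 3 genes x 3 cells
print(dim(sequential))    # 2 genes x 3 cells
```

Here the simultaneous version keeps Gene3 because its mean is inflated by a cell that is then discarded; on real data the effect can go either way depending on which cells fall outside the bounds.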