A query about the detailed procedure for filtering Drop seq data

WT215 commented 6 years ago

Hi Mo,

I followed the filtering procedure described in your paper for the Torre case study: removing genes with mean expression less than 0.1 and cells with library size less than 500 or greater than 20000.

Did you filter genes and cells simultaneously, like Data[-genes,-cells]? This procedure left me with 9177 genes rather than 12241 out of 32287 genes.

May I ask how I could repeat your filtering procedure to obtain the same filtered dataset that you used in the paper?

Thank you very much!

Best wishes, Wenhao

WT215 commented 6 years ago

In addition, the smallest library size in the raw data is 516, so I am not sure why the lower bound is 500.

> summary(colSums(Torre_dropseq))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    516    5630    6938   15038   11183  953156

Here Torre_dropseq denotes the raw data, a matrix with 32287 rows (genes) and 8640 columns (cells).

Cheers, Wenhao

mohuangx commented 6 years ago

Hi Wenhao,

Where did you obtain the data? It seems that the dataset you have is quite different:

> summary(colSums(df))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
     74     932    1156    2593    1921  100681

Here is the filtering that I performed:

df.filt <- df[which(rowMeans(df) > 0.01), 
                  which(colSums(df) >= 500 & colSums(df) <= 20000)]
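To illustrate what this one-step indexing does, here is a minimal sketch on a made-up 4 x 5 count matrix. The matrix `toy`, its values, and the thresholds (gene mean > 1, library size between 5 and 20) are hypothetical and chosen only so the result can be checked by hand; they are not the values used for the paper. Both criteria are computed on the unfiltered matrix, so filtering cells first and then recomputing gene means on the remaining cells can keep a different set of genes.

```r
# Hypothetical toy count matrix: 4 genes (rows) x 5 cells (columns).
toy <- matrix(c(0, 0, 0, 0, 30,
                3, 2, 4, 1, 10,
                5, 6, 7, 0, 20,
                0, 1, 0, 0,  0),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("Gene", 1:4), paste0("Cell", 1:5)))

lib.size <- colSums(toy)  # library size of each cell

# Simultaneous filter: both criteria are evaluated on the full matrix.
keep.genes <- rowMeans(toy) > 1
keep.cells <- lib.size >= 5 & lib.size <= 20
toy.simul <- toy[keep.genes, keep.cells, drop = FALSE]

# Sequential filter: drop cells first, then recompute gene means
# on the retained cells only.
toy.cells <- toy[, keep.cells, drop = FALSE]
toy.seq <- toy.cells[rowMeans(toy.cells) > 1, , drop = FALSE]

dim(toy.simul)
dim(toy.seq)
```

In this toy case the simultaneous filter keeps 3 genes and the sequential one keeps 2: Gene1 is expressed only in the high-library-size Cell5, so it passes the gene filter on the full matrix but fails once that cell has been removed.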

Mo

WT215 commented 6 years ago

Hi Mo,

I downloaded the drop seq data from GSE99330 (GSE99330_dropseq_counts.txt.gz).

I downloaded the smFISH data from this Dropbox folder: https://www.dropbox.com/sh/g9c84n2torx7nuk/AABZei_vVpcfTUNL7buAp8z-a?dl=0

This information was found on the last page of the paper:

Rare Cell Detection by Single-Cell RNA Sequencing as Guided by Single-Molecule RNA FISH

Would you mind providing me with the dropseq data and smFISH data that you used?

Thank you very much!

Best wishes,

Wenhao

From: Mo Huang notifications@github.com Sent: April 19, 2018 23:34:36 To: mohuangx/SAVER Cc: Tang, Wenhao; Author Subject: Re: [mohuangx/SAVER] A query about the detailed procedure for filtering Drop seq data (#8)


WT215 commented 6 years ago

After unzipping the file GSE99330_dropseq_counts.txt.gz, I ran the following code to obtain the raw data Torre_dropseq:

library("data.table")

Torre_dropseq<-fread(file="/export131/home/wt215/Torre_2017/GSE99330_dropseq_counts.txt")
Torre_dropseq<-as.data.frame(Torre_dropseq)
rownames(Torre_dropseq)<-Torre_dropseq[,1]
Torre_dropseq<-Torre_dropseq[,-1]
colnames(Torre_dropseq)<-paste('Cell',seq(1,dim(Torre_dropseq)[2]),sep='')
Torre_dropseq<-as.matrix(Torre_dropseq)

Wenhao

mohuangx commented 6 years ago

Hi Wenhao,

For the Dropseq data, you can download 'GSE99330_dropseqUPM.txt.gz' and convert the UPM to counts as follows:

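# Read the UPM matrix (genes in rows, cells in columns)
df.upm <- as.matrix(read.table("GSE99330_dropseqUPM.txt", row.names = 1, header = TRUE))
# Divide each cell (column) by its smallest nonzero value
df <- sweep(df.upm, 2, apply(df.upm, 2, function(x) min(x[x != 0])), "/")
# Check that every entry is a whole number within tolerance
is.wholenumber <- function(x, tol = .Machine$double.eps^0.5) abs(x - round(x)) < tol
sum(!is.wholenumber(df))  # returned 0 on this dataset
df <- round(df)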

This scales each cell such that the minimum expression is mapped to one. We see that every entry is a whole number within tolerance and round the output.
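As a rough check of this idea, the sketch below builds a small made-up count matrix, converts it to UPM (assumed here to mean counts divided by the cell's library size, times one million), and then recovers the counts by dividing each column by its smallest nonzero value. The matrix `counts` and its values are hypothetical. Note that the recovery relies on each cell containing at least one entry with a count of exactly 1; otherwise the result would be the counts divided by the cell's smallest nonzero count.

```r
# Hypothetical toy counts: 4 genes (rows) x 3 cells (columns).
# Every cell has at least one gene with a count of exactly 1.
counts <- matrix(c(1, 0, 3,
                   2, 1, 0,
                   0, 4, 1,
                   7, 5, 6),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("Gene", 1:4), paste0("Cell", 1:3)))

# Assumed UPM definition: scale each cell to a total of one million.
upm <- sweep(counts, 2, colSums(counts), "/") * 1e6

# Smallest nonzero UPM value in each cell; this corresponds to a count of 1.
min.nonzero <- apply(upm, 2, function(x) min(x[x != 0]))

# Divide each cell by that value and round away floating-point error.
recovered <- round(sweep(upm, 2, min.nonzero, "/"))

all(recovered == counts)
```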

The FISH data can be found here: https://www.dropbox.com/s/ia9x0iom6dwueix/fishSubset.txt?dl=0

In the future, please send me an email for any questions about the manuscript as I do not consider this an issue with the software.

Hope that helps! Mo
