vanrooij-lab / scRNAseq-HCM-human


Correct SORT-seq data #1

Open pakiessling opened 5 days ago

pakiessling commented 5 days ago

Hi @Jintram,

We are interested in including the scRNA-seq data that you generated for your publication:

Single-cell transcriptomics provides insights into hypertrophic cardiomyopathy

in our analysis.

We have accessed the data deposited in GSE138262 but are unsure which file to use in our downstream work.

There appear to be three files per plate: TranscriptCounts.tsv.gz, ReadCounts.tsv.gz and BarcodeCounts.tsv.gz.

Based on the name, I would assume that TranscriptCounts contains the UMI counts we are after. However, the file contains floats such as 1.000122, 2.000488 and 6.004399.

Could you please confirm which file we should use for UMI-based analysis?

Thank you!

Jintram commented 1 day ago

Hello pakiessling,

Thanks for your interest!

Good question. Unfortunately, it seems that the deposited count tables are somewhat older than the ones used for the latest analysis. Thank you for bringing this to my attention.

I am currently in the middle of two busy weeks, so I will look into this next week.

Briefly,

  1. The files you are talking about come from a previous mapping pipeline, which we didn't use in the end. The counts it produces are:
    • ReadCounts: Raw counts, without UMI correction (integer number).
    • BarcodeCounts: UMI counts (integer).
    • TranscriptCounts: UMI counts with an additional correction for UMI saturation. Only 4^6 = 4096 distinct UMIs are possible, so for genes with very high counts different transcripts can receive the same UMI and be counted only once. The correction compensates for this, which is why this file contains floating-point numbers.
  2. Next week I will try to make sure that (a) you get the right files (please send me your contact details at m.wehrens[AT]uva.nl) and (b) the correct files are uploaded to a repository.
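
The relation between BarcodeCounts and TranscriptCounts described in point 1 can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: it assumes the common saturation correction t = -K ln(1 - n/K), with K = 4096 possible UMIs and n the number of distinct UMIs observed. That assumption does reproduce the values quoted in the question (1.000122, 2.000488, 6.004399). The function names `correct_umi` and `uncorrect_umi` are hypothetical.

```python
import math

# Number of distinct 6-nt UMIs, as stated above: 4^6 = 4096.
N_UMI = 4 ** 6


def correct_umi(observed: int, n_umi: int = N_UMI) -> float:
    """Estimate the transcript count from the number of distinct UMIs observed.

    Assumed formula (not confirmed as the pipeline's own): t = -K * ln(1 - n / K).
    """
    if not 0 <= observed < n_umi:
        raise ValueError("observed must satisfy 0 <= observed < n_umi")
    # log1p(x) computes ln(1 + x) accurately for small x.
    return -n_umi * math.log1p(-observed / n_umi)


def uncorrect_umi(corrected: float, n_umi: int = N_UMI) -> int:
    """Invert the correction to recover the integer UMI count."""
    # n = K * (1 - exp(-t / K)); expm1(x) computes exp(x) - 1 accurately.
    return round(-n_umi * math.expm1(-corrected / n_umi))


if __name__ == "__main__":
    for n in (1, 2, 6):
        t = correct_umi(n)
        print(n, f"{t:.6f}", uncorrect_umi(t))
```

Under this assumption, rounding the inverse of a TranscriptCounts value gives back the integer in BarcodeCounts, so the two files carry the same information for counts well below 4096.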

All the best, Martijn