statgen / demuxlet

Genetic multiplexing of barcoded single cell RNA-seq
Apache License 2.0

Ambiguity in tutorial of demuxlet #21

Closed J-sara closed 6 years ago

J-sara commented 6 years ago

Hello and thanks a lot for the nice and very practical tool.

I have a single-cell RNA-seq data set produced with 10x Genomics technology. It is a mixture of cells from two individuals, and I would like to separate them.

My problem is that I do not know exactly which VCF file I should use. As you know, the 1000 Genomes website reports SNPs in the human genome in VCF format. Each VCF file contains variants for around 3000 individuals (the tutorial includes variants for only two). Should I run demuxlet with all 3000 individuals?

My sample is composed of only two individuals, and if I run demuxlet with all 3000 individuals, the final results assign my cells to more than two individuals, which I know is not true.

How do you suggest solving this problem?

My second question concerns the genomic positions of the variants. Is it necessary to include only exonic variants, or will the program also work with variants at all genomic positions?

Best regards

papanikos commented 6 years ago

Hello @J-sara,

Unless the two people are part of the 1000 Genomes Project, you should produce a VCF file containing variants for your two specific individuals from another appropriate sequencing experiment (whole-exome sequencing, bulk RNA-seq, etc.).
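
If your two donors do turn out to be genotyped in a larger multi-sample VCF, one way to get a two-sample file is to keep only their genotype columns. The following is a minimal sketch in plain Python, not part of demuxlet; the function name `subset_vcf_samples` and the sample IDs are hypothetical, and it assumes an uncompressed, tab-delimited VCF. In practice a dedicated tool such as `bcftools view -s` is the more robust choice.

```python
def subset_vcf_samples(lines, keep):
    """Yield VCF lines with only the genotype columns for the samples in `keep`.

    `lines` is any iterable of VCF text lines; `keep` is a list of sample IDs
    (hypothetical names here), returned in the order given.
    """
    cols = None  # column indices to keep, set when the #CHROM header is seen
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Meta-information lines are passed through unchanged.
            yield line
        elif line.startswith("#CHROM"):
            header = line.split("\t")
            samples = header[9:]  # the first 9 columns are fixed VCF fields
            missing = [s for s in keep if s not in samples]
            if missing:
                raise ValueError(f"samples not found in VCF: {missing}")
            cols = list(range(9)) + [header.index(s, 9) for s in keep]
            yield "\t".join(header[i] for i in cols)
        elif line:
            if cols is None:
                raise ValueError("data line seen before the #CHROM header")
            fields = line.split("\t")
            yield "\t".join(fields[i] for i in cols)
```

For example, `subset_vcf_samples(open("all_samples.vcf"), ["donorA", "donorB"])` (file name and IDs are placeholders) would stream a VCF restricted to those two individuals.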

Keeping in mind that this is RNA-seq data, it would make sense to focus on exonic variants, since the reads come from transcribed sequence and will cover little of the genome outside exons.
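
To illustrate restricting a VCF to exonic sites, here is a rough sketch in plain Python, again not part of demuxlet. It assumes you already have exon intervals as `(chrom, start, end)` tuples in BED-style coordinates (0-based, half-open) and that chromosome names match between the two files; the function names are made up for this example.

```python
import bisect


def build_exon_index(exons):
    """Merge overlapping exon intervals per chromosome for fast lookup.

    `exons` is an iterable of (chrom, start, end), 0-based half-open (BED-style).
    Returns {chrom: (sorted_starts, matching_ends)}.
    """
    by_chrom = {}
    for chrom, start, end in exons:
        by_chrom.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = []
        for start, end in intervals:
            if merged and start <= merged[-1][1]:
                # Overlapping or adjacent: extend the previous interval.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_exon(index, chrom, pos):
    """Return True if the 1-based VCF position `pos` lies inside an exon."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    p = pos - 1  # convert 1-based VCF position to 0-based
    i = bisect.bisect_right(starts, p) - 1
    return i >= 0 and p < ends[i]


def filter_exonic(vcf_lines, index):
    """Yield header lines unchanged and only those records that fall in exons."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
        else:
            chrom, pos = line.split("\t", 2)[:2]
            if in_exon(index, chrom, int(pos)):
                yield line
```

Merging the intervals first keeps each lookup to a single binary search, which matters when a VCF has millions of records.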

J-sara commented 6 years ago

Thanks!