statgen / demuxlet

Genetic multiplexing of barcoded single cell RNA-seq
Apache License 2.0

vcf file with all individual information for a pooled sample #26

Closed x811zou closed 6 years ago

x811zou commented 6 years ago

Hi! I could not find where the demuxlet paper describes how you made the .vcf file for the 8 individuals. Could you explain that in a bit more detail? Thanks!

HichamAffia commented 5 years ago

@x811zou Hi, did you find a way? I have a sample containing cells from different individuals (single-cell RNA-seq). I wonder whether a VCF produced by running variant calling on the BAM file will contain several genotypes or one hybrid genotype. Thanks!

hyunminkang commented 5 years ago

Not sure what you're talking about, but you need external genotype data.

HichamAffia commented 5 years ago

Thanks for the answer. My question is: can I run variant calling on a BAM file containing sequences from 3 individuals? Will the output VCF file have 3 different variant profiles or one hybrid profile? Sorry if the question was confusing; I hope this clarifies it!

hyunminkang commented 5 years ago

You need a VCF file from a SNP array (the typical case). Alternatively, you could use a VCF from sequence data, but restrict it to sites that are polymorphic in 1000G with MAF > 1% or so, focusing on exome regions, to avoid false positives.
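
The site restriction described above can be sketched roughly as follows. This is a minimal illustration, not part of demuxlet: the function names `load_common_sites` and `filter_calls` are hypothetical, the reference-panel VCF is assumed to carry an `AF=` entry in its INFO column, and only biallelic SNPs are kept. The exome-region restriction is not included here.

```python
def load_common_sites(panel_lines, min_maf=0.01):
    """Collect (chrom, pos, ref, alt) keys for biallelic SNPs in a
    reference-panel VCF whose minor allele frequency exceeds min_maf.
    Assumes (hypothetically) that INFO contains a single AF=<float> entry."""
    sites = set()
    for line in panel_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt, info = fields[0], fields[1], fields[3], fields[4], fields[7]
        if "," in alt or len(ref) != 1 or len(alt) != 1:
            continue  # skip multi-allelic sites and indels
        af = None
        for item in info.split(";"):
            if item.startswith("AF="):
                af = float(item[3:])
        if af is None:
            continue  # no allele frequency recorded for this site
        maf = min(af, 1.0 - af)
        if maf > min_maf:
            sites.add((chrom, int(pos), ref, alt))
    return sites


def filter_calls(call_lines, sites):
    """Keep header lines plus those records whose (chrom, pos, ref, alt)
    matches one of the common sites."""
    kept = []
    for line in call_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            kept.append(line)
            continue
        fields = line.split("\t")
        if (fields[0], int(fields[1]), fields[3], fields[4]) in sites:
            kept.append(line)
    return kept
```

In practice the lines would come from reading the panel and call-set VCF files; chromosome naming (`1` versus `chr1`) has to agree between the two files for the keys to match.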

HichamAffia commented 5 years ago

We may consider a SNP array in the future, but that would not be soon enough. If I could ask for further clarification: what if my individuals are not in 1000G? I should be able to build my own VCF from my sequencing data. I also have BAMs for the individuals from samples that were not pooled, from which I could create separate VCF files. Can I feed demuxlet several VCF files, or should I just merge them? Thanks!

kmuench commented 5 years ago

Hello, I think I have this issue as well. I have a single 10x .bam file, which contains 5 multiplexed samples, and then five different .vcf files (one for each of the samples, generated through GATK from RNA-Seq data). Should we use something like GATK's CombineVariants to make these 5 files into 1 .vcf file? Or is there a different, preferred technique?

HichamAffia commented 5 years ago

Yes, you should merge them for demuxlet. I used monovar successfully, but that is for single-cell data; it's a wrapper that uses samtools.
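
To illustrate what "merging" means here, the sketch below combines several single-sample VCFs into one multi-sample VCF with one genotype column per individual. It is only a toy illustration under stated assumptions: the function name `merge_single_sample_vcfs` is hypothetical, only the GT field is carried over, and a site missing from one individual's file is written as `./.` because a per-sample VCF cannot distinguish "homozygous reference" from "no coverage". Established tools (such as the GATK CombineVariants mentioned above) handle headers, contigs, and multi-allelic sites properly and are the better choice for real data.

```python
def merge_single_sample_vcfs(vcfs):
    """vcfs: dict mapping sample name -> list of lines of a single-sample VCF.
    Returns the lines of a multi-sample VCF that carries only GT."""
    samples = list(vcfs)
    calls = {}  # (chrom, pos, ref, alt) -> {sample: genotype string}
    for sample, lines in vcfs.items():
        for line in lines:
            if line.startswith("#") or not line.strip():
                continue  # skip header and blank lines
            fields = line.rstrip("\n").split("\t")
            fmt_keys = fields[8].split(":")
            gt = fields[9].split(":")[fmt_keys.index("GT")]
            key = (fields[0], int(fields[1]), fields[3], fields[4])
            calls.setdefault(key, {})[sample] = gt

    out = [
        "##fileformat=VCFv4.2",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t" + "\t".join(samples),
    ]
    # Sorted by chromosome name as a string, then by position.
    for chrom, pos, ref, alt in sorted(calls):
        per_sample = calls[(chrom, pos, ref, alt)]
        gts = [per_sample.get(s, "./.") for s in samples]  # ./. = not called
        out.append("\t".join([chrom, str(pos), ".", ref, alt, ".", ".", ".", "GT"] + gts))
    return out
```

The sample names in the merged header are what the demultiplexing output would refer to, so they should be chosen to identify each individual unambiguously.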

fangfang0906 commented 3 years ago

Hi, I have the same issue. Could you tell me how to generate separate VCF files from the single merged BAM file? Thank you!