single-cell-genetics / cellSNP

Pileup biallelic SNPs from single-cell and bulk RNA-seq data
Apache License 2.0

Wrong number of chromosomes in output file? #16

Closed: VanessaOak closed this issue 3 years ago

VanessaOak commented 3 years ago

Hello, I used a VCF file made with the mouse mm10 genome and a 10x BAM file aligned to mm10, so I expected my output to contain chromosomes 1-19, M, X, and Y, like my input files. However, the output from cellSNP contains chromosomes 1-22, X, and Y. Any tips on how to troubleshoot this?

hxj5 commented 3 years ago

Hello, thanks for your feedback. Currently cellSNP always writes chr1-22, X, and Y as contigs into the VCF header (e.g., ##contig=), regardless of whether the data are from human, mouse, or some other species.

So apart from the header, does the VCF contain any records on chr20-22? You can check this with zcat <output.vcf.gz> | grep -v '^#' | awk '{print $1}' | sort -u, which prints all unique chromosomes (the first column) among the VCF records.
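For readers who prefer to run this check from a script, here is a minimal Python sketch (standard library only) that does the same job as the pipeline above and also lists the contigs declared in the header, so header-only entries such as chr20-22 on mouse data stand out. The function names and the file name check_contigs.py are hypothetical and not part of cellSNP; the sketch assumes a VCF whose header contig lines follow the ##contig=<ID=...> form of the VCF specification.

```python
import gzip
import re
import sys

# ID field of a VCF contig meta-line, e.g. ##contig=<ID=chr1> or ##contig=<ID=chr1,length=...>
CONTIG_ID = re.compile(r"^##contig=<ID=([^,>]+)")


def open_vcf(path):
    """Open a plain or gzip-compressed VCF in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def contig_usage(lines):
    """Return (declared, observed): contig IDs from the header in file order,
    and the sorted unique CHROM values of the data records."""
    declared, observed = [], set()
    for line in lines:
        if line.startswith("##"):
            m = CONTIG_ID.match(line)
            if m:
                declared.append(m.group(1))
        elif line.startswith("#"):
            continue  # the #CHROM column-header line
        elif line.strip():
            # First whitespace-delimited field, like awk '{print $1}'
            observed.add(line.split(None, 1)[0])
    return declared, sorted(observed)


def header_only_contigs(declared, observed):
    """Contigs declared in the header that have no data records."""
    seen = set(observed)
    return [c for c in declared if c not in seen]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open_vcf(sys.argv[1]) as fh:
        declared, observed = contig_usage(fh)
    print("chromosomes with records:", " ".join(observed))
    print("header-only contigs:", " ".join(header_only_contigs(declared, observed)))
```

Run it as python check_contigs.py output.vcf.gz (hypothetical paths). Like sort -u, the list of observed chromosomes is sorted lexically, so chr10 comes before chr2.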

BTW, could you tell us the version of cellSNP (by typing cellSNP) and the command line you used for your data? Thanks!

VanessaOak commented 3 years ago

Thank you for your response. It looks like there are no records on chr20-22, so that is good! The version is v0.3.2. Thank you!

hxj5 commented 3 years ago

That's great. We will fix this so that the header of the output VCF is compatible with the header of the input VCF. Thanks!
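The thread does not show how the fix was eventually implemented in cellSNP. Purely as an illustration of the idea, a Python sketch of taking the output VCF's contig lines from the input VCF, instead of from a fixed chr1-22, X, Y list, might look like this. The function name is hypothetical, and the fallback of generating minimal contig lines from the records' CHROM values is an assumption added for inputs whose header declares no contigs.

```python
import re

# Matches a VCF contig meta-line such as ##contig=<ID=chrM> or ##contig=<ID=1,length=...>
CONTIG_LINE = re.compile(r"^##contig=<ID=[^,>]+")


def contig_header_from_input(lines):
    """Return the ##contig meta-lines to write into an output VCF header.

    Contig lines in the input header are copied verbatim, keeping any
    length/assembly fields. If the input declares no contigs, one minimal
    ##contig=<ID=...> line is generated per CHROM value, in order of first
    appearance in the records.
    """
    header, from_records = [], []
    seen = set()
    for line in lines:
        if line.startswith("##"):
            if CONTIG_LINE.match(line):
                header.append(line.rstrip("\r\n"))
        elif line.startswith("#"):
            if header:
                break  # contigs already declared; no need to scan the records
        elif line.strip():
            chrom = line.split(None, 1)[0]
            if chrom not in seen:
                seen.add(chrom)
                from_records.append(f"##contig=<ID={chrom}>")
    return header if header else from_records
```

Any iterable of lines works, including a handle from open(path) or gzip.open(path, "rt"). With an mm10 input, the output header would then list only the contigs the input declares (or uses), rather than human-style chr1-22.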