single-cell-genetics / cellsnp-lite

Efficient genotyping bi-allelic SNPs on single cells
https://cellsnp-lite.readthedocs.io
Apache License 2.0
124 stars 11 forks source link

Blank cellSNP.base.vcf #42

Closed alanfoleynibrt closed 2 years ago

alanfoleynibrt commented 2 years ago

Hi I am trying to use cellsnp-lite to analyse scRNAseq data from the BD Rhapsody.

I put my fastqs though the BD Rhapsody™ WTA Analysis Pipeline Revision 6.

It outputted a BAM file with sorting based on Cell barcode (I have 5500 cells). I split the BAM file into just "MT" chromosome reads.

When I try to analyse this MT BAM file with cellsnp-lite I run this:

"/home/alan/AbSeq/SNV/MQuad/cellsnplite/cellsnp-lite/cellsnp-lite" \ -s "/home/alan/AbSeq/seven_bridges_output/yujuanjustmt.bam" \ -O /home/alan/AbSeq/SNV/MQuad/cellsnplite/ \ -b "/home/alan/AbSeq/seven_bridges_output/barcodes2.tsv" \ -p 20 \ --cellTAG CB \ --chrom MT

it runs with:

[I::main] start time: 2022-05-19 15:00:01 [W::check_args] Max depth set to maximum value (2147483647) [I::main] mode 2a: pileup 1 whole chromosomes in 5581 single cells. [W::csp_pileup_core] Combined max depth is above 1M. Potential memory hog! [I::csp_pileup_core][Thread-0] processing chrom MT ...

but the output is an empty cellSNP.base.vcf file.

The BAM file should have cell barcodes (which I have provided) and UMIs.

Is there something I'm doing wrong?

When I run with "--UMItag None" I get a cellSNP.base.vcf file with a list of variants at every single nucleotide position - which must be incorrect.

When I run with "--UMItag MR" which I believe is the UMItag for my BAM file, again I get a huge number of variants at all positions.

Many thanks

Alan

hxj5 commented 2 years ago

Hi, --UMItag None means cellsnp-lite will not use UMI information and will parse each read independently, it can still pileup a list of variants like in the case of 10x scDNA data. As you have already obtained the variant list with --UMItag MR, you may filter part of them with bcftools or by rerunning cellsnp-lite with proper --minCOUNT and --minMAF (default values are 20 and 0, respectively) if you want.

alanfoleynibrt commented 2 years ago

Thank you for your reply

I'm having a strange issue now.

I run this:

"/home/alan/AbSeq/SNV/MQuad/cellsnplite/cellsnp-lite/cellsnp-lite" \ -s "/home/alan/AbSeq/yujuanKX/sevenbridgesoutput/gatkbams/45712HENGLI.bam" \ -O "/home/alan/AbSeq/yujuanKX/sevenbridgesoutput/gatkbams/mquadtest/" \ -b "/home/alan/AbSeq/SNV/MQuad/kxcellsnplite/barcodeskx.tsv" \ -p 20 \ --cellTAG CB \ --UMItag None \ --chrom KX576660.1 \ --minMAF 0.02 \ --minMAPQ 5

and I have an output of 1419 SNPs which are missing certain variants. For example, I know from viewing my 45712HENGLI.bam in IGV there is a variant at 2235 C>T of allele depth 18 and sequencing depth 18.

However, when I view the base.vcf file I see:

KX576660.1 2219 . C A . PASS AD=1;DP=20;OTH=0 KX576660.1 2221 . T A . PASS AD=1;DP=19;OTH=1 KX576660.1 2230 . G T . PASS AD=2;DP=20;OTH=0 KX576660.1 2231 . A T . PASS AD=1;DP=20;OTH=0 KX576660.1 2412 . T A . PASS AD=2;DP=20;OTH=0 KX576660.1 2414 . G T . PASS AD=1;DP=22;OTH=0

The reference genome is correct, and the variant calls at the other positions are correct.

But a major variant at 2235 is missing.

This is strange because when I run mpileup directly:

samtools mpileup \ "/home/alan/AbSeq/yujuanKX/sevenbridgesoutput/gatkbams/45712HENGLI.bam" \ -f "/home/alan/AbSeq/SNV/mpileup/KX576660.fasta" \ -o /home/alan/AbSeq/yujuanKX/sevenbridgesoutput/gatkbams/mquadtest/mpileupoutput

I see that the variant at 2235 is present:

KX576660.1 2233 t 19 ..$................. F;JIFFJFFJ:FJJFFFF: KX576660.1 2234 t 17 ...-1C.............. FJHFFJFFJFFJJFFFF KX576660.1 2235 c 19 TT*TTTTTTTTTTTTTTTT FJJFFJFFFJFFJJFFFFF KX576660.1 2236 g 19 ................... FJJFFJFF:JFFJJFF:FF KX576660.1 2237 g 18 .................. FJJFFJ:FJFFJJFFFFF

Can you advise on what is happening here?

Alan

hxj5 commented 2 years ago

The "missing" variants were probably filtered by --minCOUNT or --minMAF (default values are 20 and 0, respectively). The variants whose "DP+OTH < minCOUNT" or "min(AD, DP-AD) < DP * minMAF" would be filtered. In the case of pos 2235, it was expected that the variant would be filtered (depth 19 is smaller than minCOUNT 20; minor allele fraction 0 is smaller than minMAF 0.02 as you specified). You may try using looser filtering parameters to "call" more variants.

Xianjie

alanfoleynibrt commented 2 years ago

Many thanks Xianjie!

This did fix it