single-cell-genetics / cellsnp-lite

Efficient genotyping bi-allelic SNPs on single cells
https://cellsnp-lite.readthedocs.io
Apache License 2.0

Application to Slide-seq V2 returns empty vcf #91

Closed: kzb193 closed this issue 1 year ago

kzb193 commented 1 year ago

Hello,

I tried to apply cellsnp-lite to a Slide-seq V2 BAM file, and the resulting VCF was empty. The BAM file was obtained by running spacemake ( https://doi.org/10.1093/gigascience/giac064 ) on the pair of Slide-seq V2 FASTQ files. Specifically, spacemake outputs final.polyA_adapter_trimmed.bam, the final mapped and tagged BAM file, in which the CB tag contains the cell barcode and the MI tag contains the UMIs.

Can you please help?

Here are some additional details.

Command:
cellsnp-lite -s $BAM_SORTED_OUT -b $BARCODE -O $OUT_DIR -R $REGION_VCF -p 20 --minCOUNT 20 --UMItag MI

VCF file (the FASTQs were aligned to hg19): genome1K.phase3.SNP_AF5e2.chr1to5.hg19.sorted.vcf.gz

Lines from the BAM file (samtools view $BAM_SORTED_OUT | awk '$5 > 20' | head):

SRR16203708.2 0 GL000220.1 118318 255 54M 0 0 ATTCGTAGACGACCTGCTTCTGGGTCGGGGTTTCGTACGTAGCAGAGCAGCTCC FFFFFFFFFFFFFFFFF,FF:FFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFF CB:Z:NAGTGTCGAGGGCT MD:Z:54 XF:Z:CODING RG:Z:A NH:i:1 HI:i:1 MI:Z:GACTTTCAA jI:B:i,-1 NM:i:0 jM:B:c,-1 nM:i:0 CR:Z:NAGTGTCGAGGGCT AS:i:53 gf:Z:CODING gn:Z:RNA28S5 gs:Z:+
SRR16203708.11 16 4 22386473 255 55M 0 0 TTGCTTCTCTATAATGAGACCTATGTATAGATTCCAATCAGCAACTAAACCTAAG FFFFFFFFFFFFF:FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF CB:Z:NCCTCAGCTCTTTG MD:Z:55 XF:Z:INTRONIC RG:Z:A NH:i:1 HI:i:1 MI:Z:ATGATTTAG jI:B:i,-1 NM:i:0 jM:B:c,-1 nM:i:0 CR:Z:NCCTCAGCTCTTTG AS:i:54 gf:Z:INTRONIC gn:Z:GPR125 gs:Z:-
SRR16203708.12 16 11 17096690 255 56M 0 0 CCAAACGGTGAATCCGGCTCTCTATTAGAATCAGACGGAATTTAGCATCCTTATCC FFFF:FFFFFFF:FFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFF:FFF:FFFF CB:Z:NTAGATCATCCTAG MD:Z:56 XF:Z:CODING RG:Z:A NH:i:1 HI:i:1 MI:Z:ATATCAATT jI:B:i,-1 NM:i:0 jM:B:c,-1 nM:i:0 CR:Z:NTAGATCATCCTAG AS:i:55 gf:Z:CODING gn:Z:RPS13 gs:Z:-
SRR16203708.13 0 11 73965649 255 2S53M1S 0 0 CATCCCACCCCTGTCCTGCTGCATGGTCCGGAGTCTGGGACCTACTTTGTTTTTTT F,F,FFFFFFF,,FFF,:FFFFFFFF,::FFFFFF,F:FF:FFFFFF:F,FF::F, CB:Z:NCCATACTTCTTCA MD:Z:53 XF:Z:CODING RG:Z:A NH:i:1 HI:i:1 MI:Z:GCGTTCCCG jI:B:i,-1 NM:i:0 jM:B:c,-1 nM:i:0 CR:Z:NCCATACTTCTTCA AS:i:52 gf:Z:INTRONIC,CODING gn:Z:P4HA3,PPME1 gs:Z:-,+
SRR16203708.15 0 1 214498766 255 56M 0 0 TCCCCCCAGCCTCCAAATTAATCCACATTGTAGATAAGTTCTATCCAGAGGGAGGT F,FFF,FFFFF:FFFFF:FF:FFFF,:F,F::FFF:FFFFFF:FFF:FFFFFFF,F CB:Z:NCTGACCTCTTTTT MD:Z:56 XF:Z:INTRONIC RG:Z:A NH:i:1 HI:i:1 MI:Z:CTCCTGCCT jI:B:i,-1 NM:i:0 jM:B:c,-1 nM:i:0 CR:Z:NCTGACCTCTTTTT AS:i:55 gf:Z:INTRONIC gn:Z:SMYD2 gs:Z:+
SRR16203708.19 0 GL000220.1 117874 255 55M 0 0 GGGATTAGACCGTCGTGAGACAGGTTAGTTTTACCCTACTGATGATGTGTTGTTG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF CB:Z:NAACTAGCTCTTCA MD:Z:3T51 XF:Z:CODING RG:Z:A NH:i:1 HI:i:1 MI:Z:GCCAAGCAC jI:B:i,-1 NM:i:1 jM:B:c,-1 nM:i:1 CR:Z:NAACTAGCTCTTCA AS:i:52 gf:Z:CODING gn:Z:RNA28S5 gs:Z:+
SRR16203708.20 0 10 112361465 255 5S43M 0 0 CTAACAAATATGGAAAAAGAACATATGGATGCTATAAATCATGATACT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFF CB:Z:NAGCATTATCTTCA MD:Z:43 XF:Z:CODING RG:Z:A NH:i:1 HI:i:1 MI:Z:GCGTTCCCG jI:B:i,-1 NM:i:0 jM:B:c,-1 nM:i:0 ZP:i:49 CR:Z:NAGCATTATCTTCA AS:i:42 gf:Z:CODING gn:Z:SMC3 gs:Z:+
SRR16203708.21 16 X 73441879 255 56M 0 0 AATGCCACTTAATCTAATATGTTGAGCTAGTATCAATTAACTTTACACTACACAGT ,FFFFF,FFF,FFFFFFFFFFFFFFFFFFFFFFF,FFFFFFF:FFFFFFFFF:FFF CB:Z:NGTGTGGTTCTTCA MD:Z:56 XF:Z:INTRONIC RG:Z:A NH:i:1 HI:i:1 MI:Z:GCGTTCCCG jI:B:i,-1 NM:i:0 jM:B:c,-1 nM:i:0 CR:Z:NGTGTGGTTCTTCA AS:i:55 gf:Z:INTRONIC,INTRONIC gn:Z:FTX,RP3-368A4.5 gs:Z:-,-
SRR16203708.24 16 12 56578704 255 17M93N39M 0 0 CGAGACAGGCAATTATTCTGCACCAAGGACTTCTCAATGGTCATAAACATTTCCAC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFF,FFFFF CB:Z:NGTAGCAATCTTCA MD:Z:56 XF:Z:CODING RG:Z:A NH:i:1 HI:i:1 MI:Z:GCGTTCCCG jI:B:i,56578721,56578813 NM:i:0 jM:B:c,22 nM:i:0 CR:Z:NGTAGCAATCTTCA AS:i:56 gf:Z:INTRONIC,CODING gn:Z:RP11-977G19.5,SMARCC2 gs:Z:-,-
SRR16203708.25 16 14 50085645 255 49M5S 0 0 CCTTCTTCCGGAAAATTGGCTTTGTCTGCCCACCATAGCCACTCTGCTTGATAC FFFFFFFFFFFFFFFFFFFFFFFFFF:FF:FFFFFFFFFFFFFFFFFFFFFFFF CB:Z:NACGTTGATCTTCA MD:Z:49 XF:Z:CODING RG:Z:A NH:i:1 HI:i:1 MI:Z:GCTTTTTTT jI:B:i,-1 NM:i:0 jM:B:c,-1 nM:i:0 CR:Z:NACGTTGATCTTCA AS:i:48 gf:Z:CODING gn:Z:RPL36AL gs:Z:-
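Every CB tag in these ten records starts with 'N'. A minimal sketch for checking whether that pattern holds across the whole file, rather than only the first few reads: the hypothetical bash function cb_prefix_tally (not part of cellsnp-lite or samtools) reads tab-separated SAM text, for example piped from samtools view, and counts reads by the first character of their CB tag. It assumes the CB tag holds the cell barcode, as described above, and it skips reads that have no CB tag.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of cellsnp-lite or samtools): tally the first
# character of each read's CB:Z tag in tab-separated SAM text read from stdin.
# Reads without a CB tag are skipped. Output: <first character>\t<read count>,
# most frequent first.
cb_prefix_tally() {
  awk -F'\t' '
    {
      # Optional SAM tags start at column 12; stop at the first CB:Z: tag.
      for (i = 12; i <= NF; i++) {
        if (substr($i, 1, 5) == "CB:Z:") {
          count[substr($i, 6, 1)]++
          break
        }
      }
    }
    END {
      for (c in count) print c "\t" count[c]
    }
  ' | sort -k2,2nr
}

# Example (assumes samtools is on PATH):
#   samtools view "$BAM_SORTED_OUT" | cb_prefix_tally
```

If nearly all reads fall under 'N', the barcodes were probably extracted or corrected incorrectly upstream, which is worth ruling out before tuning any genotyping parameters.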

hxj5 commented 1 year ago

Hi, thanks for the feedback and the detailed information. The command line and input files look fine. However, all of the CB tags seem to start with 'N', which is uncommon in 10x scRNA-seq data. Could you double-check whether the barcodes in the $BARCODE file exactly match the CB tags, including whether or not they have a "-1" suffix?
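One way to carry out this check is sketched below, assuming the barcode list is an uncompressed plain-text file with one barcode per line and that SAM text is piped in from samtools view. The hypothetical bash function barcode_match_report (not part of cellsnp-lite) counts reads whose CB tag matches a listed barcode exactly, plus reads that would match only after a "-1" suffix is added or removed. A near-zero exact_match count alongside a large reads_with_CB count would suggest the kind of barcode mismatch described here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of cellsnp-lite): report how many reads carry a
# CB:Z tag that appears in a plain-text barcode list (one barcode per line),
# including matches that only work once a "-1" suffix is added or removed.
# SAM text is read from stdin; the barcode file path is the first argument.
barcode_match_report() {
  local barcode_file=$1
  awk -F'\t' -v bc_path="$barcode_file" '
    # Barcode list: load every non-empty line into a set.
    FILENAME == bc_path { if ($1 != "") wanted[$1] = 1; next }
    # SAM records: check the first CB:Z: tag of each read.
    {
      for (i = 12; i <= NF; i++) {
        if (substr($i, 1, 5) == "CB:Z:") {
          cb = substr($i, 6)
          total++
          with_suffix = cb "-1"
          if (cb in wanted) exact++
          else if (with_suffix in wanted) add_suffix++
          else if (sub(/-1$/, "", cb) && (cb in wanted)) strip_suffix++
          break
        }
      }
    }
    END {
      printf "reads_with_CB\t%d\n", total
      printf "exact_match\t%d\n", exact
      printf "match_if_-1_added\t%d\n", add_suffix
      printf "match_if_-1_removed\t%d\n", strip_suffix
    }
  ' "$barcode_file" -
}

# Example (assumes samtools is on PATH):
#   samtools view "$BAM_SORTED_OUT" | barcode_match_report "$BARCODE"
```

On a large BAM, piping samtools view through head -n 1000000 before barcode_match_report keeps the check quick while still giving a representative match rate.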

kzb193 commented 1 year ago

Hello,

Thank you very much for pointing out that anomaly. I will look into the details and get back to you with updates in a day or two.

kzb193 commented 1 year ago

Hello,

As you correctly pointed out, there were some issues with the BAM file. Running cellsnp-lite on the BAM file obtained directly from the research group that ran the experiment resolved the empty VCF issue. Thank you very much.