Closed songeric1107 closed 2 years ago
Hi, thanks for your feedback. For quicker debugging, could you share the following information?
Thank you! Xianjie
thank you, XianJie.
please see below
1.cellsnp-lite Version: 1.2.2
log file sc1.log
bam sub.bam.txt
barcode AAACCTGAGAAACCAT-1 AAACCTGAGAAACCGC-1 AAACCTGAGAAACCTA-1 AAACCTGAGAAACGAG-1 AAACCTGAGAAACGCC-1
vcf
1 10539 rs537182016 C A 100 PASS OLD=10539;AF=0.000599042 1 10642 rs558604819 G A 100 PASS OLD=10642;AF=0.00419329 1 11008 rs575272151 C G 100 PASS OLD=11008;AF=0.0880591 1 11012 rs544419019 C G 100 PASS OLD=11012;AF=0.0880591 1 11063 rs561109771 T G 100 PASS OLD=11063;AF=0.00299521
Hi, these files seem all right. For the bam file, could you show some properly mapped records? e.g., print a few records whose MAPQ>20 with samtools view <bam_file> | awk '$5 > 20' | head
.
And is the bam file also hg38? as you are using a hg38 SNP list.
BTW, the running time is reasonable as you are using the raw
dataset (which contains many unfiltered cells) and a big candidate SNP list. Usually, to speed up, we recommend to use the filtered
output of cellranger (in outs/filtered_gene_bc_matrices
) and a smaller candidate SNP list (e.g., the genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf.gz in this link, which contains 7.4M hg38 SNPs with MAF>0.05).
To check if the issue is due to the filtering parameters being too strict, you may want to run the filtered
dataset and the 7.4M snp list with --minMAF 0 --minCOUNT 1
.
Thank you for the suggestion, I change the parameter to what you suggested. one more question, for the vcf, the chromosome ID is 1,2,3....X, but the reference for alignment is keyed as chr1, chr2,....., will that be a problem?
I tried again for chr1 using the small vcf, the result is still empty. the log has no errors. cellsnp-lite -s sc1_raw/possorted_genome_bam.bam -b sc1_raw/barcodes.tsv.gz -O sc1_chr1_small -R ../ref/genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf -p 20 --minMAF 0 --minCOUNT 1 --chrom chr1 --UMItag Auto --gzip
another question, my sample is 5’ scRNAseq from hashtag samples, not the regular 3' scRNA, will that matter?
Xianjie,
I figure out the problem, the bam file is not in the "right format", I should use another format of bam, cmo/outs/multi/count/unassigned_alignments.bam, I tried with the 7.5M vcf file, it works, thank you
trying to use cellsnp-lit to prepare the input for vireo, here is the example
cellsnp-lite -s sc1_raw/possorted_genome_bam.bam -b sc1_raw/barcodes.tsv.gz -O sc1_out -R ../ref/genome1K.phase3.SNP_AF5e4.chr1toX.hg38.vcf -p 20 --minMAF 0.1 --minCOUNT 20 --gzip
however, the whole process takes a week to finish, but the result is empty.
Not sure what is the problem?
thank you
Yang