Closed sbamopoulos closed 1 year ago
Hi Stefan,
thanks for the feedback and detailed information. The command line you provided indeed cannot output one column per barcode in the VCF, but it does output the sparse matrices (AD, DP), which I suppose are sufficient for Numbat modelling. If you still want a VCF with one column per barcode, you can run cellsnp-lite with the additional parameter --genotype, which outputs an additional VCF file, cellSNP.cells.vcf, containing the per-barcode information.
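As a rough sketch of that suggestion, the command from the original report could be assembled in Python with --genotype appended. The file names and options are the ones quoted in this thread; the helper name `build_cellsnp_command` is made up for illustration, and the command is only printed here, not executed.

```python
import shlex


def build_cellsnp_command(bam, barcodes, out_dir, region_vcf, threads=32, genotype=True):
    """Assemble the cellsnp-lite call quoted in this thread as an argument list.

    Hypothetical helper: only the flags that appear in the thread are used.
    """
    cmd = [
        "cellsnp-lite",
        "-s", bam,
        "-b", barcodes,
        "-O", out_dir,
        "-R", region_vcf,
        "-p", str(threads),
        "--minMAF", "0",
        "--minCOUNT", "2",
        "--UMItag", "MR",
        "--cellTAG", "CB",
    ]
    if genotype:
        # Per the reply above, this requests the additional per-barcode VCF.
        cmd.append("--genotype")
    return cmd


cmd = build_cellsnp_command("file.BAM", "barcodes.txt", "pileup/01", "test.vcf")
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would launch the tool, assuming cellsnp-lite is on the PATH.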
Best Xianjie
Hi Xianjie,
thank you for your speedy reply, and apologies for my late response. I do not require a column per barcode; this was an error on my part. The script I used fails further downstream because of another issue. You can close this issue.
Best Stefan
Hello cellsnp-lite team,
I am trying to run a preprocessing script of the numbat R package (pileup_and_phase.R), which internally calls cellsnp-lite with the following command:
cellsnp-lite -s file.BAM -b barcodes.txt -O pileup/01 -R test.vcf -p 32 --minMAF 0 --minCOUNT 2 --UMItag MR --cellTAG CB
The test vcf file contains the first 1000 snps from genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf (just for testing):
The output cellSNP.base.vcf looks like this:
##fileformat=VCFv4.2
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 629218 . A G . PASS AD=3;DP=70;OTH=0
chr1 629482 . T C . PASS AD=0;DP=2;OTH=0
chr1 629626 . T C . PASS AD=0;DP=2;OTH=0
chr1 629906 . C T . PASS AD=7;DP=6689;OTH=21
chr1 630026 . C T . PASS AD=1;DP=6;OTH=0
chr1 630084 . T C . PASS AD=0;DP=4;OTH=0
chr1 630110 . T C . PASS AD=2;DP=2;OTH=1
chr1 630128 . G A . PASS AD=0;DP=3;OTH=0
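To make the structure of these records explicit, here is a minimal sketch that parses such a line and pulls out the INFO counts. It assumes AD is the alternative-allele count, DP the combined reference and alternative depth, and OTH the count of other alleles, all pooled over barcodes. That reading is an assumption based on how the values behave in this output, not something confirmed in this thread. The function name `parse_base_vcf_line` is hypothetical.

```python
def parse_base_vcf_line(line):
    """Split one whitespace-separated VCF record into fields and INFO counts."""
    chrom, pos, _id, ref, alt, _qual, _filter, info = line.split()[:8]
    counts = {}
    for item in info.split(";"):
        key, value = item.split("=")
        counts[key] = int(value)
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt, **counts}


# Two records copied from the output shown in this thread.
records = [
    "chr1 629218 . A G . PASS AD=3;DP=70;OTH=0",
    "chr1 629906 . C T . PASS AD=7;DP=6689;OTH=21",
]

for rec in map(parse_base_vcf_line, records):
    # Fraction of AD over DP; guarded against DP == 0.
    alt_fraction = rec["AD"] / rec["DP"] if rec["DP"] else float("nan")
    print(rec["chrom"], rec["pos"], rec["AD"], rec["DP"], round(alt_fraction, 4))
```

Each record carries a single set of counts in INFO and no per-sample columns, which is why no barcode columns appear in this file.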
I assumed that there would be one column per cell barcode, which is not the case.
Some details on the BAM file used: sequencing was done on the BD Rhapsody platform and alignment with STARsolo. It is a multiplexed BAM file in which the CB:Z tag denotes cell barcodes, ST:Z denotes the samples (01-12), and MR denotes the UMI. Alignment was done against Ensembl GRCh38 (GENCODE 29).
The barcode file is a plain text file that has one barcode per line (they are numbers in BD Rhapsody), like so:
850570
243761
39999
360647
589619
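A quick sanity check of that layout can be sketched as follows. It only verifies the plain-text shape described here (one barcode per line, no embedded whitespace, no duplicates). Whether cellsnp-lite imposes any further requirements is not established in this thread, and `check_barcodes` is a hypothetical helper.

```python
def check_barcodes(lines):
    """Return (barcodes, problems) for an iterable of barcode-file lines."""
    barcodes, problems, seen = [], [], set()
    for number, raw in enumerate(lines, start=1):
        value = raw.rstrip("\r\n")
        if not value.strip():
            problems.append((number, "empty line"))
        elif value != value.strip() or len(value.split()) != 1:
            problems.append((number, "whitespace in barcode"))
        elif value in seen:
            problems.append((number, "duplicate barcode"))
        else:
            seen.add(value)
            barcodes.append(value)
    return barcodes, problems


# The last line deliberately holds two barcodes to show a flagged problem.
barcodes, problems = check_barcodes(
    ["850570\n", "243761\n", "39999\n", "360647 589619\n"]
)
print(len(barcodes), problems)
```

For a real file, the same function can be called on an open handle, for example `check_barcodes(open("barcodes.txt"))`.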
Is there a specification for the BAM or barcode file that is not apparent to me from reading the documentation? I went over the cellsnp code, which is in Python and a little more readable for me, and it seems that if cell barcodes are provided, the sample IDs are skipped, so I would expect one column per barcode in the VCF file. However, this is not the case. Does the cell barcode file need a specific format? Does cellsnp-lite expect a tag that is not defined in my BAM file?
Any and all help is greatly appreciated!
Best Stefan