Closed apredeus closed 2 years ago
Hi Alex,
Thanks for the feedback. Updating the file in place may cause confusion for users who use older version of reference conventions. Adding new reference VCF file for new conventions is an option, but it could be redundant if there are many versions of conventions. Alternatively, "bcftools annotate --rename-chrs" tool could be used to update the chromosome names.
Cheers, Xianjie
Thank you! No worries, just wanted to give a heads-up. Thanks for pointing out the bcftools
method too, didn't know about it!
Hi @hxj5 - just wanted to ask a quick question.
I just ran the conversion of the reference VCF (which quite clearly worked - all main chromosomes are now renamed to chr1-chr22), and then ran cellSNP
in "mode 1" (using this reference VCF) with a BAM file with the same naming convention (chr1 etc)
The results, however, are empty - and the header of cellSNP.cells.vcf.gz
looks like this:
##fileformat=VCFv4.2
##source=cellSNP_v0.3.2
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=.,Description="Filter info not available">
##INFO=<ID=DP,Number=1,Type=Integer,Description="total counts for ALT and REF">
##INFO=<ID=AD,Number=1,Type=Integer,Description="total counts for ALT">
##INFO=<ID=OTH,Number=1,Type=Integer,Description="total counts for other bases from REF and ALT">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="total counts for ALT and REF">
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="total counts for ALT">
##FORMAT=<ID=OTH,Number=1,Type=Integer,Description="total counts for other bases from REF and ALT">
##FORMAT=<ID=ALL,Number=5,Type=Integer,Description="total counts for all bases in order of A,C,G,T,N">
##contig=<ID=1>
##contig=<ID=2>
##contig=<ID=3>
##contig=<ID=4>
##contig=<ID=5>
##contig=<ID=6>
##contig=<ID=7>
##contig=<ID=8>
##contig=<ID=9>
##contig=<ID=10>
##contig=<ID=11>
##contig=<ID=12>
##contig=<ID=13>
##contig=<ID=14>
##contig=<ID=15>
##contig=<ID=16>
##contig=<ID=17>
##contig=<ID=18>
##contig=<ID=19>
##contig=<ID=20>
##contig=<ID=21>
##contig=<ID=22>
##contig=<ID=X>
##contig=<ID=Y>
Do you have any idea how it might have happened? Can't seem to find the way to debug this... Thank you in advance!
P.S. My cellSNP
is version 0.3.2 installed with pip
.
Hi, for the "empty" VCF, could you share the log information, especially the error message?
I was also wondering what the type of the data is. Is it scRNA, or scDNA/scATAC? sometimes the empty output is due to improper setting of --UMItag
for scDNA/scATAC data.
The data is single nucleus RNA-seq aligned with STARsolo
. There are no obvious errors in the log file, here's the concatenated version:
Processing sample Pla_HDBR11923125..
[cellSNP] mode 1: fetch given SNPs in 67849 single cells.
[cellSNP] loading the VCF file for given SNPs ...
[cellSNP] fetching 7352497 candidate variants ...
2.00% positions processed.
...............................
100.00% positions processed.
[cellSNP] fetched 7352497 variants, now merging temp files ...
[cellSNP] 38 lines in final vcf file
[cellSNP] All done: 376 min 45.6 sec
Thanks for the information. seems no obvious error. Could you share your command line and a few lines of 1) the BAM records 2) the VCF file and 3) the barcode file?
I've extracted the bam file of chromosome 22 of my sample. Here are the links to the bam file, barcode list, and the reference VCF:
https://www.dropbox.com/s/eynica8ftbzsdc4/barcodes.tsv https://www.dropbox.com/s/69p16aucuno39p0/chr22.bam https://www.dropbox.com/s/nd4oiiw9adzlwne/sorted.vcf.gz
Thank you for your help!
Command line was as follows:
cellSNP -s $TAG.bam -b $TAG.cut2k_barcodes.tsv -O $TAG.cellsnp.out -R $REF -p 16 --minMAF 0.1 --minCOUNT 20
where REF
is the reference VCF file.
Hi, seems the bam file does not contain the UR
tag, which is the default UMI tag when setting --UMItag Auto
and barcode file is given. You may try setting --UMItag UB
as the bam file contains the UB
tag.
Thank you very much! UB
is the tag for error-corrected UMI, while UR
is one for raw. I guess that's the difference between Cell Ranger
and STARsolo
-generated BAM files.
I'll let you know if it worked.
It did! 👍 (Totally forgot about this discussion.)
Thanks again for your help.
Hi all,
starting with Cell Ranger reference 2020-A, chromosome naming now follows Gencode convention (chr1-chr23, chrM) instead of ensembl style (1-23, MT). It's probably worth it to update the reference VCF files provided since it'll definitely cause confusion among users.
Cheers
-- Alex