Open lmweber opened 4 years ago
Just following up on this - I have been recommended this tool sam-xlate
, which can convert a BAM from genomic to transcriptomic coordinates: https://github.com/mozack/ubu/wiki
However it isn't clear to me yet whether it can also go the other way - transcriptomic to genomic coordinates, which is what would be needed to run cellSNP on the salmon alevin
output. Will try it out though and see if this works.
Hi, a quick answer is that if you can convert the genome based VCF into transcriptome based VCF, cellSNP may work straightforward (in mode 1). This means you need to change to chromosome id into transcript id and SNP position on the transcript. Also, keep an eye on the CB and UMI barcodes, as I'm not sure if salmon works similarly as STAR in cellranger.
Oh cool, thanks for this idea, I'll look into this. I am already manipulating the vcf
file to keep only variants in 3' UTR regions (this greatly speeds up runtime, for only a small drop in performance - using some code originally from Pete Hickey), so it may be possible to add this somehow.
Yes, the cell barcodes and UMI barcodes were also an issue. We asked the salmon maintainers (Avi Srivastava) about this, and Avi has kindly added an update in the latest development version of salmon, which includes the barcodes (CB and UR fields). (This discussion was in the Bioconductor Slack, #singlecell-queries
channel, if you are interested in joining it.)
Thanks for this idea. Unfortunately, I don't think it worked in the end. I used Ensembl Variant Effect Predictor (VEP) tool to get the transcriptome ID and position for each variant, and then rebuilt the vcf
file to have transcript ID in column #CHROM
, and transcriptome position in POS
. But cellSNP was still giving me partial errors like the following, and the output files are empty, so I think it is not able to match the transcripts in the BAM file with this setup.
22.00% positions processed.
24.00% positions processed.
Can't find references chrENST00000558905.1 in samFile
Warning: chrom is None
26.01% positions processed.
28.01% positions processed.
So unfortunately it seems it may not be possible to use cellSNP
with alevin
for now. I have tried both of the following options:
alevin
to genomic coordinates using sam-xlate
(which doesn't seem to work since sam-xlate
only converts in the other direction)VariantAnnotation
packageIf you have any other ideas, please let me know, since it would be really useful for us if we could somehow use cellSNP
/ Vireo
together with alevin
. For now, I think I will go back to using Cell Ranger
instead of alevin
in our pipeline, since this does work (but Cell Ranger
is much slower and has issues with multi-mapping reads, which is why we prefer alevin
).
Thanks again for your help.
Hi, thanks for your previous responses a while back.
I am now trying to use cellSNP/Vireo with a SAM/BAM output file from
salmon alevin
instead of Cell Ranger, and wanted to see if this would be feasible somehow.The main issue seems to be that
salmon alevin
maps to the transcriptome and creates a transcriptomic BAM, while Cell Ranger maps to the genome and creates a genomic BAM. Since cellSNP expects a genomic BAM, it then gives me errors due to not finding chromosome references in the transcriptomic BAM.I just checked with the
salmon alevin
authors on Slack this morning to find out more about this - I believe the chromosome reference is normally stored in theRNAME
field in the genomic BAM, but in the transcriptomic BAM this field contains the transcript name. The genomic/transcriptomic coordinates are also different.Do you know if there would be a way to run cellSNP directly on a transcriptomic BAM, or convert this internally somehow? Alternatively, the
salmon alevin
authors pointed me towards some tools to directly convert transcriptomic to genomic BAM, which could be a way around this - but I thought I would check here first.The main reason this has come up is that we tend to have various issues with Cell Ranger (slow runtime, memory problems, multi-mapping reads), so we would prefer to use
salmon alevin
in our pipelines, and were wondering if this has already been considered. Thanks!