Closed Raina-M closed 2 years ago
Hi Meng,could you check jf the barcode tag is CB in your bam?-------- Original message --------From: Meng Zhang @.>Date: Sat, Jul 2, 2022, 03:10To: ruqianl/appendCB @.>Cc: Subscribed @.***>Subject: [ruqianl/appendCB] Empty output bam file (Issue #1) Hello, I ran the following command: appendCB/src/appendCB --append AAACCCACACACACTA Aligned.sortedByCoord.out.bam AAACCCACACACACTA.tagged.bam
but the output bam AAACCCACACACACTA.tagged.bam is empty (17kb only with headers) My input bam Aligned.sortedByCoord.out.bam is around 5Mb. I did not encounter any error or warning messages. Could you please give me a clue why this happened?
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you are subscribed to this thread.Message ID: @.***>
No, I am afraid not. I have a merged BAM file from all single cells generated from cellranger
, but I cannot use this BAM as input of sgcocaller
(I supposed you know this tool, also from your group) since this BAM contains cells from two species. So I split the BAM to thousands of small bam
files. Each bam
corresponds to a single cell. And I only used bam
files from one species. Considering some alignment errors, I converted the bam
to fastq
file and re-aligned to reference genome. I guess I lost the barcode information during the realignment step.
Hi @Raina-M ,
If you have the list of cell barcodes from each species known, you wouldn't need to split the BAM into small bams. Say you have a barcodeFileS1.txt that contains the list of CB from species 1, and you can use that as the input for sgcocaller. sgcocaller will only look at DNA reads from these cell barcodes. The same for the second species.
Best, RL
If you have the list of cell barcodes from each species known
This is exactly the problem I am trapped in. I know the CB of species 1 from cellranger
outputs, which derived clusters based on gene expression. So I selected the cluster with species 1 genes and extracted a list of CBs belonging to this cluster. Then I pulled the alignments with these CBs from a single big BAM
. For now, everything looks perfect. I should have got a BAM
of species 1 alignments.
However, when I examined this BAM
file, I found some mapping targets were species 2 due to the similarity between these two species. That was why I converted BAM
to fastq
and then realigned only to species 1 genome.
you wouldn't need to split the BAM into small bams
You are right, for scgocaller
, I do not need to do this. Splitting to single-cell BAM
is because we need some single-cell analysis later. But this is not relevant to my current problem. Even not doing splitting, I still lose the CB information in species 1 BAM
.
Since I already generated the BAM
for species 1, I am thinking of using appendCB
to add barcodes, but I do not why it only output an empty BAM
.
Hi @Raina-M ,
When you converted the BAM to fastq to do re-mapping, you probably lost the CB tags to all the reads? Perhaps try using options from samtools fastq
that lift the tags to the sequence names
-t
Copy RG, BC and QT tags to the FASTQ header line, if they exist.
-T TAGLIST
Specify a comma-separated list of tags to copy to the FASTQ header line, if they exist.
Again, I think it is probably the CB tags are missing from the reads in the BAM so appendCB
is not generating any reads. I think you could try the --generate
option that generates CB tags for reads without a CB tag already using the provided string via the --append
option.
Hope it is helpful. I have a new version of the appendCB
tool which should be a bit easier to work with, and will share it very soon.
Thank you for the options, but I am afraid that -t
or -T
flag in samtools fastq
did not work in my case. Even though I added CB in fastq
headers, the aligner still removed this information after mapping, and there is no rescue option to keep it.
Anyway, I already manually added CB attributes to the bam
since I have one bam
for one cell, which means the barcode is the same for one bam
.
samtools view --no-header mybam.bam | awk -v cb=$CB '{print $0"\tCB:Z:"cb}' > CB_tagged.sam
Then I just added the bam
header to CB_tagged.sam
and converted it to bam
. I think this should fulfill the function of appendCB
, right? Or should I rerun appendCB
on my output?
Hi Meng,Since you now have a CB tag in each of your single cell bams, you can merge them together into a merged bam and run sgcocaller on all cells together.BTW minimap2 has an option to copy over the comments in the fastq read header to the aligned reads.-------- Original message --------From: Meng Zhang @.>Date: Tue, Jul 19, 2022, 23:46To: ruqianl/appendCB @.>Cc: ruqianl @.>, Comment @.>Subject: Re: [ruqianl/appendCB] Empty output bam file (Issue #1) Thank you for the options, but I am afraid that -t or -T flag in samtools fastq did not work in my case. Even though I added CB in fastq headers, the aligner still removed this information after mapping, and there is no rescue option to keep it. Anyway, I already manually added CB attributes to the bam since I have one bam for one cell, which means the barcode is the same for one bam. samtools view --no-header mybam.bam | awk -v cb=$CB '{print $0"\tCB:Z:"cb}' > CB_tagged.sam
Then I just added the bam header to CB_tagged.sam and converted it to bam. I think this should fulfill the function of appendCB, right? Or should I rerun appendCB on my output?
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you commented.Message ID: @.***>
Thanks for the reply. I think I can move on to sgcocaller
with the tagged BAM
now. As for mininmap2
, I do not think it is an optimal aligner for scRNA-seq data. Anyway, I still encountered some errors when using this merged BMA
for sgcocaller
, but I will post an issue on sgcocaller
GitHub page.
Hello,
I ran the following command:
but the output bam
AAACCCACACACACTA.tagged.bam
is empty (17kb only with headers) My input bamAligned.sortedByCoord.out.bam
is around 5Mb. I did not encounter any error or warning messages.Could you please give me a clue why this happened?