ruqianl / appendCB

Append a character to the CB tag of each read record in the provided BAM file.
1 stars 0 forks source link

Empty output bam file #1

Closed Raina-M closed 2 years ago

Raina-M commented 2 years ago

Hello,

I ran the following command:

appendCB/src/appendCB --append AAACCCACACACACTA Aligned.sortedByCoord.out.bam AAACCCACACACACTA.tagged.bam

but the output bam AAACCCACACACACTA.tagged.bam is empty (17kb only with headers) My input bam Aligned.sortedByCoord.out.bam is around 5Mb. I did not encounter any error or warning messages.

Could you please give me a clue why this happened?

ruqianl commented 2 years ago

Hi Meng,could you check jf the barcode tag is CB in your bam?-------- Original message --------From: Meng Zhang @.>Date: Sat, Jul 2, 2022, 03:10To: ruqianl/appendCB @.>Cc: Subscribed @.***>Subject: [ruqianl/appendCB] Empty output bam file (Issue #1) Hello, I ran the following command: appendCB/src/appendCB --append AAACCCACACACACTA Aligned.sortedByCoord.out.bam AAACCCACACACACTA.tagged.bam

but the output bam AAACCCACACACACTA.tagged.bam is empty (17kb only with headers) My input bam Aligned.sortedByCoord.out.bam is around 5Mb. I did not encounter any error or warning messages. Could you please give me a clue why this happened?

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you are subscribed to this thread.Message ID: @.***>

Raina-M commented 2 years ago

No, I am afraid not. I have a merged BAM file from all single cells generated from cellranger, but I cannot use this BAM as input of sgcocaller (I supposed you know this tool, also from your group) since this BAM contains cells from two species. So I split the BAM to thousands of small bam files. Each bam corresponds to a single cell. And I only used bam files from one species. Considering some alignment errors, I converted the bam to fastq file and re-aligned to reference genome. I guess I lost the barcode information during the realignment step.

ruqianl commented 2 years ago

Hi @Raina-M ,

If you have the list of cell barcodes from each species known, you wouldn't need to split the BAM into small bams. Say you have a barcodeFileS1.txt that contains the list of CB from species 1, and you can use that as the input for sgcocaller. sgcocaller will only look at DNA reads from these cell barcodes. The same for the second species.

Best, RL

Raina-M commented 2 years ago

If you have the list of cell barcodes from each species known

This is exactly the problem I am trapped in. I know the CB of species 1 from cellranger outputs, which derived clusters based on gene expression. So I selected the cluster with species 1 genes and extracted a list of CBs belonging to this cluster. Then I pulled the alignments with these CBs from a single big BAM. For now, everything looks perfect. I should have got a BAM of species 1 alignments.

However, when I examined this BAM file, I found some mapping targets were species 2 due to the similarity between these two species. That was why I converted BAM to fastq and then realigned only to species 1 genome.

you wouldn't need to split the BAM into small bams

You are right, for scgocaller, I do not need to do this. Splitting to single-cell BAM is because we need some single-cell analysis later. But this is not relevant to my current problem. Even not doing splitting, I still lose the CB information in species 1 BAM.

Since I already generated the BAM for species 1, I am thinking of using appendCB to add barcodes, but I do not why it only output an empty BAM.

ruqianl commented 2 years ago

Hi @Raina-M ,

When you converted the BAM to fastq to do re-mapping, you probably lost the CB tags to all the reads? Perhaps try using options from samtools fastq that lift the tags to the sequence names

-t
Copy RG, BC and QT tags to the FASTQ header line, if they exist.

-T TAGLIST
Specify a comma-separated list of tags to copy to the FASTQ header line, if they exist.

Again, I think it is probably the CB tags are missing from the reads in the BAM so appendCB is not generating any reads. I think you could try the --generate option that generates CB tags for reads without a CB tag already using the provided string via the --append option.

Hope it is helpful. I have a new version of the appendCB tool which should be a bit easier to work with, and will share it very soon.

Raina-M commented 2 years ago

Thank you for the options, but I am afraid that -t or -T flag in samtools fastq did not work in my case. Even though I added CB in fastq headers, the aligner still removed this information after mapping, and there is no rescue option to keep it.

Anyway, I already manually added CB attributes to the bam since I have one bam for one cell, which means the barcode is the same for one bam.

samtools view --no-header mybam.bam | awk -v cb=$CB '{print $0"\tCB:Z:"cb}' > CB_tagged.sam

Then I just added the bam header to CB_tagged.sam and converted it to bam. I think this should fulfill the function of appendCB, right? Or should I rerun appendCB on my output?

ruqianl commented 2 years ago

Hi Meng,Since you now have a CB tag in each of your single cell bams, you can merge them together into a merged bam and run sgcocaller on all cells together.BTW minimap2 has an option to copy over the comments in the fastq read header to the aligned reads.-------- Original message --------From: Meng Zhang @.>Date: Tue, Jul 19, 2022, 23:46To: ruqianl/appendCB @.>Cc: ruqianl @.>, Comment @.>Subject: Re: [ruqianl/appendCB] Empty output bam file (Issue #1) Thank you for the options, but I am afraid that -t or -T flag in samtools fastq did not work in my case. Even though I added CB in fastq headers, the aligner still removed this information after mapping, and there is no rescue option to keep it. Anyway, I already manually added CB attributes to the bam since I have one bam for one cell, which means the barcode is the same for one bam. samtools view --no-header mybam.bam | awk -v cb=$CB '{print $0"\tCB:Z:"cb}' > CB_tagged.sam

Then I just added the bam header to CB_tagged.sam and converted it to bam. I think this should fulfill the function of appendCB, right? Or should I rerun appendCB on my output?

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you commented.Message ID: @.***>

Raina-M commented 2 years ago

Thanks for the reply. I think I can move on to sgcocaller with the tagged BAM now. As for mininmap2, I do not think it is an optimal aligner for scRNA-seq data. Anyway, I still encountered some errors when using this merged BMA for sgcocaller, but I will post an issue on sgcocaller GitHub page.