Closed penguinmeow closed 5 months ago
Hi Zhen,
I ran into the same error as you when trying the paired-end workaround you mentioned. The problem seems to be the estimation of the maximum read length. By specifying the max read length (`-l` parameter) in the `count` dunk, I was able to get past this issue. The same holds for all Alleyoop modules that require max read-length estimation (`utrrates`, `snpeval`, `tcperreadpos`, `tcperutrpos`). Maybe this will help you as well.
Best regards, Lasse
Hi @penguinmeow
sorry for the slow response. I think you can first try it with paired-end alignments. For this I guess you need to get rid of the `-l` and `-n 100` parameters and try again.

If you still get errors afterwards, I would then go for merging both read sets and considering them as single-end reads.

The `-l` parameter for the slamdunk modules you only need once you have a properly aligned BAM, but from what I can see you already failed at the mapping stage, right?
Hi Tobias,

Thanks for your suggestion and sorry for the late response. Yes, previously I mapped R1 and R2 to the reference genome with the paired-end mode of NextGenMap and used the slamdunk `filter`, `snp`, and `count` modules for further analysis. It turned out to raise `KeyError: "tag 'XA' not present"` in the `count` module. To get rid of the error, I followed the instructions in issue #25 and added `-l` and `-n 100` to the ngm parameters, which in turn gives the error "TopN > 1 is currently not supported for paired end reads."

I removed the `-l` and `-n 100` from the ngm parameters and re-ran the whole slamdunk pipeline today, but the same `KeyError: "tag 'XA' not present"` occurs again in the `count` module. The script and error log are attached in case you want to take a look. pipe_script.txt pipeline_err.txt

Thank you so much and wish you a happy Christmas~ Zhen
Hi Zhen,
sorry for not being clear - the `-l` should only be removed for the `ngm` call. For the actual slamdunk commands you should supply the read length with `-l <readlength>` to avoid the length-estimation step, which uses the not-present `XA` tag. Do you follow me?

Happy Xmas to you too!
This is kind of related, so I decided to continue in this thread; it can be moved to a new thread if deemed off-topic.
In paired-end data, there is the problem of potential T>C conversions within the overlap of a read pair. Such hits would be counted twice, even though they should not be. One way to overcome the problem is to find such mates and set the base qualities of one of the mates in the BAM file to zero (or below the threshold used for counting, so that they are not counted) - the trick used for bisulfite sequencing.
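The masking trick described above can be sketched in a few lines of Python. This is a hypothetical illustration, not slamdunk code: it assumes gapless alignments (so reference offsets map directly to read offsets) and zeroes the qualities of the second mate inside the overlap, so that a quality threshold in the counting step would skip those bases:

```python
def mask_mate_overlap(start1, end1, start2, qual2):
    """Zero base qualities of mate 2 where it overlaps mate 1.

    start1/end1: 0-based half-open reference span of mate 1.
    start2: reference start of mate 2; qual2: its per-base qualities.
    Assumes gapless alignments (a simplification of real CIGAR handling).
    """
    qual2 = list(qual2)
    end2 = start2 + len(qual2)
    ov_start = max(start1, start2)
    ov_end = min(end1, end2)
    for ref_pos in range(ov_start, ov_end):
        qual2[ref_pos - start2] = 0  # below any sensible counting threshold
    return qual2

# Mate 1 spans [100, 150), mate 2 spans [140, 190): 10 bases overlap,
# so the first 10 qualities of mate 2 are zeroed.
masked = mask_mate_overlap(100, 150, 140, [30] * 50)
```

With real BAMs one would apply this per pair (e.g. via pysam) and write the modified qualities back before the counting step.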
I am also toying with another, a bit less elegant approach, which is to merge overlapping pairs into a single long read for each of them, and then do the analysis as if it were SE data. The problem is that the merged reads will be of different lengths, so theoretically the counts should be weighted to take these differences into account (a longer read is more likely to be converted; this can get even more nuanced, as conversion probability depends on the T content of a read).
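To make the weighting idea concrete, here is a toy Python sketch (my own normalization for illustration, not anything slamdunk implements): instead of treating every merged read equally, conversions are pooled over the number of convertible T positions per read, so longer or more T-rich reads contribute proportionally more "opportunities" to the rate estimate:

```python
def weighted_conversion_rate(reads):
    """Estimate a T>C conversion rate from merged reads of varying length.

    Each read is (sequence, tc_conversions). Rather than counting each
    converted read once, divide total conversions by the total number of
    T positions, i.e. the conversion opportunities. Toy model: ignores
    sequencing errors and per-position rate differences.
    """
    total_tc = 0
    total_t = 0
    for seq, tc in reads:
        total_tc += tc
        total_t += seq.upper().count("T")
    return total_tc / total_t if total_t else 0.0

# A long T-rich read and a short one: 2 conversions over 6 T positions.
rate = weighted_conversion_rate([("TTTTGA", 1), ("TTGC", 1)])
```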
But leaving the nuances aside, I would love to hear from you about how to tackle the problem of overlapping pairs.
@t-neumann Does slamdunk provide a way to tag converted reads in a BAM file? Having such a read tag would be another way to tackle the described problem, as you could manually set qualities of any mate of converted reads to zero. This would effectively lead to counting converted fragments, which is not perfect (different lengths, etc), but maybe would be good enough.
I am really curious what you think! Also, if I wrote anything stupid, don't hesitate to correct me! :)
Hi Tobias,
Could you kindly help me with the following 2 questions?
I followed your suggestion in #25 and ran the whole pipeline like this:
```bash
#!/bin/bash
module load conda3/5.1.0
source activate slamseq

# for 1953895

# map
ngm -b -r GRCm38.p6.genome.fa -1 pipe/1953895_WT_BMDM_1_S110_L004_R1_001.fastq.gz -2 pipe/1953895_WT_BMDM_1_S110_L004_R2_001.fastq.gz -t 16 --slam-seq 2 -5 12 -l --rg-id 1953895 --rg-sm 1953895_WT_BMDM_201120 -n 100 --strata -o pipe/1953895_WT_BMDM_1_S110_L004_R1_001.bam

# filter
slamdunk filter -o pipe -b UTR/3pUTR.bed -t 1 pipe/1953895_WT_BMDM_1_S110_L004_R1_001.bam

# snp
slamdunk snp -o pipe -r GRCm38.p6.genome.fa -t 16 pipe/1953895_WT_BMDM_1_S110_L004_R1_001_filtered.bam

# count
slamdunk count -o pipe -s pipe -v 1953895_WT_BMDM_1_S110_L004_R1_001_filtered_snp.vcf -r GRCm38.p6.genome.fa -b UTR/3pUTR.bed -t 1 pipe/1953895_WT_BMDM_1_S110_L004_R1_001_filtered.bam
```

But an error occurs:
It seems we cannot run ngm with this parameter set for paired-end FASTQs. Is there another way to fix the `"tag 'XA' not present"` error in `count`?
Thank you so much! And happy weekend~
Regards, Zhen