Hello,

I am fond of using dorado; however, I recently encountered an issue I'd like to report.
I have datasets basecalled and demultiplexed with dorado 0.5.3. Basecalling was performed without trimming so that I could estimate poly(A) tail lengths. I wanted to use the resulting data for other analyses that require adapter trimming, so I read the manual and did the following:
First, I ran samtools fastq to extract fastq reads from my bam files. Then I used dorado trim as described in "Trimming existing datasets". Since my input was fastq, I assumed the output would be fastq as well, so I used the following command:
dorado trim in.fastq > out.fastq
Because dorado trim accepts either fastq or bam as input, I expected the output format to follow the input format. Instead, this produced a file that looked like bam. I then ran it on another fastq file exported without the "pt" tag, in case the tags I had kept in the headers for the poly(A) analysis made dorado treat the input as bam. Again, the output looked like bam rather than fastq.
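To check which format a file actually is, rather than judging it by eye, a quick look at its leading bytes is usually enough. The sketch below is a heuristic, and `detect_format` is a hypothetical helper name. It relies on two facts from the SAM/BAM specification: bam is BGZF-compressed, so it starts with the gzip magic bytes `1f 8b`, and the decompressed stream begins with `BAM\1`. A SAM file normally starts with a header line such as `@HD`, while fastq has `@` on line 1 and `+` on line 3.

```shell
#!/usr/bin/env bash
# detect_format FILE -> prints BAM, SAM, FASTQ, GZIP or UNKNOWN
# Heuristic only: inspects the first bytes / first lines of the file.
detect_format() {
  local f="$1" magic
  # First two bytes as hex; BGZF (used by bam) is gzip-compatible: 1f 8b
  magic=$(head -c 2 "$f" | od -An -tx1 | tr -d ' \n')
  if [ "$magic" = "1f8b" ]; then
    # A decompressed bam stream starts with the 4-byte magic "BAM\1"
    if [ "$(gzip -dc "$f" 2>/dev/null | head -c 4 | od -An -tx1 | tr -d ' \n')" = "42414d01" ]; then
      echo BAM
    else
      echo GZIP   # some other gzip file, e.g. a compressed fastq
    fi
    return
  fi
  local first third
  first=$(sed -n '1p' "$f")
  third=$(sed -n '3p' "$f")
  # SAM header lines start with @HD, @SQ, @RG, @PG or @CO followed by a tab
  if printf '%s\n' "$first" | grep -qE $'^@(HD|SQ|RG|PG|CO)\t'; then
    echo SAM
  # fastq: record line 1 starts with "@", line 3 starts with "+"
  elif [[ "$first" == "@"* && "$third" == "+"* ]]; then
    echo FASTQ
  else
    echo UNKNOWN
  fi
}
```

For example, `detect_format out.fastq` would print `BAM` for the file described above if it really is bam. A headerless SAM file is reported as `UNKNOWN`, which is acceptable for this kind of quick check.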
In the end, I got what I needed by running:
dorado trim in.fastq > out.bam
and then extracting fastq from it with samtools fastq. However, this is cumbersome and leaves intermediate bam files on disk. It was also unintuitive to me that the output is always bam, regardless of the input format. Could this be clarified in the manual? Also, would it be possible for dorado trim to output fastq directly instead of producing an intermediate bam?
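Until fastq output is available, one possible way to avoid the intermediate bam file is to stream dorado trim's stdout straight into samtools fastq. This is a sketch under assumptions: that dorado trim also writes bam (or SAM, which samtools detects automatically) to stdout when piped, as it did when redirected to a file above, and that the pt tag survives trimming in 0.5.3, which is not verified here. samtools reads from stdin when given "-" as the input file. `trim_to_fastq` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# trim_to_fastq IN OUT.fastq
# Pipes dorado trim output directly into samtools fastq so that no
# intermediate bam file is written to disk.
trim_to_fastq() {
  local in_file="$1" out_fastq="$2"
  (
    # pipefail: report a dorado trim failure, not only a samtools failure
    set -o pipefail
    # "-" makes samtools fastq read the trimmed records from stdin.
    # -t copies RG/BC/QT tags and -T pt copies the pt tag into the fastq
    # header, assuming dorado trim carries those tags through (unverified).
    dorado trim "$in_file" | samtools fastq -t -T pt - > "$out_fastq"
  )
}
```

Usage: `trim_to_fastq in.fastq out.fastq`. Since dorado trim also accepts bam input, passing the original untrimmed bam as the first argument should skip the initial samtools fastq step as well, though this has not been tested here.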
Best,
N.
Run environment:
Dorado version: dorado 0.5.3
Dorado command: dorado trim in.fastq > out.fastq
Operating system: Ubuntu 22.04.3 LTS
Hardware (CPUs, Memory, GPUs):
Source data type: cDNA sequencing data (SQK-PCS111 kit, R9.4.1 flow cell), basecalled with dorado, initially without trimming to determine poly(A) lengths; then demultiplexed with dorado, converted to fastq with samtools fastq, and trimmed with dorado trim
(samtools commands, with and without the pt tag respectively: samtools fastq -t -T "pt" -@ 10 in.bam > out.fastq and samtools fastq -t -@ 10 in.bam > out.fastq)
Input fastq (glimpse):
Output "fastq" - produced by dorado trim (glimpse):