DepledgeLab commented 5 months ago

Issue Report

For RNA004 datasets, the adapter is ~70 nt long. When aligning FASTQ datasets derived from Dorado with the --no-trim option, I would therefore expect a large proportion of the resulting alignments to have 3' soft-clipping of roughly 50-90 nt. In practice, however, I often see that around 30-40% of alignments have no soft-clipping at the 3' end. Inspection of the individual reads and alignments shows that this is not an alignment error: there is simply no adapter sequence present in those reads.
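For anyone wanting to reproduce this measurement, here is a minimal Python sketch of how the 3' soft-clip length could be tallied from SAM CIGAR strings. The function names are hypothetical and not part of Dorado or any aligner. It assumes the 3' end is defined in the original read orientation, so the first CIGAR operation is used for reverse-strand alignments (flag 0x10), and that hard clips are ignored.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def three_prime_softclip(cigar: str, flag: int) -> int:
    """Return the number of soft-clipped bases at the read's 3' end."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # SAM stores the CIGAR in reference orientation, so for a
    # reverse-strand alignment the read's 3' end is the first operation.
    if flag & 0x10:
        ops.reverse()
    # Hard clips (H) sit outside any soft clip; skip them.
    while ops and ops[-1][1] == "H":
        ops.pop()
    if ops and ops[-1][1] == "S":
        return ops[-1][0]
    return 0

def fraction_without_clip(records) -> float:
    """Fraction of (cigar, flag) records with no 3' soft-clipping."""
    records = list(records)
    if not records:
        return 0.0
    unclipped = sum(1 for cigar, flag in records
                    if three_prime_softclip(cigar, flag) == 0)
    return unclipped / len(records)
```

The (cigar, flag) pairs could be taken from columns 6 and 2 of `samtools view` output for primary alignments.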

This raises the question of whether Dorado still performs some level of trimming when the --no-trim flag is set, or whether there is another explanation for why a substantial proportion of the basecalled reads contain no adapter sequence.

Run environment:

Dorado version: 0.6.0.
Dorado command: dorado basecaller --no-trim -r $MODEL/rna004_130bps_hac@v3.0.1 $IN/pod5/ > $OUT.bam
Operating system: CentOS 7
Hardware: HPC cluster
Source data type: pod5
Source data location (on device or networked drive - NFS, etc.):
Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB): RNA004, >2 million reads
Dataset to reproduce, if applicable (small subset of data to share as a pod5 to reproduce the issue):

Logs

No warnings/errors are reported in the log files

malton-ont commented 5 months ago

Hi @DepledgeLab,

Have these adapterless reads been split? You can check whether the pi:Z tag is set to determine this. If so, this issue may be related.
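As a rough illustration of the check described above, the following Python sketch scans the optional fields of a SAM record for a pi:Z tag. The helper name `parent_read_id` is hypothetical, and the input is assumed to be tab-separated SAM text such as the output of `samtools view` on the basecalled BAM.

```python
from typing import Optional

def parent_read_id(sam_line: str) -> Optional[str]:
    """Return the value of the pi:Z tag if present, else None."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional tags start after the 11 mandatory SAM columns.
    for field in fields[11:]:
        if field.startswith("pi:Z:"):
            return field[len("pi:Z:"):]
    return None
```

A record that returns a non-None value carries the tag and would, per the comment above, be a candidate split read.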

DepledgeLab commented 4 months ago

Thanks. I'm going to go ahead and close this now as the results observed appear to have come from unexpected alignment errors.

nanoporetech / dorado

Inconsistencies in '--no-trim' results #775
