nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Question about FASTQ output with dorado duplex #954

Closed gambalab closed 1 month ago

gambalab commented 1 month ago

Dear Dorado Team,

Thank you for developing the valuable Dorado tool.

I have a question regarding generating FASTQ files using the --emit-fastq parameter with the dorado duplex command. I understand that FASTQ format doesn't include the dx tag, which identifies reads originating from duplex or simplex calls. However, I have specific concerns:

  1. Read Types in FASTQ Output: Does the FASTQ file generated by dorado duplex contain all three read types: (i) duplex reads, (ii) simplex reads without duplex offspring, and (iii) simplex reads with duplex offspring? Or does it contain only the first two types (duplex and simplex without duplex offspring)? The latter behavior would be ideal, since a simplex read that has duplex offspring is just a duplicate of that read with more errors and could negatively affect variant calling accuracy.

  2. Adapter Trimming for Simplex Reads: Since dorado duplex reportedly doesn't trim adapters in simplex reads, would it be appropriate to run dorado trim on the resulting FASTQ files for adapter removal before alignment?

Thank you for your clarification. I look forward to your response.

HalfPhoton commented 1 month ago

Hi @gambalab,

Q1. Yes - the FASTQ output will have all 3 "read types". I'm not sure there's any way to tell which reads are which, other than:

  1. "new" duplex and split simplex reads will have new UUID read ids which don't exist in the input data
  2. Duplex reads currently have concatenated IDs of the form "parentID_a;parentID_b", but this is being deprecated as it causes issues with VCF files.
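
For anyone who wants to act on point 2 while it still holds, here is a minimal Python sketch (not part of Dorado) that drops simplex reads whose ID appears as a parent inside a duplex read ID. It assumes the "parentID_a;parentID_b" naming described above, that the read ID is the first whitespace-delimited token of the FASTQ header, and that records are plain 4-line FASTQ. The function names `read_fastq` and `drop_duplex_parents` are hypothetical. Since the concatenated ID format is being deprecated, this may stop working in later versions.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (read_id, sequence, quality)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (read_id, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line
        qual = next(it).rstrip("\n")
        # Assumption: the read ID is the first token after '@'
        yield header[1:].split()[0], seq, qual


def drop_duplex_parents(records: Iterable[Record]) -> List[Record]:
    """Keep duplex reads and simplex reads that are not a duplex parent."""
    records = list(records)
    parents = set()
    for read_id, _, _ in records:
        # Assumption: a ';' in the ID marks a duplex read named
        # "parentID_a;parentID_b"
        if ";" in read_id:
            parents.update(read_id.split(";"))
    return [rec for rec in records if rec[0] not in parents]
```

Usage would be along the lines of `kept = drop_duplex_parents(read_fastq(open("calls.fastq")))`, where `calls.fastq` is a placeholder file name. The whole file is held in memory because the duplex IDs have to be collected before any simplex read can be judged.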

Q2 - The adapters that are not trimmed in simplex basecalling are rapid adapters.

Best regards, Rich

gambalab commented 1 month ago

Thanks for your prompt answer. So, regarding Q2, is it correct to use the dorado trim command?

Or would it be more correct to use dorado trim --no-trim-primers, which should trim only the adapters?

HalfPhoton commented 1 month ago

I believe the standard basecall settings should be sufficient and no additional trimming is needed.