nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Request: Optional RG tag in FASTQ #981

Open Mon3trK opened 3 months ago

Mon3trK commented 3 months ago

Hi developers of Dorado,

First, thank you for providing this great software.

I noticed that since issue #532, the RG tag and some other tags are automatically added to the FASTQ header. I think this is somewhat unreasonable, for the following reasons:

  1. The tags, especially the RG tag, are long. They take up more disk space and add I/O burden, which is bad for devices like the Mk1C. If I want the full information documented, BAM is clearly the better choice.
  2. If I basecall my own data with --emit-fastq (which is not recommended), I already know the model from stdout or because I set it manually, so recording it again doesn't help.
  3. If I upload my data to the SRA database to share it, SRA re-encodes the header, so after fastq-dump the original long header is of no use.

I fully recognize earlier requests such as support for minimap2 -y, but I think that use case is minor for the vast majority of users, since Dorado can do the alignment itself. If I want FASTQ output, I want the header to be neat and fast to process, so I suggest making this feature optional or leaving it to third-party software.
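Until such an option exists, one way to get compact headers is to strip the tags in post-processing. The following is a minimal Python sketch, not part of Dorado. It assumes 4-line FASTQ records whose header line is "@read_id" followed by tab-separated SAM-style tags (such as RG:Z:...). The function name strip_fastq_tags and its keep parameter are hypothetical, chosen only for illustration.

```python
from typing import Iterable, Iterator


def strip_fastq_tags(lines: Iterable[str], keep: frozenset = frozenset()) -> Iterator[str]:
    """Yield FASTQ lines with header tags removed.

    Assumption: every record is exactly 4 lines, and the header is the
    read ID followed by tab-separated SAM-style tags ("KEY:TYPE:VALUE").
    Tags whose KEY (e.g. "qs", "RG") is in `keep` are retained.
    """
    for i, line in enumerate(lines):
        if i % 4 == 0:
            # Header line: first field is "@read_id", the rest are tags.
            fields = line.rstrip("\r\n").split("\t")
            if not fields[0].startswith("@"):
                raise ValueError(f"malformed FASTQ header at line {i + 1}: {line!r}")
            kept = [f for f in fields[1:] if f.split(":", 1)[0] in keep]
            yield "\t".join([fields[0], *kept]) + "\n"
        else:
            # Sequence, "+" separator and quality lines pass through unchanged.
            yield line
```

For example, sys.stdout.writelines(strip_fastq_tags(sys.stdin)) would drop all tags from a FASTQ stream, while passing keep=frozenset({"qs"}) would keep only the qs tag. Indexing by line position, rather than checking for "@", avoids misreading quality lines that happen to start with "@".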

HalfPhoton commented 3 months ago

I'll bring this up and we'll discuss this change. I'll get back to you.

Thanks, Rich

Mon3trK commented 3 months ago

Hi @HalfPhoton, I really appreciate your attention!