nanoporetech / medaka

Sequence correction provided by ONT Research
https://nanoporetech.com

Choosing the right model #380

Closed: pbelmann closed this issue 1 year ago

pbelmann commented 1 year ago

Hi,

thank you for developing this tool! I would like to use medaka to polish a dataset for which I only have information about the sequencing device. Would it still make sense to use medaka with a model chosen more or less arbitrarily, based only on the device used?

cjw85 commented 1 year ago

Hi @pbelmann,

From where has your data come? It might be the case that the fastq headers contain useful information that can help us advise you. Failing that, if you know the approximate time period when sequencing/basecalling was performed, we can take a best guess by assuming that the most recent Guppy version available at the time was used.

pbelmann commented 1 year ago

Hi @cjw85,

thank you for your fast reply!

> From where has your data come?

The idea is actually to process datasets that are available on SRA. So my question is not bound to a specific dataset; it is a more general question about datasets for which I do not have the information needed to run medaka. In most cases I really only get the "instrument" from the SRA metadata. Example dataset: http://ftp.era.ebi.ac.uk/vol1/fastq/ERR499/008/ERR4994318/ERR4994318.fastq.gz

> It might be the case that the fastq headers contain useful information that can help us advise you.

Do you mean that the device or the basecaller version is encoded in the header?

> Failing that, if you know an approximate time period when sequencing/basecalling was performed we can take a best guess assuming the most recent Guppy version was used at the time.

Based on the SRA 'release date' attribute I could indeed get the rough time period. So you would suggest checking which Guppy version was the most recent one at that time? Where would I get this information?

cjw85 commented 1 year ago

The basecaller version and model are encoded in the fastq headers for more recent data; older datasets will not have this.
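For anyone who wants to check what a given file's headers actually contain, here is a minimal Python sketch. It assumes the header carries whitespace-separated `key=value` tokens after the read id. The helper names and the sample key names in the usage note (`runid`, `start_time`, `basecall_model_version_id`) are illustrative assumptions, not a guaranteed format; headers written by other tools, or rewritten by an archive, may look different or carry no such fields at all.

```python
import gzip


def parse_header_fields(header_line):
    """Split a FASTQ header line into (read_id, {key: value}).

    Only whitespace-separated tokens of the form key=value are kept;
    anything else after the read id is ignored.
    """
    tokens = header_line.rstrip("\n").lstrip("@").split()
    read_id = tokens[0] if tokens else ""
    fields = {}
    for token in tokens[1:]:
        if "=" in token:
            key, value = token.split("=", 1)
            fields[key] = value
    return read_id, fields


def first_header_fields(path):
    """Parse the header of the first record in a plain or gzipped FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return parse_header_fields(handle.readline())
```

Calling `first_header_fields("reads.fastq.gz")` (a hypothetical file name) returns the read id and a dictionary of whatever tags are present. If the dictionary is empty, or has no basecaller or model entry, the file falls into the "older datasets" case and the date-based guess is the fallback.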

The Guppy CHANGELOG contains release dates. I don't know if it is bundled in the Guppy distribution. It's available in the Nanopore Community (sorry, needs a sign-in): https://community.nanoporetech.com/downloads/guppy/release_notes
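Once the release dates have been copied out of the CHANGELOG, the "most recent version at the time" guess can be sketched in Python as below. The function name is hypothetical, and the entries in `example_releases` are placeholders for illustration only; they are NOT real Guppy versions or dates and need to be replaced with values from the release notes.

```python
from bisect import bisect_right
from datetime import date


def latest_release_on_or_before(releases, when):
    """Return the version of the newest release dated on or before `when`.

    `releases` is a list of (release_date, version) tuples in any order.
    Returns None if every release is later than `when`.
    """
    ordered = sorted(releases)
    dates = [release_date for release_date, _ in ordered]
    idx = bisect_right(dates, when)
    return ordered[idx - 1][1] if idx else None


# Placeholder data: NOT real Guppy releases. Fill in from the CHANGELOG.
example_releases = [
    (date(2000, 1, 1), "version-A"),
    (date(2000, 6, 1), "version-B"),
    (date(2001, 3, 1), "version-C"),
]

# A dataset dated 2000-09-15 would be matched to "version-B" here.
guess = latest_release_on_or_before(example_releases, date(2000, 9, 15))
```

Note that an SRA release date can be well after the actual basecalling date, so this only gives an upper bound on the basecaller version rather than a definite answer.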