nanoporetech / medaka

Sequence correction provided by ONT Research
https://nanoporetech.com

More precise description of all terms used to describe medaka models #376

Closed BCArg closed 8 months ago

BCArg commented 2 years ago

Medaka is a Research Release.

Research releases are provided as technology demonstrators to provide early access to features or stimulate Community development of tools. Support for this software will be minimal and is only provided directly by the developers. Feature requests, improvements, and discussions are welcome and can be implemented by forking and pull requests. However, much as we would like to rectify every issue and piece of feedback users may have, the developers may have limited resources for support of this software. Research releases may be unstable and subject to rapid iteration by Oxford Nanopore Technologies.

We provide no guarantees that feature requests will be implemented. They may be implemented if sufficient interest is generated or if the request aligns with research objectives.

Is your feature request related to a problem? Please describe.

It would be nice to have a clear description or table of what all the terms used to describe the models mean. I know from the README page that 'hac' is high accuracy, but I see that there are new models at v1.6.1 that use the new term 'sup' (e.g. r104_e81_sup_g5015_model.tar.gz). Intuitively I guess this refers to the super high accuracy basecaller, though it would be nice to be 100% sure about that.

Describe the solution you'd like

  1. Expand the 'Models' section on the README page with a table of all terms used (e.g. r104, min, fast, sup) and what they mean.
  2. Potentially print this extra information or table when medaka tools list_models is called.
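
As a rough illustration of the kind of table requested above, the following Python sketch splits a model name on underscores and annotates only the two terms whose meaning is discussed in this issue. The `GLOSSARY` dictionary and the `describe_model` function are hypothetical and not part of medaka; all other components are deliberately left unexplained because nothing here defines them.

```python
# Hypothetical glossary: only terms whose meaning is mentioned in this issue.
GLOSSARY = {
    "hac": "high accuracy (per the README)",
    "sup": "presumably super accuracy (a guess, not confirmed)",
}

ARCHIVE_SUFFIX = "_model.tar.gz"


def describe_model(name):
    """Split a model name on underscores and pair each part with a known meaning."""
    # Drop the archive suffix if present, leaving only the model identifier.
    if name.endswith(ARCHIVE_SUFFIX):
        name = name[: -len(ARCHIVE_SUFFIX)]
    return [
        (part, GLOSSARY.get(part, "not documented in this issue"))
        for part in name.split("_")
    ]


rows = describe_model("r104_e81_sup_g5015_model.tar.gz")
for part, meaning in rows:
    print(f"{part:<8}{meaning}")
```

Unknown components are reported as undocumented rather than guessed, which mirrors the uncertainty expressed in the request.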

Describe alternatives you've considered

NA

Additional context

NA

cjw85 commented 8 months ago

We won't be doing this; it is one of my pet hates that there is any meta information in the model names to start with. Various pieces of meta information have been added to some model names that are of no consequence to users.

Fun fact: to this day I actually don't know that "sup" means "super". I've just grown to assume that; no one within Nanopore has ever referred to it as "super accuracy" in my presence! 🤣

v1.11.0 of medaka can introspect its input files to search for basecaller model identifiers. To a large extent this negates any need to understand the meaning of the various components of the model names.
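
To give a feel for what this kind of introspection could look like, here is a minimal Python sketch that scans FASTQ header lines for a basecaller model identifier. It is not medaka's own code. It assumes the basecaller wrote a `basecall_model_version_id=<name>` field into each read header, and the function name `find_basecaller_models` and the example model name are made up for illustration.

```python
import re

# Assumption: the basecaller wrote a "basecall_model_version_id=<name>" field
# into each FASTQ header line. This is an illustration, not medaka's code.
MODEL_TAG = re.compile(r"basecall_model_version_id=(\S+)")


def find_basecaller_models(fastq_lines):
    """Return the set of basecaller model identifiers found in FASTQ headers."""
    models = set()
    for i, line in enumerate(fastq_lines):
        # In a four-line-per-record FASTQ, lines 0, 4, 8, ... are headers.
        if i % 4 == 0 and line.startswith("@"):
            match = MODEL_TAG.search(line)
            if match:
                models.add(match.group(1))
    return models


# A single made-up record; the model name is a placeholder, not a real model.
example = [
    "@read1 runid=abc basecall_model_version_id=example_sup_model",
    "ACGT",
    "+",
    "!!!!",
]
found = find_basecaller_models(example)
print(found)
```

Returning a set makes it easy to notice inputs that mix reads from more than one basecaller model, a case where picking a single model automatically would be ambiguous.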