nanoporetech / medaka

Sequence correction provided by ONT Research
https://nanoporetech.com

Training Medaka using poorly assembled Illumina genome #99

000generic closed this issue 5 years ago

000generic commented 5 years ago

Hi!

I'm interested in running Katuali - including training Medaka - for de novo ONT assembly of pygmy cephalopod species.

Regarding Medaka: for pygmy squid I have an Illumina assembly, but its N50 is slightly under 20 kb, while the genome is estimated at 2.2 Gb. Do you think an Illumina assembly of such poor quality would still enable Medaka training?

Average read coverage for PromethION reads is ~60x. However, many of the ONT reads are under 10 kb (read N50 ~12 kb), so the short ONT reads might actually work well with the small, poorly assembled contigs of the Illumina genome when training Medaka. The idea is that short reads can potentially align over much of their length despite the small contigs, so there could still be enough signal for training. But I'm not sure whether this is relevant to Medaka.

Are there any issues surrounding use of PromethION with Medaka?

Thank you, Eric

cjw85 commented 5 years ago

The short length of the contigs is not an issue per se for medaka training, provided alignment of the Illumina contigs to a draft PromethION assembly is robust. We generally train medaka with less than 200 Mb of unique genomic sequence; from your dataset it sounds like you could actually end up with more usable training data than this.

60X is a reasonable depth to use, and there are no issues around using PromethION data.
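
For anyone wanting a rough check of how much truth sequence an Illumina assembly offers, here is a minimal Python sketch that computes the N50 and the total bases in contigs above a length cut-off. It is not part of medaka or Katuali: the function names, the toy contig lengths and the `MIN_LENGTH` cut-off are all assumptions for illustration. Summed contig length is also only an upper bound on unique sequence, since it ignores repeats and contigs that fail to align to the draft.

```python
def n50(lengths):
    """Return the length L such that contigs of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def usable_bases(lengths, min_length):
    """Total bases in contigs that are at least min_length long."""
    return sum(length for length in lengths if length >= min_length)


# Toy contig lengths in bases (made-up values, not from the squid assembly).
contig_lengths = [50_000, 30_000, 20_000, 10_000, 5_000, 5_000]

MIN_LENGTH = 10_000       # hypothetical cut-off, not a medaka parameter
TARGET = 200_000_000      # the ~200 Mb figure mentioned in the reply above

total = usable_bases(contig_lengths, MIN_LENGTH)
print("N50:", n50(contig_lengths))
print("Bases in contigs >= cut-off:", total)
print("Meets ~200 Mb figure:", total >= TARGET)
```

In practice the lengths would come from the assembly itself, for example the second column of a `.fai` index produced by `samtools faidx`.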

000generic commented 5 years ago

Great! That all sounds promising - will give it a go and see how it goes for Medaka and for Katuali once we get everything installed.

Thank you :)