sanger-pathogens / Roary

Rapid large-scale prokaryote pan genome analysis
http://sanger-pathogens.github.io/Roary

Parameters for Nanopore (ONT) long-read sequence data #590

Status: open. m-ocejo opened this issue 1 year ago

m-ocejo commented 1 year ago

I would appreciate any suggestions for adapting the default parameters to ONT (Oxford Nanopore) long-read sequence data. Running Prokka and Roary with default parameters yields a pan genome with almost 3x as many genes as we observe with Illumina data. The calculated core genome, however, is much smaller in the nanopore data. Since ONT reads have a higher error rate than Illumina reads, would lowering the -i parameter (minimum percentage identity for blastp, default 95) "improve" the results? And what would be a safe minimum %ID? We were somewhat surprised to find so few core genes, since the isolates are closely related.
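One way to reason about the combination described above (a larger pan genome alongside a smaller core) is a toy count. The following is a minimal Python sketch under stated assumptions, not Roary's algorithm. It takes, for each gene cluster, the number of isolates that carry it, and bins clusters using cut-offs assumed to match the default categories Roary reports in summary_statistics.txt (core >= 99% of isolates, soft core 95-99%, shell 15-95%, cloud < 15%). The helper names classify_clusters and toy_pangenome are hypothetical, and so is the scenario: some genes are split in two by residual indel errors in a few ONT assemblies, and the fragments are assumed not to cluster back with the intact gene.

```python
# Fraction-of-isolates cut-offs; ASSUMED to mirror Roary's default summary
# categories (core >= 99%, soft core 95-99%, shell 15-95%, cloud < 15%).
CORE, SOFT_CORE, SHELL = 0.99, 0.95, 0.15


def classify_clusters(isolates_per_cluster, n_isolates):
    """Count gene clusters per category, given how many isolates carry each cluster."""
    counts = dict.fromkeys(["core", "soft_core", "shell", "cloud"], 0)
    for k in isolates_per_cluster:
        frac = k / n_isolates
        if frac >= CORE:
            counts["core"] += 1
        elif frac >= SOFT_CORE:
            counts["soft_core"] += 1
        elif frac >= SHELL:
            counts["shell"] += 1
        else:
            counts["cloud"] += 1
    return counts


def toy_pangenome(n_isolates=10, n_genes=100, n_split=30, n_affected=3):
    """HYPOTHETICAL toy model: n_split genes are each broken into two fragments
    in n_affected isolates, and the fragments are assumed NOT to cluster with
    the intact gene. Returns the isolate count for every resulting cluster."""
    clusters = [n_isolates] * (n_genes - n_split)        # genes intact in every isolate
    clusters += [n_isolates - n_affected] * n_split      # split genes: intact copy in fewer isolates
    clusters += [n_affected] * (2 * n_split)             # two fragment clusters per split gene
    return clusters


if __name__ == "__main__":
    clean = classify_clusters([10] * 100, 10)            # every gene present in all 10 isolates
    noisy = classify_clusters(toy_pangenome(), 10)
    print("clean:", clean, "total:", sum(clean.values()))
    print("noisy:", noisy, "total:", sum(noisy.values()))
```

With the defaults above, the "clean" case gives 100 core clusters, while the fragmented case gives 70 core and 90 shell clusters (160 in total). The sketch only shows that, under these assumptions, gene fragmentation can inflate the cluster count and shrink the core at the same time; it says nothing about which -i value is appropriate for real ONT assemblies.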

Thanks for maintaining this amazing tool! All the best,

Medelin