sanger-pathogens / ariba

Antimicrobial Resistance Identification By Assembly
http://sanger-pathogens.github.io/ariba/

Long read compatibility by tweaking the pipeline #332

Open varun8476 opened 1 year ago

varun8476 commented 1 year ago

Hi, can we make ARIBA compatible with long reads by changing the mapping and assembly approach? I am a bioinformatics student and I am planning to do this as my master's thesis project. My first hunch is to use minimap2 to map the reads to the cluster, and a long-read assembler such as Flye or Miniasm to assemble them.
Any leads on whether this approach is feasible, or pointers to related research, would be helpful. Thanks in advance.
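
To make the idea concrete, here is a rough sketch of the two external calls I have in mind. It only builds the command lines and does not run anything. The function names, file names, and default presets are placeholders of mine, not anything that exists in ARIBA, and the step that pulls the mapped reads out of the SAM file before assembly is left out. I also have not checked whether Flye copes with targets as short as a single gene cluster.

```python
from pathlib import Path


def minimap2_command(cluster_ref, reads, out_sam, preset="map-ont", threads=4):
    """Build a minimap2 call mapping long reads to one cluster reference.

    Hypothetical helper: the name and defaults are illustrative only.
    -a asks for SAM output, -x selects the read-type preset.
    """
    return [
        "minimap2", "-a", "-x", preset,
        "-t", str(threads),
        "-o", str(out_sam),
        str(cluster_ref), str(reads),
    ]


def flye_command(mapped_reads, out_dir, read_type="--nano-raw", threads=4):
    """Build a Flye call assembling the reads that mapped to the cluster.

    Hypothetical helper: assumes the mapped reads were already extracted
    to a FASTQ file by some other step not shown here.
    """
    return [
        "flye", read_type, str(mapped_reads),
        "--out-dir", str(out_dir),
        "--threads", str(threads),
    ]


if __name__ == "__main__":
    work = Path("cluster_1")  # placeholder working directory
    print(" ".join(minimap2_command("cluster_1.fa", "reads.fastq", work / "mapped.sam")))
    print(" ".join(flye_command(work / "mapped.fastq", work / "flye_out")))
```

Each list could be handed to `subprocess.run` once the paths are real. Keeping the commands as lists avoids shell quoting problems with odd file names.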

martinghunt commented 1 year ago

This hasn't been tried. At the time ARIBA was made, long-read assemblies were too low quality (indel errors in particular), which would have led to too many errors in the results. ARIBA is made to be quite conservative, which is fine for Illumina but not for data with a higher error rate.

But I'm not sure it's worth trying, because these days long reads and their assemblies are significantly better (although I'd still be wary of indel errors). If it was me, I would assemble all the reads (using flye/unicycler/whatever works) and then use abritamr for the AMR predictions: https://github.com/MDU-PHL/abritamr

Sorry if that sounds too negative, but realistically I expect that would be the best method. Happy to be proven wrong! That said, if you really want to do it then this is what I can think of that will need changing, and there's probably more that I haven't thought of. Basically, there's a bunch of places where read pairs are assumed, and it'll be a fair bit of work to deal with going from paired to unpaired: