wwood / galah

More scalable dereplication for metagenome assembled genomes
GNU General Public License v3.0

Implement skani as fastani alternative #30

Closed AroneyS closed 1 year ago

AroneyS commented 1 year ago

Add skani as fastani alternative

AroneyS commented 1 year ago

A few steps closer. Still need to add it to the CLI. It may also be faster to let skani handle the file I/O in bulk, as in https://github.com/bluenote-1577/skani-lib-example/blob/main/src/main.rs. But then we would need independent logic for skani and fastani, since skani would calculate all pairs up front. We could add that to the initialise method for skani and then reference a look-up table in calculate_ani...
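A rough sketch of the look-up table idea, for illustration only. Everything here is hypothetical: `AniLookup` is an invented name, the `compute_ani` closure stands in for whatever bulk skani call would actually be used, and genomes are referred to by index. It does not reflect galah's real traits or skani's API.

```rust
use std::collections::HashMap;

/// Hypothetical look-up table of precomputed pairwise ANI values.
pub struct AniLookup {
    // Keyed by (smaller index, larger index) so lookups are order-independent.
    table: HashMap<(usize, usize), f32>,
}

impl AniLookup {
    /// Compute every unordered pair up front.
    /// `compute_ani` is a stand-in (assumption) for the bulk skani step and
    /// returns None when no ANI is reported for a pair.
    pub fn initialise<F>(num_genomes: usize, mut compute_ani: F) -> Self
    where
        F: FnMut(usize, usize) -> Option<f32>,
    {
        let mut table = HashMap::new();
        for i in 0..num_genomes {
            for j in (i + 1)..num_genomes {
                if let Some(ani) = compute_ani(i, j) {
                    table.insert((i, j), ani);
                }
            }
        }
        AniLookup { table }
    }

    /// Later calls only read from the table; nothing is recomputed.
    pub fn calculate_ani(&self, a: usize, b: usize) -> Option<f32> {
        let key = if a <= b { (a, b) } else { (b, a) };
        self.table.get(&key).copied()
    }
}
```

The trade-off is the one raised above: all pairs are computed whether or not the clusterer ever asks for them, so this only pays off if the bulk computation is much cheaper per pair.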

wwood commented 1 year ago

Are you thinking per-disconnected component after the preclusterer? Or just in total? If the latter then no point in preclustering, I think.

AroneyS commented 1 year ago

Not sure. Both are possibilities, though skani doesn't recommend comparing genomes with <82% ANI, so we would have to deal with that if we skip preclustering, right? Though it says "If the resulting aligned fraction for the two genomes is < 15%, no output is given.", so maybe <82% just doesn't give an answer, rather than giving an unreliable answer.
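One way to handle the "no output" case, sketched under assumptions: a missing result is modelled as `None` and treated as "not in the same cluster". The `PairResult` type, the `usable_ani` and `same_cluster` helpers, and applying the 15% aligned-fraction cutoff on our side are all hypothetical. Whether skani itself already drops such pairs is exactly the open question here.

```rust
/// Hypothetical result of one pairwise comparison (not a skani type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PairResult {
    /// Average nucleotide identity, in percent (0-100).
    pub ani: f32,
    /// Aligned fraction, in percent (0-100).
    pub aligned_fraction: f32,
}

/// Cutoff quoted from the skani documentation in the comment above.
pub const MIN_ALIGNED_FRACTION: f32 = 15.0;

/// Return the ANI only if a result exists and enough of the genomes aligned.
pub fn usable_ani(result: Option<PairResult>) -> Option<f32> {
    result
        .filter(|r| r.aligned_fraction >= MIN_ALIGNED_FRACTION)
        .map(|r| r.ani)
}

/// A missing or low-aligned-fraction result never joins two genomes.
pub fn same_cluster(result: Option<PairResult>, ani_threshold: f32) -> bool {
    usable_ani(result).map_or(false, |ani| ani >= ani_threshold)
}
```

With this shape, skipping the preclusterer would be safe only if distant pairs reliably come back as missing rather than as an unreliable number.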

AroneyS commented 1 year ago

Also, I get this warning on compile: warning: the following packages contain code that will be rejected by a future version of Rust: buf_redux v0.8.4, partitions v0.2.4

AroneyS commented 1 year ago

--cluster-method isn't in the --full-help. Is there somewhere that I missed? Or do I have to rebuild docs?

wwood commented 1 year ago

I didn't go through every line, but seems about good. I think you need to add skani to the conda yml, and can you enable runs on PR using

on: [push, pull_request]

in the actions yml please?

wwood commented 1 year ago

> --cluster-method isn't in the --full-help. Is there somewhere that I missed? Or do I have to rebuild docs?

You added that argument, so it won't show up until the docs are redeployed from main/release.