nextstrain / seasonal-cov

Nextstrain build for seasonal coronaviruses
https://nextstrain.org/seasonal-cov/

phylogenetic build fails because of missing nextalign #2

Closed: genehack closed this issue 5 months ago

genehack commented 5 months ago

Current Behavior

# from repo root
nextstrain build ./ingest
# lots of output, things work

nextstrain build ./phylogenetic
# lots of output, things don't work; first error: 

/bin/bash: line 1: nextalign: command not found

Possible solution

Based on the archived repo, nextalign was moved into nextclade -- but the page linked for nextalign-cli 404s.

I'm guessing the right answer here is to update the Snakemake file to either replace the nextalign call with nextclade run plus some set of options, or (looking at the zika repo) convert things over to using augur for the alignment?

@kimandrews any insight you can provide would be appreciated!

kimandrews commented 5 months ago

I used augur align for whole-genome alignment in the measles phylogenetic workflow, whereas I used nextclade run for aligning the shorter N450 region.

victorlin commented 5 months ago

The context here is that nextalign was bundled with Nextclade in v2 and removed in v3, which is probably the version you have. This pathogen repo seems to be written for Nextclade v2, though that dependency isn't stated anywhere.

I'm assuming nextalign was chosen for this repo intentionally, so potential fixes would be to (1) mention the Nextclade v2 dependency explicitly and set your environment up with that or (2) migrate to Nextclade v3 by using nextclade run as you've mentioned. (2) is probably the best move.
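
One way to tell which of the two situations applies is to check which executables are on PATH and what version they report. This is a minimal sketch, assuming a POSIX shell; `check_tools` is a hypothetical helper name and is not part of this repo or of Nextclade.

```shell
# Hypothetical helper: report, for each named tool, whether it is on PATH
# and, if so, what its --version flag prints.
check_tools() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found ($("$tool" --version 2>/dev/null))"
        else
            echo "$tool: not found"
        fi
    done
}

# A missing nextalign alongside a nextclade reporting 3.x would be
# consistent with the "command not found" error in this issue.
check_tools nextalign nextclade
```

If nextalign is reported as not found while nextclade reports a 3.x version, that matches the explanation above.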

joverlee521 commented 5 months ago

Ah, the workflow was created before Nextclade v3 was released.

I think we'd want to migrate it to nextclade3 run following Nextclade's migration guide.

genehack commented 5 months ago

> I think we'd want to migrate it to nextclade3 run following Nextclade's migration guide.

Thanks! I'll check out that guide.

ivan-aksamentov commented 5 months ago

We should probably also document the Nextalign-like usage in the main Nextclade docs, i.e. using Nextclade v3 without a dataset and providing individual files via the --input-* args instead. Invoking Nextclade v3 with individual args is very similar to how Nextalign v2 was invoked. And I believe that swapping the nextclade executable in place of nextalign should produce somewhat informative errors.

Documenting it better would allow for a smoother transition for v2 users and also highlight that Nextclade v3 can be used as an aligner even where there's no dataset for a particular organism.
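
As a rough illustration of the dataset-free usage described above, the sketch below wraps a Nextclade v3 call that takes individual input files. It assumes the v3 option names `--input-ref`, `--input-annotation` (which, to my understanding, replaced v2's `--input-gene-map`) and `--output-fasta`; the function name `align_with_nextclade3`, the `DRY_RUN` switch, and all file names are hypothetical and not taken from this repo's Snakefile.

```shell
# Hypothetical wrapper: align sequences with Nextclade v3 without a dataset,
# passing the reference and annotation as individual files.
# Arguments: reference FASTA, annotation GFF3, input sequences, output FASTA.
align_with_nextclade3() {
    ref="$1"; annotation="$2"; sequences="$3"; out="$4"
    # Build the argument list in the positional parameters.
    set -- nextclade run \
        --input-ref "$ref" \
        --input-annotation "$annotation" \
        --output-fasta "$out" \
        "$sequences"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        echo "$*"
    else
        "$@"
    fi
}

# Placeholder file names; with DRY_RUN=1 this only prints the command.
DRY_RUN=1 align_with_nextclade3 reference.fasta annotation.gff3 sequences.fasta aligned.fasta
```

Dropping `DRY_RUN=1` runs the command for real. Any extra options the old nextalign rule passed would need to be checked against the migration guide rather than copied over as-is.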

Upd: I created an issue: https://github.com/nextstrain/nextclade/issues/1456