nextstrain / ncov

Nextstrain build for novel coronavirus SARS-CoV-2
https://nextstrain.org/ncov
MIT License

Support custom color and description files on Terra runs #872

Closed: huddlej closed this issue 2 years ago

huddlej commented 2 years ago

Current Behavior

Custom color and description files used for ncov builds are expected to exist as local files (e.g., in nextstrain_profiles/, etc.) based on the paths in the builds YAML file. However, these files are not part of the ncov workflow itself (and shouldn't be), so they do not exist on local storage when running the workflow on Terra.

Expected behavior

We should be able to support custom files like colors, includes, etc. that are stored in remote paths or in a separate GitHub repository.

Possible solutions

j23414 commented 2 years ago

I'm still thinking, but for option 3 the implementation would look like:

mv ~{localized_dir} ncov/custom_profile

The build.yaml would then use local paths, like colors: custom_profile/colors/algeria_colors.tsv, and assume the directory has been moved into the ncov directory. Yeah, this can get messy fast. I'll need to think.
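To make the failure mode concrete, here is a minimal Python sketch of a pre-flight check that reports which custom files are missing relative to the ncov directory. The function name missing_custom_files and the flat key-to-path mapping are hypothetical and only for illustration; they are not the actual builds YAML schema or part of the ncov workflow.

```python
from pathlib import Path


def missing_custom_files(file_paths, workflow_dir):
    """Return the keys whose relative paths do not exist under workflow_dir.

    file_paths is a hypothetical flat mapping such as
    {"colors": "custom_profile/colors/algeria_colors.tsv"}; the real
    builds config may nest these entries differently.
    """
    root = Path(workflow_dir)
    return sorted(
        key for key, rel_path in file_paths.items()
        if not (root / rel_path).is_file()
    )
```

Running a check like this before the workflow starts would surface a missing colors or description file right away instead of partway through a Terra run.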

huddlej commented 2 years ago

After discussing this issue today as a group, we agreed that the solution to this issue should include a solution to the broader issue of how to allow users to bring their own "pile of files" to the ncov workflow. We discussed two related approaches to handle this problem:

  1. Place ncov inside an outer directory that includes the user's data and config files. This approach emphasizes that ncov is a piece of software, but it doesn't work yet with the ncov workflow.
  2. Place the user's data and config files inside a subdirectory of ncov. This is one approach we've used within the group already and one that works with the existing ncov workflow. This approach relies more on the existence of ncov as a Git repo that you work from instead of a piece of software you use, though.

Regardless of which nesting pattern above we choose, we discussed allowing users to define a path to a zip file as an optional input to the ncov WDL workflow. The workflow would then fetch the user's zip file and extract it in the appropriate place before running the workflow. This solution would support linking to GitHub branches, releases, etc. through GitHub's auto-generated zip files, but it would also support manually created zip files that were uploaded to Google Storage through Terra.

This solution should address the specific color and description files issues here without requiring users to modify their builds config to include remote paths, etc. It should also support any other additional files users would want to reference for their build (e.g., custom rules, markdown files, etc.).
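As a rough illustration of the extraction step described above, here is a minimal Python sketch. It assumes the zip file has already been fetched to local disk and that the destination directory is new or empty. The function name extract_user_files is hypothetical and is not taken from the wdl/optionals implementation. GitHub's auto-generated archives wrap everything in a single top-level folder, so the sketch drops that wrapper when it finds one; manually created zip files without a wrapper are extracted as-is.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def extract_user_files(zip_path, dest_dir):
    """Extract a user's zip archive into dest_dir and return dest_dir as a Path.

    If the archive contains exactly one top-level directory (as GitHub's
    auto-generated archives do), that wrapper directory is dropped so its
    contents land directly in dest_dir.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    with tempfile.TemporaryDirectory() as tmp:
        # Extract to a scratch directory first so the layout can be inspected.
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(tmp)

        entries = list(Path(tmp).iterdir())
        has_wrapper = len(entries) == 1 and entries[0].is_dir()
        source = entries[0] if has_wrapper else Path(tmp)

        # Assumes dest holds no entries with the same names yet.
        for item in source.iterdir():
            shutil.move(str(item), str(dest / item.name))

    return dest
```

For example, extract_user_files("user_files.zip", "ncov/custom_profile") would place the archive's colors/ directory at ncov/custom_profile/colors/, which matches the local paths a builds config would reference under the second nesting approach.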

j23414 commented 2 years ago

Sorry it took me a while. I implemented solution 2 and tested it on the Chad dataset: https://nextstrain.org/staging/ncov_chad

If the run completes successfully, the result is deployed to a URL similar to:

Additional columns in the input Tables/builds allow each location to:

Happy to set up a meeting if that works better. Otherwise, I figure you're busy this week so feel free to explore. Suggestions welcome.

huddlej commented 2 years ago

This will be closed by the wdl/optionals branch.

j23414 commented 2 years ago

Closed by merging the wdl/optionals branch.