nextstrain / avian-flu

Nextstrain build for avian influenza viruses
http://nextstrain.org/avian-flu

Annotate H5 clades through a node data JSON file instead of modifying metadata #25

Open huddlej opened 1 month ago

huddlej commented 1 month ago

Context

In conversation about #22, @trvrb noted:

I don't think this is part of the scope of this PR, but it would seem cleaner to me for this clade-labeling/add-clades.py script to instead just create a node data JSON with h5_label_clade rather than messing with the metadata file.

@lmoncla and I just had some confusion from different rules (refine, traits) asking for the metadata TSV vs the metadata-with-clade TSV. Though the function metadata_by_wildcards mostly solves this issue.

@jameshadfield noted that:

The only reason I can see to not do this is if we use this data in the filtering step. But we don't.

Description

We should modify scripts/add-clades.py to write a node data JSON file as its output, and update the workflow so that this file is an input to the export rule rather than the product of a step that modifies the metadata.
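As a rough illustration of the proposed output, here is a minimal sketch of turning strain-to-clade assignments into a node data JSON. It assumes the clade assignments are available as a two-column TSV with `strain` and `clade` headers; that input layout, the function name `clades_to_node_data`, and the example strain names are hypothetical and are not taken from the current scripts/add-clades.py. The top-level `nodes` mapping follows the usual Augur node data layout, and `h5_label_clade` is the attribute name suggested in the discussion above.

```python
import csv
import io
import json


def clades_to_node_data(clades_tsv, attribute="h5_label_clade"):
    """Build an Augur-style node data dict from a strain-to-clade TSV.

    `clades_tsv` is any open text file with `strain` and `clade` columns
    (a hypothetical input layout, assumed for this sketch).
    """
    nodes = {}
    for row in csv.DictReader(clades_tsv, delimiter="\t"):
        clade = row["clade"]
        # Skip strains without a clade call instead of writing empty labels.
        if clade:
            nodes[row["strain"]] = {attribute: clade}
    return {"nodes": nodes}


# Hypothetical example input; the third strain has no clade assignment.
example = io.StringIO(
    "strain\tclade\n"
    "A/goose/1/2005\t2.2\n"
    "A/duck/2/2014\t2.3.4.4\n"
    "A/x/3/2000\t\n"
)
node_data = clades_to_node_data(example)
print(json.dumps(node_data, indent=2))
```

A file written this way could then be passed to the export rule alongside the other node data files (for example via `augur export v2 --node-data`), leaving the metadata TSV untouched so that refine, traits, and export all read the same metadata.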