Closed AngieHinrichs closed 5 months ago
@AngieHinrichs
Hi Angie,
Does the presence of tree.json in a dataset always mean that clades can be assigned?
We released 3.6.0 earlier today, in which clades are optional even if the tree is present. Before that, our folks used an empty string in the clade_membership tree field as a workaround when clades were missing from the tree for one reason or another (most of the time due to unclear nomenclature or lack of time).
Currently I'd say that downloading the tree and checking whether it contains at least one .node_attrs.clade_membership is a safe bet.
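A minimal sketch of that check, in Python. It assumes tree.json follows the Auspice v2 layout (a top-level "tree" with nested "children") and that clade_membership is an object with a "value" field; the function names are made up for illustration. An empty string counts as "no clade", to account for the workaround mentioned above.

```python
import json
from typing import Any, Dict, Iterator


def iter_nodes(root: Any) -> Iterator[Dict[str, Any]]:
    """Yield every node of the tree without recursion (trees can be deep)."""
    # Assumption: "tree" is a single root node or a list of root nodes.
    stack = list(root) if isinstance(root, list) else [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(node.get("children", []))


def has_clade_membership(tree_json: Dict[str, Any]) -> bool:
    """Return True if at least one node has a non-empty clade_membership."""
    for node in iter_nodes(tree_json.get("tree", [])):
        clade = node.get("node_attrs", {}).get("clade_membership")
        # Assumption: the attribute is {"value": "..."}; tolerate a bare string too.
        value = clade.get("value") if isinstance(clade, dict) else clade
        if isinstance(value, str) and value.strip():
            return True
    return False


def dataset_can_assign_clades(tree_path: str) -> bool:
    """Load a downloaded tree.json from disk and run the check."""
    with open(tree_path, encoding="utf-8") as f:
        return has_clade_membership(json.load(f))
```

For a downloaded dataset this would be called as dataset_can_assign_clades("path/to/tree.json"), where the path is wherever the dataset was unpacked.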
In the official datasets in the data repo, we could enumerate each dataset's "capabilities" when rebuilding the dataset index. I already emit some basics into the index.json of the dataset server, but not clade assignment. It might be a good addition.
Do you have any other such capabilities in mind that we could add? I have difficulty imagining how that would look from the user's perspective, as I don't use Nextclade often myself :)
Once we have a list of capabilities in the index, the --json flag of the dataset list command should show it as it appears in the index. The list can then be pretty-printed in the CLI and rendered in Web in some way. Any preferences here?
We should also not forget the clade-like attributes that may be present on the tree in .meta.extensions.nextclade.clade_node_attrs, e.g. lineages in SC2 trees.
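For illustration, a rough Python sketch of how those clade-like attributes could be listed from an already-parsed tree.json. The shape of each clade_node_attrs entry (an object with a "name" field) is an assumption here, and the function name is hypothetical.

```python
from typing import Any, Dict, List


def clade_like_attr_names(tree_json: Dict[str, Any]) -> List[str]:
    """Return the names listed under .meta.extensions.nextclade.clade_node_attrs."""
    entries = (
        tree_json.get("meta", {})
        .get("extensions", {})
        .get("nextclade", {})
        .get("clade_node_attrs")
    ) or []
    names: List[str] = []
    for entry in entries:
        # Assumption: each entry is an object with a "name" key;
        # a bare string is accepted as well, anything else is skipped.
        if isinstance(entry, dict) and entry.get("name"):
            names.append(entry["name"])
        elif isinstance(entry, str) and entry:
            names.append(entry)
    return names
```

A non-empty result could then feed a capability entry similar to the one for clades.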
The tree-related capabilities could be computed in the rebuild script somewhere around here, I guess https://github.com/nextstrain/nextclade_data/blob/403e2574654daacc40b0face461965da41e953d2/scripts/rebuild#L43-L45
Yes, if you could add "clades" there like you add "customClades", and include the capabilities in the CLI list output, that would be great! At the moment, clades are what I'm keen to see, but I would not mind seeing other special capabilities listed.
Released in 3.7.0
Fantastic, thanks! The types and counts are really helpful!
In the output of nextclade dataset list, it would be very helpful to have an indication of whether clades can be assigned using each dataset. For example, dataset nextstrain/flu/h3n2/ha/EPI1857216 can assign clades, but nextstrain/flu/h3n2/pb1 cannot (it has no tree.json). Currently, to determine that, I need to download each dataset and look for tree.json.
Does the presence of tree.json in a dataset always mean that clades can be assigned? If so, then hopefully it would be straightforward for nextclade dataset list to report whether pathogen.json includes treeJson.