Open joverlee521 opened 1 year ago
I was wondering why the "Unknown" value wasn't automatically ignored by Auspice and found this was done on purpose!
Thanks Jover!
Since the beginning, we've been trying to carefully ignore the country and other values, because Nextclade inputs and, therefore, newly attached tree nodes simply don't have this information.
My understanding is that in our own datasets we don't have these fields, and so these colorings do not even appear in the dropdown.
I had to add the handling of unknown values a long time ago, for reasons I don't remember clearly, but I found this Slack thread: https://bedfordlab.slack.com/archives/C7SDVPBLZ/p1596559922285700?thread_ts=1596488014.283400&cid=C7SDVPBLZ
But maybe it needs to be revisited.
You know more about these things. What would you recommend?
Ah, the grey scale is the intended behavior in Auspice for values that are not defined in the color scale.
Maybe the easiest thing to do here would be for Nextclade to use "Unknown" (without the space at the end) so that the new tree nodes just don't have a country value? This way the new nodes will automatically get colored as grey for these colorings and you can remove the injection of the color scale.
We have never been very careful about metadata other than clades on these trees. But many datasets 'inherit' these metadata attributes from the workflows they are based on.
I don't think I fully understand the problem yet though. If I look at these trees in auspice (outside of nextclade) they get assigned the default rainbow colors. Is it that only after adding the fake Unknown_ auspice then generates the grey values?
I don't remember all the reasons why we wanted to avoid auspice interpreting this as Unknown. But I guess we can remove the space and see whether we get any unwanted behavior.
Is it that only after adding the fake Unknown_ auspice then generates the grey values?
Yes, Nextclade injects a single color scale for the added Unknown_ value and Auspice will "create shades of grey for values in the tree which weren't defined in the provided scale".
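To make the mechanism concrete, here is a toy model of the behavior described above. It is not Auspice's actual code: the function name `resolve_colors`, the grey ramp, and the `#999999` color are made up for illustration, and the injected value is assumed to be "Unknown " with a trailing space.

```python
import colorsys


def resolve_colors(values_in_tree, provided_scale=None):
    """Toy model (an assumption, not Auspice's implementation).

    - No scale provided: every value gets a default "rainbow" color.
    - Scale provided: values missing from it get shades of grey.
    """
    values = list(values_in_tree)
    if not provided_scale:
        colors = {}
        for i, value in enumerate(values):
            # Spread hues evenly around the color wheel.
            r, g, b = colorsys.hsv_to_rgb(i / max(1, len(values)), 0.8, 0.9)
            colors[value] = "#{:02x}{:02x}{:02x}".format(
                int(r * 255), int(g * 255), int(b * 255)
            )
        return colors

    colors = dict(provided_scale)
    missing = [v for v in values if v not in colors]
    for i, value in enumerate(missing):
        # Evenly spaced grey levels between 64 and 192.
        shade = int(64 + 128 * i / max(1, len(missing) - 1))
        colors[value] = "#{0:02x}{0:02x}{0:02x}".format(shade)
    return colors


def is_grey(hex_color):
    """True when the red, green and blue channels are identical."""
    return hex_color[1:3] == hex_color[3:5] == hex_color[5:7]


countries = ["USA", "Japan", "Brazil"]

# Reference tree has no country scale: default rainbow colors.
default_colors = resolve_colors(countries)

# A single-entry scale is injected: every real country falls back to grey.
injected_colors = resolve_colors(countries, [("Unknown ", "#999999")])
```

In this model, the one-entry scale is enough to switch the coloring from "no scale, use defaults" to "scale provided, grey out everything not in it", which matches what is seen for "Country" in the H1N1pdm HA dataset.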
Using the Influenza H1N1pdm HA dataset as an example, the "Region" color-by looks fine because tree.json includes a scale for region. However, the "Country" color-by is grey scale because the tree.json does not define a scale for country.
It's not clear to me if this should be considered a bug or user error, but I at least wanted to document the behavior here.
If the reference tree for a Nextclade dataset does not pre-define the color scale for region, country, or division fields, then coloring becomes grey scale due to the injection of the "Unknown" value scale
https://github.com/nextstrain/nextclade/blob/aaac7ce891ce1a83b6554097a0a075004e770282/packages_rs/nextclade/src/tree/tree_preprocess.rs#L282-L283
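For dataset authors who want to check whether their reference tree is affected, a small helper along these lines might be useful. It is only a sketch: `colorings_without_scale` is a hypothetical name, and it assumes the usual Auspice v2 layout where `meta.colorings` is a list of objects with a `key` and an optional `scale`.

```python
def colorings_without_scale(tree, keys=("region", "country", "division")):
    """Return the keys among `keys` that are declared as colorings in the
    tree JSON but have no (or an empty) pre-defined `scale`."""
    colorings = tree.get("meta", {}).get("colorings", [])
    by_key = {c.get("key"): c for c in colorings}
    return [k for k in keys if k in by_key and not by_key[k].get("scale")]


# Minimal stand-in for a tree.json: region has a scale, country does not,
# and division is not declared as a coloring at all.
example_tree = {
    "meta": {
        "colorings": [
            {"key": "region", "type": "categorical",
             "scale": [["Europe", "#4042c7"], ["Asia", "#e67030"]]},
            {"key": "country", "type": "categorical"},
        ]
    },
    "tree": {},
}

affected = colorings_without_scale(example_tree)
print(affected)
```

For a real dataset, load the file with `json.load` and pass the resulting dict in. Colorings that are not declared at all are skipped, since they would not show up in the color-by dropdown anyway.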
This was first reported in Nextstrain office hours on 2023-09-21 by an external user who built their own Nextclade datasets, but can be observed in the "official" Influenza A H1N1pdm HA dataset which does not define the color scale for "country":