nextstrain / auspice

Web app for visualizing pathogen evolution
https://docs.nextstrain.org/projects/auspice/
GNU Affero General Public License v3.0

Change ordering of subplots by field or calculation across fields #1823

Open huddlej opened 4 weeks ago

huddlej commented 4 weeks ago

Description

As originally discussed in Slack and in GitHub comments, a common use case is ordering the rows of the measurements panel by a given metadata field. For example, we order titer measurements for seasonal flu by the clade of the reference strain and by the minimum y-axis position of that clade's nodes in the phylogeny, and then export that order explicitly in a measurements config JSON. A simpler way to support this kind of ordering would be either:

  1. an option in the measurements config JSON to order a grouping by one or more specific fields in the given data frame instead of specifying a list of values in order. The following example (with a made-up new order_by key) would order the reference strains alphabetically by their clades.

      "groupings": [
          {
              "key": "reference_strain",
              "order_by": ["clade_reference"]
          }
      ]

This option would require Auspice to know how to interpret this proposed new ordering field and sort the groupings at load time. Or, we could modify augur measurements export to know how to interpret this new ordering field and dynamically produce a corresponding order field with the list of values that Auspice currently expects. For the seasonal flu example above, we would annotate the y-axis positions of each clade in the data frame prior to passing it to augur measurements export. That specific use case raises the question of what types to infer for each ordering field prior to the sorting. Vega deals with this by explicitly labeling field types as quantitative (y_axis_position:Q) or nominal (clade_reference:N).
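
As a rough illustration of the second route (expanding the proposed key at export time), here is a minimal Python sketch. It is not part of augur or Auspice: the function names `parse_field` and `resolve_order`, the Vega-style `:Q` / `:N` suffix handling, and the sample records are all assumptions made up for this example. Each grouping value is ranked by its smallest sort key, which mirrors the "minimum y-axis position" idea from the seasonal flu case.

```python
from __future__ import annotations


def parse_field(spec: str) -> tuple[str, str]:
    """Split 'name:Q' into ('name', 'Q'); treat fields without a suffix as nominal."""
    name, sep, kind = spec.rpartition(":")
    if sep and kind in ("Q", "N"):
        return name, kind
    return spec, "N"


def sort_key(record: dict, fields: list[tuple[str, str]]) -> tuple:
    """Build a sort key, casting quantitative fields to float and nominal ones to str."""
    return tuple(
        float(record[name]) if kind == "Q" else str(record[name])
        for name, kind in fields
    )


def resolve_order(records: list[dict], grouping: dict) -> dict:
    """Return a copy of the grouping with the hypothetical 'order_by' key
    replaced by the explicit 'order' list of values."""
    if "order_by" not in grouping:
        return dict(grouping)

    fields = [parse_field(spec) for spec in grouping["order_by"]]
    key_name = grouping["key"]

    # Keep the smallest sort key seen for each grouping value.
    best: dict = {}
    for record in records:
        value = record[key_name]
        key = sort_key(record, fields)
        if value not in best or key < best[value]:
            best[value] = key

    resolved = {k: v for k, v in grouping.items() if k != "order_by"}
    # Break ties on the grouping value itself so the output is deterministic.
    resolved["order"] = sorted(best, key=lambda value: (best[value], value))
    return resolved


# Made-up records; values arrive as strings, as they would from a TSV.
records = [
    {"reference_strain": "strain_a", "clade_reference": "clade_x", "y_axis_position": "12"},
    {"reference_strain": "strain_b", "clade_reference": "clade_y", "y_axis_position": "4"},
    {"reference_strain": "strain_c", "clade_reference": "clade_x", "y_axis_position": "12"},
]
grouping = {"key": "reference_strain", "order_by": ["y_axis_position:Q", "clade_reference:N"]}
resolved = resolve_order(records, grouping)
```

The explicit type suffix matters here: sorted as strings, "12" would come before "4", so without `:Q` the strain at position 4 would be listed last instead of first.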

  2. a new argument in the augur measurements export interface like --order-groupings-by that takes one or more arguments like --order-groupings-by clade_reference. This option could internally sort groupings by the given field(s) and export a measurements JSON with an order field for the groupings using the current implementation where the field contains the explicit list of values in order. This option would not require any changes in Auspice. The proposed argument here would actually need to be more complex, because users need a way to specify which grouping field they want ordered by which field(s). The measurements config JSON approach described above might be more flexible and less complicated.
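
To make the added complexity concrete, here is one possible shape for such an argument, sketched with Python's standard argparse module. Nothing here exists in augur today: the `GROUPING=FIELD[,FIELD...]` syntax is an assumption invented for this sketch, as are the helper names `parse_ordering` and `build_parser`.

```python
from __future__ import annotations

import argparse


def parse_ordering(spec: str) -> tuple[str, list[str]]:
    """Parse one 'GROUPING=FIELD[,FIELD...]' value (hypothetical syntax)."""
    grouping, sep, fields = spec.partition("=")
    if not sep or not grouping or not fields:
        raise argparse.ArgumentTypeError(
            f"expected GROUPING=FIELD[,FIELD...], got {spec!r}"
        )
    return grouping, fields.split(",")


def build_parser() -> argparse.ArgumentParser:
    """Standalone parser for illustration; not the real augur interface."""
    parser = argparse.ArgumentParser(description="Sketch of a hypothetical ordering option")
    parser.add_argument(
        "--order-groupings-by",
        nargs="+",
        type=parse_ordering,
        default=[],
        metavar="GROUPING=FIELDS",
        help="order each named grouping by one or more comma-separated fields",
    )
    return parser


args = build_parser().parse_args(
    ["--order-groupings-by", "reference_strain=clade_reference,y_axis_position"]
)
# Map each grouping field to the list of fields it should be ordered by.
orderings = dict(args.order_groupings_by)
```

Even in this small form, the user has to learn a mini-syntax on the command line, which supports the point that the config JSON approach may be the less complicated interface.
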

huddlej commented 4 weeks ago

After writing up this issue, I still think it would be a nicer user interface for the measurements config JSON to provide a way to order groupings, but I'm also content (or resigned) to manually implement this kind of ordering with custom scripts like we do in seasonal-flu. Compared to other UI improvements for the measurements panel, this ranks lower for me now.