nextstrain / nextstrain.org

The Nextstrain website
https://nextstrain.org
GNU Affero General Public License v3.0

List Resources (prototype UI) #803

Closed jameshadfield closed 6 months ago

jameshadfield commented 7 months ago

This (draft) PR represents a bunch of ideas about how we can present datasets. At the moment it is working to display all of the core datasets - check it out via the /pathogens URL on the review app. I wanted to convey:

Eventually I hope that this UI will expand in usage to other resources (narratives, intermediate files) as well as other resource sources (staging, groups, perhaps on the main splash page in some form?)

Please try it out and send comments -- what worked well, what didn't, and any ideas you may have for improvements to the UI. Thanks!

Code quality, styling tweaks, little bugs

At the moment it's a series of ideas. I've iterated on the ideas quite a lot, so I think the ideas are good, but I haven't spent time cleaning up the styling or code for each idea as I wanted to discuss the ideas before moving on to that stage.

jameshadfield commented 7 months ago

Thanks @huddlej - great comments. I'll wait for others' before doing any more work here, but I think yours are generally both sensible and actionable.

I wish that each card’s header specified the human-readable name of the dataset like “SARS-CoV-2 datasets last updated 2024-02-28” instead of “Datasets last updated 2024-02-28”

Some of the cards include datasets for multiple distinct pathogens or appear incomplete. For instance, there is a card with a single mpox lineage B.1 dataset followed by another card with two more mpox datasets and the seasonal flu B/Vic dataset (below). Similarly, Zika is grouped with subsets of Vic, H3, and H1 datasets.

There are actually two different grouping approaches here. The default, which is used in your screenshot, groups datasets of any pathogen by the last-updated date. So there can be a mix of some mpox, flu etc within the same group. If you toggle to "alphabetical" you'll get the grouping by pathogen. So either the toggle wasn't obvious, or what it did wasn't obvious, and/or the default was wrong. I've pushed up a change to improve the phrasing which hopefully makes things a little clearer:

[Two screenshots]

huddlej commented 7 months ago

Ah, thanks, @jameshadfield! I did miss that grouping option. I expected the default to be grouping by dataset (since I'm a user who cares most about flu and would like to see all those data together) and then ordering of those datasets by most recently updated and an option to order alphabetically. To get this display for flu, it seems I should select the showcase tile for "seasonal flu" or type "flu" and "seasonal" into the filter display.

huddlej commented 6 months ago

After the live demo today, my only maybe blocking request would be for the default grouping to be by "pathogen (sorted alphabetically)" instead of the most recently updated, but otherwise this seems immediately useful even if we refine the aesthetics more later...

trvrb commented 6 months ago

This is fantastic work @jameshadfield! We just talked through most of my review, but I'll briefly summarize main suggestions here:

  1. Keep one pathogen per card and sort cards alphabetically by pathogen name or by last updated date
  2. Quicker main link. Ie clicking on card title of "rsv" or "ncov" should be the equivalent of going to /rsv or /ncov.
  3. The modal box with individual snapshots doesn’t flex well between smaller collections and larger collections. There should be a simple way to dial in a desired particular date rather than needing to identify a corresponding snapshot. I imagine this is actually the main way that someone would want to interact here. I could imagine a slider through time along the x-axis in the current beeswarm plot and feedback as snapshot changes. Ie UI to select a date, rather than UI to select a particular snapshot.
  4. Sorting datasets within a pathogen card based on manifest_core.json. This would be the natural way to order ncov_gisaid_global, ncov_gisaid_africa, etc... or seasonal-flu/h3n2, seasonal-flu/h1n1pdm, etc... If there are datasets that don't appear in the manifest include after manifest datasets and sort alphabetically. This strategy would keep the extraneous ncov datasets we don't actually care about at the end of the list.
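The manifest-based ordering in suggestion 4 could look roughly like the sketch below. It is only an illustration: `sortByManifest` and `manifestOrder` are hypothetical names, the real structure of manifest_core.json is not shown in this thread, and the two "extraneous" dataset names are made up.

```typescript
// Sketch only: `manifestOrder` stands in for the dataset order read from
// manifest_core.json, whose real structure is not shown in this thread.
function sortByManifest(datasets: string[], manifestOrder: string[]): string[] {
  const rank = (name: string): number => manifestOrder.indexOf(name);
  const alphabetical = (a: string, b: string): number => (a < b ? -1 : a > b ? 1 : 0);
  // Datasets named in the manifest come first, in manifest order.
  const listed = datasets.filter((d) => rank(d) !== -1).sort((a, b) => rank(a) - rank(b));
  // Everything else follows, sorted alphabetically.
  const unlisted = datasets.filter((d) => rank(d) === -1).sort(alphabetical);
  return listed.concat(unlisted);
}

// Illustrative input: "ncov_scratch" and "ncov_old-test" are made-up extraneous datasets.
const orderedNcov = sortByManifest(
  ["ncov_scratch", "ncov_gisaid_africa", "ncov_old-test", "ncov_gisaid_global"],
  ["ncov_gisaid_global", "ncov_gisaid_africa"]
);
console.log(orderedNcov);
```

Because the unlisted datasets are appended after the manifest ones, the extraneous ncov datasets end up at the bottom of the card without needing any special-casing.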

Small critiques:

  1. I'd suggest replacing "Last updated between 2020-02-21 & 2024-03-05" with "Last updated 2024-03-05", just to be less verbose.
  2. Modify the icons for dataset count and snapshot count. For me these are semantically "share" and "refresh". I might suggest something like list-tree for dataset count and something like rectangle-history or clock-rotate-left for snapshot count. Edit: I also like @joverlee521's suggestion of bullet-list for dataset count.
  3. Collapse caret is hard to restore due to it jumping down the page. Suggest moving the caret to the top of the card.
  4. It would be nice if the top of each of the three columns had the full path spelled out.
  5. Semantically, I’d expect the sparkline to live next to the updated counter.

One thing that we didn't talk about: I do think it's pretty essential that the page is responsive enough to shrink down to phone-size width. Currently it looks like:

[Screenshot 2024-03-07 at 3:27:21 PM]

Given the complexity of the UI it's going to be difficult to keep all the functionality on a phone display. My suggestion here would be once the display width is below a particular value that you hide the sparkline and hide the snapshot counter, so effectively just creating a long list of dataset names to tap on. Ie don't attempt to surface the snapshot select modal and instead just have it get to the current dataset.
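As a rough sketch of that narrow-display behaviour, the layout decision could be a pure function of the display width. Everything here is an assumption for illustration: the `RowLayout` shape, the `rowLayoutForWidth` name and the 600px breakpoint are not from the PR.

```typescript
// Which parts of a dataset row to render at a given display width.
interface RowLayout {
  showSparkline: boolean;
  showSnapshotCount: boolean;
  // When false, tapping a dataset goes straight to the current dataset
  // instead of opening the snapshot-selection modal.
  openSnapshotModal: boolean;
}

// Assumed breakpoint; the thread only says "below a particular value".
const NARROW_BREAKPOINT_PX = 600;

function rowLayoutForWidth(widthPx: number): RowLayout {
  const narrow = widthPx < NARROW_BREAKPOINT_PX;
  return {
    showSparkline: !narrow,
    showSnapshotCount: !narrow,
    openSnapshotModal: !narrow,
  };
}

console.log(rowLayoutForWidth(375));  // phone-sized width: plain list of dataset names
console.log(rowLayoutForWidth(1280)); // desktop width: full UI
```

In practice the same rule might be expressed as a CSS media query instead; the function form just makes the intended behaviour explicit.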

joverlee521 commented 6 months ago

We chatted about this as a group in today's project meeting, but I wanted to jot down my feedback here. I really like the pathogen-grouped display; I think it is much more usable than the current page! The dataset modals that expose the past snapshots are very helpful! No more trying random dates in the URL.

Some aesthetic nitpicks:

Questions:

trvrb commented 6 months ago

Do you plan to add links to the individual pathogen pages? (e.g. https://nextstrain.org/sars-cov-2/)

I see these "resource collection" pages for /sars-cov-2 and /influenza as separate from the dataset UI. I'd think maybe in this case links to these two pages should live above the dataset UI. However, I can see it being quite unclear that these tiles go directly to a different page on nextstrain.org, whereas the lower tiles, which are part of the dataset UI, act as filters.

I could see a different strategy where we keep a selection of hand-curated "showcase" tiles above the dataset UI that are direct links. This would work similarly to the existing https://nextstrain.org/pathogens page and wouldn't need to be exhaustive. The tiles could also be shrunk further.

[Screenshot 2024-03-08 at 10:29:38 AM]

My thinking here is also influenced by wanting to revamp the "showcase" of dataset quick links on the splash page. We can reuse a denser tile design in both places.

Then in the dataset UI make just text suggestions for the filter box.

[Image: filter-examples]

We're not going to have easy tiles for dataset UI for datasets under /groups and text could be more flexible in this case.

trvrb commented 6 months ago

One more small suggestion: Over the past couple days, I was periodically decrufting the nextstrain-data bucket of the extraneous ncov_ datasets. However, I generally couldn't really tell when the indexer ran and whether it made sense to walk through again. I could see it as useful to have a sentence at the very bottom of the UI that reads "Datasets indexed X hours ago." And when you hover you get a UTC timestamp, essentially how commit times work on GitHub with relative being the text on the page and absolute being on hover. Not essential (and could be left off PR), but it would also seem like a particularly useful thing when we extend this UI to /groups where people might be confused about a recent upload not showing up.
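A minimal sketch of that relative/absolute pairing, assuming the index exposes a timestamp for when it was last generated (the function names and the 48-hour cutoff for switching to days are made up for illustration):

```typescript
// Text shown on the page, e.g. "Datasets indexed 3 hours ago."
function indexedAgo(indexedAt: Date, now: Date): string {
  const plural = (n: number, unit: string): string => `${n} ${unit}${n === 1 ? "" : "s"}`;
  // Clamp at zero in case of small clock differences between indexer and browser.
  const minutes = Math.max(0, Math.floor((now.getTime() - indexedAt.getTime()) / 60000));
  let age: string;
  if (minutes < 60) age = plural(minutes, "minute");
  else if (minutes < 48 * 60) age = plural(Math.floor(minutes / 60), "hour");
  else age = plural(Math.floor(minutes / (24 * 60)), "day");
  return `Datasets indexed ${age} ago.`;
}

// Text shown on hover: the absolute timestamp in UTC (ISO 8601).
function indexedTooltip(indexedAt: Date): string {
  return indexedAt.toISOString();
}

const exampleIndexedAt = new Date("2024-03-08T10:00:00Z");
console.log(indexedAgo(exampleIndexedAt, new Date("2024-03-08T13:30:00Z")));
console.log(indexedTooltip(exampleIndexedAt));
```

The tooltip string could be set as the `title` attribute on the sentence so that hovering reveals it, similar to GitHub's commit times.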

jameshadfield commented 6 months ago

This PR is ready for re-review -- I'd like to merge shortly as it only affects the /pathogens and /staging pages and is a big improvement to them (in my opinion). Some points to keep in mind:

Things I'd like to push to subsequent PRs:

responses to (most) review comments above

[@huddlej] blocking request would be for the default grouping to be by "pathogen (sorted alphabetically)" instead of the most recently updated

[@joverlee521] Thoughts on separating grouping and sorting options? I'd like to group by pathogens but sort by published date, where the top entry is the pathogen with the most recent dataset.

[@trvrb] Keep one pathogen per card and sort cards alphabetically by pathogen name or by last updated date

[@huddlej] I wish that each card’s header specified the human-readable name of the dataset like “SARS-CoV-2 datasets last updated 2024-02-28” instead of “Datasets last updated 2024-02-28”.... Some of the cards include datasets for multiple distinct pathogens or appear incomplete. For instance, there is a card with a single mpox lineage B.1 dataset followed by another card with two more mpox datasets and the seasonal flu B/Vic dataset (below). Similarly, Zika is grouped with subsets of Vic, H3, and H1 datasets.

The grouping is now always by-pathogen, with two sorting approaches (hover over the sorting options to see a full explanation of how they work). The default sorting is alphabetical. I plan to revisit the within-pathogen sorting when I implement the sorting-according-to-manifest-order (see above).
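For readers following along, the behaviour described here (always group by pathogen, then pick one of two card orderings) can be sketched as below. This is not the code from the PR: the `Dataset` and `Card` shapes, the `buildCards` name and the sample data are assumptions, and the pathogen is simply taken to be the first path segment of the dataset name.

```typescript
interface Dataset {
  name: string;         // e.g. "seasonal-flu/h3n2"
  lastSnapshot: string; // ISO date (YYYY-MM-DD), so string comparison orders by time
}

interface Card {
  pathogen: string;
  datasets: Dataset[];
  mostRecent: string;   // latest lastSnapshot among the card's datasets
}

type SortMethod = "alphabetical" | "mostRecent";

function buildCards(datasets: Dataset[], sortBy: SortMethod): Card[] {
  const cards: Card[] = [];
  for (const d of datasets) {
    // Assumption: the pathogen is the first path segment of the dataset name.
    const slash = d.name.indexOf("/");
    const pathogen = slash === -1 ? d.name : d.name.slice(0, slash);
    let card: Card | undefined = cards.filter((c) => c.pathogen === pathogen)[0];
    if (!card) {
      card = { pathogen, datasets: [], mostRecent: d.lastSnapshot };
      cards.push(card);
    }
    card.datasets.push(d);
    if (d.lastSnapshot > card.mostRecent) card.mostRecent = d.lastSnapshot;
  }
  const cmp = (a: string, b: string): number => (a < b ? -1 : a > b ? 1 : 0);
  return cards.sort((a, b) =>
    sortBy === "alphabetical"
      ? cmp(a.pathogen, b.pathogen)
      // Newest first; ties fall back to alphabetical order.
      : cmp(b.mostRecent, a.mostRecent) || cmp(a.pathogen, b.pathogen));
}

// Illustrative data only.
const sampleDatasets: Dataset[] = [
  { name: "zika", lastSnapshot: "2024-03-20" },
  { name: "seasonal-flu/h3n2", lastSnapshot: "2024-03-05" },
  { name: "seasonal-flu/h1n1pdm", lastSnapshot: "2024-02-28" },
  { name: "mpox/all-clades", lastSnapshot: "2024-03-10" },
];
console.log(buildCards(sampleDatasets, "alphabetical").map((c) => c.pathogen));
console.log(buildCards(sampleDatasets, "mostRecent").map((c) => c.pathogen));
```

Keeping grouping fixed and varying only the card order means each card always holds exactly one pathogen, which addresses the mixed-pathogen cards reported earlier in the thread.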

[@huddlej] I wish the icons for “total number of datasets” and “total number of versions” were a little more clearly mapped to those concepts.

[@joverlee521] Echoing the icons are misleading for datasets/snapshots. I'd suggest a bullet list for datasets and albums for snapshots.

[@trvrb] Modify the icons for dataset count and snapshot count.

Icons updated 😄

[@joverlee521] The word "updated" can be misleading because here it means when the dataset was uploaded to S3, maybe "published" is more appropriate? Or just rely on the word "snapshots", i.e. "Snapshots between 2023-12-13 & 2024-02-06"

[@trvrb] I'd suggest replacing "Last updated between 2020-02-21 & 2024-03-05" with "Last updated 2024-03-05", just to be less verbose.

Updated to just use "Most recent snapshot: ...". Even this is a little misleading because if it's updated in the last few days the index won't know about it. I've tried to add this caveat where I can, but let me know if you can think of a better way to present this.

[@joverlee521] Do you plan to add links to the individual pathogen pages? (e.g. https://nextstrain.org/sars-cov-2/)

I've added this as a quicklink for the SC2 group after discussion with Trevor. While we have a page for /influenza, it contains no information beyond what I'm presenting here so I've not made a link to it.

[@trvrb] Collapse caret is hard to restore due to it jumping down the page. Suggest moving the caret to the top of the card.

Switched back to this (it was actually my initial implementation)

[@joverlee521] The per dataset sparklines take up a lot of room on the page. Maybe they can be summarized as a single sparkline per pathogen and only display the per dataset sparklines on hover?

[@huddlej] I wish that when I hovered over the sparkline for each dataset that I saw the same details on demand that I get when I hover over the dataset link.

I've ended up dropping the KDEs / sparklines and gone with a text-based summary. Let me know what you think!

[@joverlee521] I was initially confused by the empty pipes, I thought there was some UI issue that prevented the words from loading. I wonder if a tree view of the datasets would help here?

I've improved these a bit - hopefully they're not still confusing? I wanted to avoid UIs which required lots of clicking...

[@trvrb] Quicker main link. Ie clicking on card title of "rsv" or "ncov" should be the equivalent of going to /rsv or /ncov.

Done

[@trvrb] The modal box with individual snapshots doesn’t flex well between smaller collections and larger collections....

[@huddlej] For the modal window of past snapshots, I wish that the dots were bigger (easier to hover over and click, although I see now that the size varies by dataset and maybe by number of points in each column) and that there was a y-axis that visually indicated the order of the dots in each column. It also looks like dots aren’t grouped into columns by month since the same column can have snapshots from October, November, and December. What if they were actually grouped by month?

Modal redone and much improved - UI described in the companion docs update

[@huddlej] Also, what if the modal window had a standard “x” icon in a top corner ... Also also, you probably don’t have to literally type “fine print” 😆

Done and done!

trvrb commented 6 months ago

Awesome! I really appreciate you attending to all the various issues. Just a few quick notes:

  1. I'm not getting any difference in behavior between the "alphabetical" sort and the "most recently updated" sort. Both seem to be most recently updated, with zika currently at the top of the list:
     [Screenshot 2024-03-28 at 8:57:05 PM]
  2. Remove the "(monkeypox)" parenthetical. I see how you meant this to be clarifying, but it reads as if "monkeypox" is the real name of the pathogen that we shorthand as "mpox", kind of like how "SARS-CoV-2" is the real name of the pathogen that we shorthand as "ncov". However, "monkeypox" is fully deprecated as a name.
     [Screenshot 2024-03-28 at 8:58:34 PM]
  3. For "flu (Influenza)", replace with "flu (seasonal and avian influenza)". It's not capitalized.
  4. Move the "SARS-CoV-2" tile to the far left. It should stay as the most highlighted dataset.

jameshadfield commented 6 months ago

Thanks for the review @joverlee521 and comments @trvrb! All fixed, and I plan to merge in tomorrow if there are no further comments.

trvrb commented 6 months ago

Awesome! This is really great work @jameshadfield. Will be immediately useful at /pathogens and will be a good basis for other pages as well. The updated modal with the bubbles and the timeline is especially clever. It blends playful and useful nicely. Please merge whenever you'd like.

jameshadfield commented 6 months ago

Force pushed to fix up the final review comment. CI failed due to a spurious / unrelated / stochastic failure:

getAvailable community URLs correctly interpret the GitHub branch › Explicit non-default branch

I manually retriggered CI which passed, so I'm merging this now.

(As an aside, I'm noticing more and more stochastic CI failures. Or maybe I've just been doing more work on nextstrain.org lately and they've always been here.)