theosanderson / taxonium

A tool for exploring very large trees in the browser
http://taxonium.org
GNU General Public License v3.0
99 stars 17 forks source link

Add map view #534

Closed simonbukin closed 12 months ago

simonbukin commented 1 year ago

Taxonium Map View

This PR adds a map view to Taxonium, which allows for tracking origins of samples in the tree.

Screenshots

TBA

Video (will need to be updated with a permanent link after expiry in a few days)

This is a pretty large PR, and I think some of the merge difficulty may come from the swap of naming from taxonium_web_client to taxonium_component (it took a little while to get this working locally).

The other more major change in the layout of the code was in useView.js, in which I switched to checking the viewId and using setViewState to compartmentalize the viewState updates. This was necessary because keeping both the map state (like zoom, latitude, longitude, etc) and tree state in the same viewState led to tons of problems and unexpected crashes when a viewState update for one view overwrote another. I'm a little weary that this change didn't introduce bugs in other parts of Taxonium, since I was mostly focused on the map view, so this is definitely something to check on.

Can definitely add more to this PR and fix any problems you may find. Thanks!

vercel[bot] commented 1 year ago

Someone is attempting to deploy a commit to a Personal Account owned by @theosanderson on Vercel.

@theosanderson first needs to authorize it.

vercel[bot] commented 1 year ago

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
cov2tree ✅ Ready (Inspect) Visit Preview 💬 Add feedback Sep 29, 2023 8:50am
theosanderson commented 1 year ago

(thanks so much for this, still haven't had a chance to dive in but just wanted to communicate: the failing tests are my fault not yours!)

theosanderson commented 1 year ago

Thanks a lot for proposing this. I think the latest merge has maybe created a general breakage? But I found an old commit that was working.

Adding geographical features is definitely on the roadmap for Taxonium, so this is something I'm interested in, but that also means I want to be careful to implement anything in a way that is compatible with future developement.

Is there a specific use case you have so far had in mind during development? It feels a bit like a visual legend/key for the tree?

simonbukin commented 1 year ago

Exactly right, the general idea here was to create a mapping between the tree and the geography of where samples originate. This was originally proposed by Russ, and I believe he mentioned that he knew of some potential users who wanted to upload their own data to be able to better visualize the distribution of the samples on a map, not just their relation in the tree.

There could definitely be more work done on colors / more clearly communicating the relationship between the samples and their position on the map. I originally played with the idea of drawing lines between the tree and the map, but that seemed pretty infeasible with some of particularities of deckgl and drawing views on top of each other (the code got incredibly messy without much tangible benefit). I also explored adding a 3D map to better communicate density of samples, but that also proved to be a headache.

I think a key problem to solve here is communicating the density of samples better, because as you said, right now it's more of just a visual mapping.

What do you think would be the best way to make this sort of change as compatible as possible with future development? I'd be happy to write some tests or clean up the changes in if that would be of any help.

(I'm also not super sure about the merges, I had to do a bit of fiddling locally to pull in the upstream changes since my local copy was stale, but the merged copy worked fine afterwards)

theosanderson commented 1 year ago

Firstly, I'm really grateful for your work on this.

Re. merge: If I clone your repo, checkout the branch, cd to taxonium-component and run yarn storybook I get nodes is not defined errors, which are also what happen on the vercel preview here.

As discussed, for me the current feature is like a legend. For me that doesn't provide a lot of utility for users. In general we don't have a pattern that as you zoom in on the tree that anything else changes, with the exception of the Key (but the Key is integrated into the tree panel so that makes sense). In my vision for Taxonium, zooming the tree should not affect the map (or anything else). Instead, selecting a node would probably cause the map to update to display information about the descendant nodes. Searches also might be represented on the map as well as the tree.

I think some quantitative represenation on the map is going to be crucial. It's worth having a look at how Nextstrain (or Microreact) goes about this. I think I probably imagine something relatively similar for Taxonium. But in our case I think the slider to select a date window would be built into the map panel. But there's a lot of work to get there, probably with extra backend routes.

In terms of future-proofing, this is both about making sure that things are moving in the direction of the vision above, and also about configurability. At the moment there are more-or-less no hardcoded metadata fields, so country doesn't need to be called country, and samples don't need to have a country, they could have a city or something instead. In the public tree (as you've seen), country has values "England", "Scotland" and "Wales". So we would want to use a geojson for that with those countries separate. For other trees a geojson with "UK" may be appropriate. For analyses of a particular country, a map of that country with points distributed by city may be appropriate. I think I'd imagine something like a field in the config JSON which could be configured like:

geoData : [
{name: 'Sampling country',
geoJson: 'http://example.com/mygeojson.json' (or raw geojson),
metadata_field: 'country'
},
{name: 'Sampling city',
geoJson: 'http://example.com/mygeojson2.json' (or raw geojson),
metadata_field: 'city'
},
{name: 'Departure country', // (for travel cases)
geoJson: 'http://example.com/mygeojson.json' (or raw geojson),
metadata_field: 'departure_country'
},
]
simonbukin commented 1 year ago

Currently taking a look at the errors from Vercel, it looks like the merge may have swallowed some changes so I'm fixing that. The nodes issue is solved locally but that's spawned a few more problems to look into :)

I get what you mean about the map updating as the tree is manipulated, which is a new pattern. I think the selection approach you mentioned should be possible, since currently the map is updated using the nodes as the set of data (which as far as I could tell during development updates to be all nodes in the main viewport), and it would just be a matter of hooking up the selected node and it's descendants to the map. The metadata json would also be possible, and a lot more flexible than the current approach.

I've looked at the Microreact implementation, and it looks like they rely on a histogram view alongside the map view, with a pie chat centered on each region showing proportions of different samples there. That should be possible, but would be a solid chunk of work to implement. I'd be happy to take a look at it, but I am wondering what the development process should be here. Since the current work is in progress, would it be best to keep it in a draft PR that I update as I go, or should changes be merged into a feature branch in the main Taxonium repo and then merged in to master when finished?

theosanderson commented 1 year ago

Thanks for taking these comments in good spirit. I totally agree that there is a chunk of work here. I think that some sort of quantitative representation will probably be in the first version of this to launch. I really don't have preferences about how we go about the process of moving closer to that. I'd definitely be happy to accept merges into a feature branch at any time.

Unfortunately I think it will need work at the backend - the way things are set up the client is only available of a downsampled version of the dataset depending on how you zoom. The closest current analogue for this is the "getTipAtts" feature. We would need to create a new backend route from the local and server backends that gets the total counts for an attribute for each country. I can help with that aspect (at least).

simonbukin commented 1 year ago

Sounds good! I can start work in a feature branch then. I think that I can get a histogram view alongside the map, which should allow for viewing relative proportions of samples from different countries.

For now I think I can keep using the nodes to test the histogram (once that's done), and we can hook up the new source of attribute data to the map / histogram combo at a later date when the backend work is finished (no rush here)

theosanderson commented 1 year ago

I think at the moment I'm imagining something closer to the Nextstrain approach with scaled pies than a histogram. This was ChatGPT's idea for that in DeckGL: https://chat.openai.com/share/664b626d-8255-429b-8e0a-6390087affd0 (No pressure on any of this of course, I can also work from the feature branch in due course).

simonbukin commented 1 year ago

Neat! I think that approach could work. Let me see if I can get something working in a feature branch when I have a second

simonbukin commented 12 months ago

I'm looking into merging into a feature branch (add-map-view), but it looks like you may need to create a new branch on the main repo before I can contribute to it. Would you be able to do that? Thanks!

theosanderson commented 12 months ago

Thanks - I've created an add-map-view branch!