lab-cosmo / chemiscope

An interactive structure/property explorer for materials and molecules
http://chemiscope.org
BSD 3-Clause "New" or "Revised" License
119 stars, 29 forks

Density heatmap for large datasets #314

Open JPDarby opened 8 months ago

JPDarby commented 8 months ago

For very large datasets it would be nice to have the option of replacing the scatter plot with a density heatmap. I'm imagining loading a random structure from each "bin" and maybe dynamically updating the binning with the zoom level.
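A minimal sketch of that binning idea, in plain Python and independent of chemiscope or plotly: points inside the current viewport are counted on a regular grid, and one random point per bin is kept as the structure to load. The function name `bin_points` and its parameters are hypothetical, chosen only for illustration. Calling it again with a smaller viewport and the same `n_bins` gives the "rebinning on zoom" behaviour.

```python
import random
from collections import defaultdict


def bin_points(points, x_range, y_range, n_bins, seed=0):
    """Bin 2D points on a regular n_bins x n_bins grid inside a viewport.

    Returns (counts, representatives):
      counts maps (ix, iy) -> number of points in that bin,
      representatives maps (ix, iy) -> index of one random point in that bin.
    Empty bins are simply absent from both dictionaries.
    """
    rng = random.Random(seed)
    x_min, x_max = x_range
    y_min, y_max = y_range
    members = defaultdict(list)
    for index, (x, y) in enumerate(points):
        # skip points outside of the current viewport
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue
        # clamp so that points exactly on the upper edge land in the last bin
        ix = min(int((x - x_min) / (x_max - x_min) * n_bins), n_bins - 1)
        iy = min(int((y - y_min) / (y_max - y_min) * n_bins), n_bins - 1)
        members[(ix, iy)].append(index)

    counts = {cell: len(indices) for cell, indices in members.items()}
    representatives = {cell: rng.choice(indices) for cell, indices in members.items()}
    return counts, representatives


points = [(0.1, 0.1), (0.15, 0.12), (0.9, 0.9), (0.55, 0.45)]
# full view: 2x2 grid over the unit square
counts, reps = bin_points(points, (0.0, 1.0), (0.0, 1.0), n_bins=2)
# zoomed view: same number of bins over a smaller viewport, hence finer bins
zoom_counts, zoom_reps = bin_points(points, (0.0, 0.5), (0.0, 0.5), n_bins=2)
```

The `counts` dictionary is what a heatmap would display, while `reps` gives the index of the structure to fetch when a bin is clicked.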

Happy to have a go at this myself but keen for any suggestions!

@Luthaf already suggested using a custom loadStructure callback for visualising the structures on demand.

ceriottm commented 8 months ago

This would be an excellent feature, but it is unclear (1) how much support there is for this kind of idea in plotly and (2) how much this would weigh on the memory footprint of the dataset and widget. Chemiscope is built assuming that everything can be made portable, and even the dynamic loading of structures is something we never exploited much.

Perhaps one possibility would be to still have only a few hardcoded representative structures in the dataset, but add volumetric data that can be visualized in plotly, to give a better sense of the distribution of data. In this sense, one could imagine providing "shape" data for the property panel, similar to what we recently added for structures. This way, one could visualize a convex hull, or do something like a volumetric plot of the density of points.

Perhaps it would help advance the discussion if you explained what problem you are facing and want to solve.

JPDarby commented 8 months ago

Thanks for the fast reply and consideration.

I'm hoping to use chemiscope to visualise structures stored in the NOMAD database. The dream is to have an interactive plot that updates in "near real time" as a user specifies/adjusts their query. This is an example query and there are already some interactive widgets. The whole database contains ~10 million structures, so I suspect (but admit I haven't checked this...) that visualising large queries will be very slow at the moment and that some sort of heatmap + dynamic loading of structure data is the way to go.

We're planning to precompute averaged SOAP vectors (element agnostic) and MACE descriptors (learned alchemical embedding) for every structure. We would then use some combination of PCA and parametric UMAP for the dimensionality reduction, depending on the size of the query.

JPDarby commented 8 months ago

On your points: (1) I don't think plotly supports dynamic rebinning. I'm happy to have a go at this. We could also switch from the heatmap to a scatter plot once a certain zoom threshold is crossed. (2) Having a representative set of structures available to view for each bin would be completely fine, and maybe this would be a good place to start.
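As a rough illustration of the heatmap/scatter switch, here is a sketch in plain Python. It makes one assumption that differs slightly from a literal zoom threshold: the decision is based on how many points fall inside the current viewport, which is a proxy for zoom level. The name `choose_display_mode` and the default `max_scatter_points` are hypothetical and not part of chemiscope or plotly.

```python
def choose_display_mode(points, x_range, y_range, max_scatter_points=5000):
    """Pick "scatter" or "heatmap" for the current viewport.

    Returns (mode, visible), where visible lists the indices of the points
    inside the viewport. Zooming in shrinks the viewport, so fewer points
    are visible and the display eventually falls back to a scatter plot.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    visible = [
        index
        for index, (x, y) in enumerate(points)
        if x_min <= x <= x_max and y_min <= y <= y_max
    ]
    mode = "scatter" if len(visible) <= max_scatter_points else "heatmap"
    return mode, visible


points = [(i / 10, i / 10) for i in range(10)]
# zoomed out: all 10 points are visible, above the threshold of 5
full_mode, full_visible = choose_display_mode(
    points, (0.0, 1.0), (0.0, 1.0), max_scatter_points=5
)
# zoomed in: only 4 points remain, so individual markers can be drawn
zoom_mode, zoom_visible = choose_display_mode(
    points, (0.0, 0.35), (0.0, 0.35), max_scatter_points=5
)
```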

Luthaf commented 8 months ago

Since this is intended for a specific deployment of chemiscope at NOMAD, another possible solution would be to replace the map widget entirely, and only re-use the other parts of the code. You could write a new widget with whatever technology works best to display the heatmap and link it to the chemiscope structure viewer, loading structures on-demand.

ceriottm commented 8 months ago

I think one possibility that would combine a lot of advantages and be relatively easy would be to have the query generate a chemiscope .json that is then loaded dynamically, "sparsified" to show only some representative structures. When you zoom in, a button could then update the view by re-generating a .json for that section.
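One way the "sparsify, then regenerate on zoom" step could look, sketched in plain Python. Two assumptions are made loudly here: farthest point sampling is used as the sparsification method (the thread does not specify one), and the JSON payload uses made-up keys rather than the actual chemiscope input format, which would have to be produced with chemiscope's own tooling. The names `farthest_point_sampling` and `sparsified_payload` are hypothetical.

```python
import json
import math


def farthest_point_sampling(points, n_select, start=0):
    """Greedily select n_select well-spread points; returns their indices."""
    n_select = min(n_select, len(points))
    if n_select == 0:
        return []
    selected = [start]
    # distance from every point to the closest already selected point
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < n_select:
        next_index = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(next_index)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[next_index]))
    return selected


def sparsified_payload(points, structure_ids, x_range, y_range, n_select):
    """Keep the points inside the viewport, sparsify them, dump them to JSON.

    The keys below are placeholders, NOT the chemiscope schema.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    visible = [
        i
        for i, (x, y) in enumerate(points)
        if x_min <= x <= x_max and y_min <= y <= y_max
    ]
    picked = farthest_point_sampling([points[i] for i in visible], n_select)
    entries = [
        {"id": structure_ids[visible[k]], "x": points[visible[k]][0], "y": points[visible[k]][1]}
        for k in picked
    ]
    payload = {
        "viewport": {"x": list(x_range), "y": list(y_range)},
        "n_total": len(visible),
        "representatives": entries,
    }
    return json.dumps(payload)


points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (0.5, 0.0), (5.0, 5.0)]
ids = ["a", "b", "c", "d", "e"]
# regenerate the payload for the section of the map currently in view
payload = json.loads(sparsified_payload(points, ids, (0.0, 1.0), (0.0, 1.0), n_select=3))
```

Calling `sparsified_payload` again with the zoomed-in ranges corresponds to pressing the "update view" button: the same query results are re-sparsified for the smaller region.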