Open fr1ll opened 4 years ago
Thanks very much for reaching out @fr1ll -- this is a great idea! We've been thinking through strategies that can be used to handle huge datasets, but this kind of clustering technique hasn't come up yet!
One related idea that we've been kicking around is a kind of hierarchical tree like the sort generated by scipy.cluster.hierarchy.dendrogram. That kind of hierarchical model could be really useful in pursuing this kind of marker cluster layout...
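To make the tree idea a bit more concrete, here is a rough sketch of how one hierarchical tree over 2D layout positions could be cut at several levels, with each level giving a coarser or finer set of cluster markers. It uses scipy's `linkage` and `fcluster`; the function name `cluster_levels`, the Ward linkage choice, the cluster counts, and the synthetic points are all assumptions for illustration, not anything that exists in this codebase.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_levels(points, counts=(2, 4, 8)):
    """Cut a single hierarchical tree at several coarseness levels.

    `points` is an (n, 2) array of layout positions (e.g. UMAP output).
    Returns {k: (labels, centroids)} where centroids maps cluster id -> xy.
    """
    # Build the tree once; Ward linkage is an assumed choice, not a requirement.
    tree = linkage(points, method="ward")
    levels = {}
    for k in counts:
        # Ask for at most k flat clusters from the same tree.
        labels = fcluster(tree, t=k, criterion="maxclust")
        centroids = {
            int(c): points[labels == c].mean(axis=0) for c in np.unique(labels)
        }
        levels[k] = (labels, centroids)
    return levels


# Synthetic stand-in for image positions: four tight blobs of 20 points each.
rng = np.random.default_rng(0)
centers = [(0, 0), (1, 0), (0, 1), (1, 1)]
points = np.vstack([rng.normal(loc=c, scale=0.05, size=(20, 2)) for c in centers])

levels = cluster_levels(points)
```

Because every level comes from the same tree, the clusters at a finer level nest inside those at a coarser level, which is the property a zoom-dependent marker layout would likely want.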
Another related idea that's come up is a hierarchical treemap layout. Here again the dendrogram tree could be partitioned so that we could create huge top level boxes/images for the top-level genres in the collection. One could then interact with those top-level boxes to explore the subgenres they contain, and so on and so forth, until one got to the individual images from the selected genre/subgenres in the dataset. In the case of truly massive input datasets, it might be possible to create photo mosaics for each of the top-level images using input images that fall under the given genre branch...
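As a sketch of what the box layout could look like, the snippet below assigns each node of a small genre tree a rectangle whose area is proportional to the number of images under it. It uses the simple "slice and dice" treemap scheme (alternating horizontal and vertical splits per level), which is just one possible choice. The tree, genre names, image counts, and function names are all hypothetical.

```python
def leaf_count(node):
    """Number of images under a node (leaves carry a 'count')."""
    children = node.get("children")
    if not children:
        return node["count"]
    return sum(leaf_count(child) for child in children)


def slice_and_dice(node, x, y, width, height, depth=0, out=None):
    """Map each node name to an (x, y, width, height) rectangle.

    Children divide the parent's rectangle in proportion to their image
    counts; even depths split along x, odd depths split along y.
    """
    if out is None:
        out = {}
    out[node["name"]] = (x, y, width, height)
    children = node.get("children", [])
    total = sum(leaf_count(child) for child in children)
    offset = 0.0  # fraction of the parent already handed out
    for child in children:
        share = leaf_count(child) / total
        if depth % 2 == 0:
            slice_and_dice(child, x + offset * width, y,
                           share * width, height, depth + 1, out)
        else:
            slice_and_dice(child, x, y + offset * height,
                           width, share * height, depth + 1, out)
        offset += share
    return out


# Hypothetical genre tree with made-up image counts.
tree = {
    "name": "all",
    "children": [
        {"name": "portraits", "count": 60},
        {"name": "landscapes", "children": [
            {"name": "mountains", "count": 30},
            {"name": "coasts", "count": 10},
        ]},
    ],
}

layout = slice_and_dice(tree, 0.0, 0.0, 100.0, 100.0)
```

Clicking a top-level box would then amount to re-running the layout with that child as the root and the full viewport as its rectangle.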
If you want to try out some kind of hierarchical feature, we'd certainly be very grateful! If you want to get your feet wet with some WebGL first, I would recommend this introduction to three.js followed by some research into this overview of "textured point sprites" [textured point sprites are the primitives used in this codebase]. If you dig into those and get interested, feel free to follow up and we can try and put something along these lines together!
I like the idea of a tree to define the hierarchy. This seems similar to the edge bundling UMAP plots. I am not sure how the Leaflet plugin does its clustering.
The boxes idea is interesting as well, especially for maximizing the use of screen space at each zoom level. The tradeoff is that some structure gets lost.
With either the grid or marker-cluster approach, there are some nice advantages to this kind of hierarchical view:
I'm most interested in using UMAP as a way to explore new image datasets -- basically finding how they automatically cluster and especially finding outliers. I think this could be a very powerful environment for data labeling. Since I love seeing this global structure, I put less weight on the pictures using up all the screen space with a grid.
Either way, it's hard to know how the interactions will feel without prototyping them. Thanks for sharing those resources, I will start taking a look!
I'm a big fan of pix-plot, especially with the addition of a lasso, which means it can be used for annotating data!
For very large datasets, I think a visualization similar to "marker clustering" as done in many web maps would be amazing. The inspiration comes from the GeographPhotos leaflet plugin -- demo here.
I wonder how difficult this would be to implement in WebGL. I have no experience in WebGL, but I am happy to spend some time trying something along these lines. I have already started looking into adapting the Leaflet plugin above + adding a lasso, but maybe pix-plot is a better starting point.
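For reference, here is a minimal sketch of the general idea, independent of WebGL: bin the 2D image positions into a grid whose cell size shrinks as the zoom level grows, and draw one marker per occupied cell. This is a simple grid technique chosen for illustration and is not necessarily how the Leaflet plugin clusters. The function name, the assumption that positions are normalized to [0, 1), and the sample points are all made up.

```python
from collections import defaultdict


def grid_clusters(points, zoom):
    """Group 2D points in [0, 1) x [0, 1) into square grid cells.

    There are 2**zoom cells per axis, so higher zoom means smaller cells
    and therefore more, smaller clusters.
    """
    cells_per_axis = 2 ** zoom
    cells = defaultdict(list)
    for index, (x, y) in enumerate(points):
        # Clamp so a coordinate of exactly 1.0 still lands in the last cell.
        col = min(int(x * cells_per_axis), cells_per_axis - 1)
        row = min(int(y * cells_per_axis), cells_per_axis - 1)
        cells[(col, row)].append(index)

    markers = []
    for members in cells.values():
        # Place the marker at the mean position of the points it stands for.
        cx = sum(points[i][0] for i in members) / len(members)
        cy = sum(points[i][1] for i in members) / len(members)
        markers.append({"center": (cx, cy), "count": len(members),
                        "members": members})
    return markers


# Made-up normalized image positions.
points = [(0.10, 0.10), (0.11, 0.12), (0.40, 0.45), (0.90, 0.85)]

coarse = grid_clusters(points, zoom=1)  # 2 x 2 grid
fine = grid_clusters(points, zoom=3)    # 8 x 8 grid
```

Each marker keeps the indices of its members, so a lasso selection over markers could still resolve to the underlying images.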
Sorry if "Issues" isn't the best place to share this idea--I wasn't sure!