pleonard212 / pix-plot

A WebGL viewer for UMAP or TSNE-clustered images
MIT License

Hotspot computation error: not enough points(2) to construct initial simplex and a suggestion #177

Closed vdet closed 3 years ago

vdet commented 3 years ago

Hello,

While running pixplot.py with --min_cluster_size 50, I encountered this error:

To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-04-22 17:32:55.012624: I tensorflow/core/common_runtime/direct_session.cc:361] Device mapping: no known devices.
 * creating atlas files
 * reading UMAP embedding coordinates from metadata
 * creating umap pointgrid
 * creating mesh with size 1175 1175
 * filling mesh
 * creating rasterfairy layout - test
no good rectangle found for 62039 points, using incomplete square 249 * 250
 * creating grid layout
 * Clustering data with hdbscan
Traceback (most recent call last):
  File "/Users/detours/ai/ciga-resnet18/code/python/pix-plot-master/pixplot/pixplot.py", line 1380, in <module>
    parse()
  File "/Users/detours/ai/ciga-resnet18/code/python/pix-plot-master/pixplot/pixplot.py", line 1377, in parse
    process_images(**config)
  File "/Users/detours/ai/ciga-resnet18/code/python/pix-plot-master/pixplot/pixplot.py", line 134, in process_images
    get_manifest(**kwargs)
  File "/Users/detours/ai/ciga-resnet18/code/python/pix-plot-master/pixplot/pixplot.py", line 423, in get_manifest
    'default_hotspots': get_hotspots(vecs=read_json(layouts['umap']['layout'], **kwargs), **kwargs),
  File "/Users/detours/ai/ciga-resnet18/code/python/pix-plot-master/pixplot/pixplot.py", line 1204, in get_hotspots
    hull = ConvexHull(positions)
  File "qhull.pyx", line 2433, in scipy.spatial.qhull.ConvexHull.__init__
  File "qhull.pyx", line 356, in scipy.spatial.qhull._Qhull.__init__
scipy.spatial.qhull.QhullError: QH6214 qhull input error: not enough points(2) to construct initial simplex (need 3)

While executing:  | qhull i Qt
Options selected for Qhull 2019.1.r 2019/06/21:
  run-id 1553755139  incidence  Qtriangulate  _pre-merge  _zero-centrum
  _maxoutside  0

It is related to this issue, also addressed here.

For now, I just bypassed the faulty hull computation in get_hotspots. In the context of my medical application I must favor accuracy over aesthetics: if the clusters have irregular shapes, I'd rather know about it. Highlighting individual points rather than drawing the convex hull would be better in a scientific context, in my opinion. The optimal solution would be to set the transparency (or saturation) of the cell borders according to the cluster memberships returned by HDBSCAN.
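For readers who want to keep the hull but avoid QH6214, one option is to guard against clusters with fewer than three distinct points. The following is a minimal sketch, not the code in pixplot.py: `convex_hull_or_points` is a hypothetical helper, and it swaps scipy's Qhull-backed `ConvexHull` for a pure-Python monotone chain so that it runs without dependencies.

```python
def convex_hull_or_points(points):
    """Return the 2D hull vertices in counter-clockwise order.

    Hypothetical helper: when fewer than three distinct points are given,
    the points themselves are returned instead of raising an error.
    Collinear input collapses to the two end points of the segment.
    """
    pts = sorted(set(map(tuple, points)))
    if len(pts) < 3:
        return pts  # degenerate cluster: nothing to enclose

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(seq):
        chain = []
        for p in seq:
            # drop points that would make a right turn or a straight line
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]  # last point is the start of the other half

    return half_hull(pts) + half_hull(reversed(pts))
```

With this guard a two-point cluster simply yields its two points, which the viewer could highlight individually rather than enclose in a polygon.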

Note also the warning no good rectangle found for 62039 points, using incomplete square 249 * 250, which I could not get rid of by tweaking the arguments of the call to rasterfairy.transformPointCloud2D in get_rasterfairy_layout(**kwargs).

All the best,

Vincent

duhaime commented 3 years ago

Thanks for this note Vincent (@vdet)! We are actively rethinking the way we handle hotspots, and this may be a good opportunity to open up the conversation a little.

In the past, when a user hovered a cluster, we drew a gray polygon around the convex hull of the points contained in the cluster. The trouble with this approach is that we only compute those hotspots in one particular layout (the UMAP layout), but the user can change the point layout such that the images are displayed e.g. by filename. If a user selects the filename layout then wants to see the images in a cluster, the convex hull polygon makes no sense, as the points may be super distributed through the space rather than colocated as they were in the UMAP layout.

Given this insight, we thought we'd move to a visualization approach that adorns / highlights the individual images that belong to a hotspot when a user selects the hotspot. If you pull the last commit in this repo, you'll see that new behavior. (You can restore the gray convex hull polygon by uncommenting this line.)

Does the new behavior relax the convexity impression sufficiently?

That warning from rasterfairy just indicates that your number of images (62039) doesn't have an integer as its square root. Passing in 1024 images should not trigger the warning, since one can arrange 1024 images into a perfect grid of 32 x 32 images, a "complete" square. By contrast, 1025 images would require a 32 x 33 grid in which one row or column only had a single cell, and thus would incur the warning.
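The arithmetic behind the warning can be reproduced with a small sketch. `grid_shape` is a hypothetical helper written for illustration; it is not rasterfairy's own rectangle search, and it only shows why some image counts fill a near-square grid exactly while others leave empty cells.

```python
import math


def grid_shape(n):
    """Return (rows, cols, complete) for the smallest near-square grid
    that can hold n cells. `complete` is True when n fills it exactly.

    Hypothetical helper, not part of rasterfairy.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    cols = math.isqrt(n)
    if cols * cols < n:
        cols += 1  # round the square root up
    rows = -(-n // cols)  # ceiling division
    return rows, cols, rows * cols == n


print(grid_shape(1024))   # (32, 32, True)
print(grid_shape(1025))   # (32, 33, False)
print(grid_shape(62039))  # (249, 250, False)
```

The last call matches the 249 * 250 "incomplete square" reported in the log: the grid has 62250 cells, so 211 of them stay empty.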

I hope this helps! We're certainly happy to follow up either way!

vdet commented 3 years ago

we thought we'd move to a visualization approach that adorns / highlights the individual images that belong to a hotspot when a user selects the hotspot.

Thanks Douglas, highlighting individual points is much better. The white border, however, is not visible when zooming out. A brighter color would be more prominent. Where should I customize this?

It is quite typical with HDBSCAN that most points end up among the -1 unassigned points, yet still have significant cluster membership. clusters.labels_ returns the core of 'prototype' points with high cluster membership, but in some datasets most points are intermediates between these few prototype points. These fuzzy classes are the reason I and others have been using HDBSCAN in the first place. Varying transparency or saturation would make it possible to represent the cluster membership computed by HDBSCAN, and to be faithful to class fuzziness rather than highlighting exceptional points.
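The transparency idea can be sketched as a simple mapping from membership strength to an alpha value. This is only an illustration under assumptions: the input is taken to be one membership strength per image in the range 0 to 1 (for example the `probabilities_` attribute of a fitted hdbscan clusterer, or a per-point maximum from soft clustering), and `membership_to_alpha` and `min_alpha` are hypothetical names, not part of pix-plot.

```python
def membership_to_alpha(probabilities, min_alpha=0.15):
    """Linearly map membership strengths in [0, 1] to alphas in [min_alpha, 1].

    Hypothetical helper: weak members stay faintly visible at min_alpha,
    strong members are drawn fully opaque.
    """
    alphas = []
    for p in probabilities:
        p = min(1.0, max(0.0, float(p)))  # clamp out-of-range values
        alphas.append(min_alpha + (1.0 - min_alpha) * p)
    return alphas
```

The resulting list could be passed to the viewer as a per-image attribute, so that the border of each cell fades with its membership instead of being either on or off.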

Since the images are barely visible when fully zooming out, a toggle between color points and images might be warranted.

A great feature of pix-plot is the possibility to map a selection made in one space onto other spaces. For example, my application is about tiles from gigantic microscope images. I use lat and lng to map the tiles onto the physical 2D space of the tissue slice. It is most informative to select image tiles from the physical space and display where they map in the UMAP embedding, and vice versa. From that perspective all spaces should be treated similarly with respect to selection. This is not the case right now: I can only create hotspots in the UMAP view. Now that the hull visualization is disabled, I guess manual spot creation could be enabled in all views.
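Mapping a selection between spaces comes down to selecting image indices in one layout and looking the same indices up in another. A minimal sketch, assuming each layout is a list of (x, y) positions stored in the same image order; `select_in_box` and `project_selection` are hypothetical names, not pix-plot functions.

```python
def select_in_box(layout, x_min, x_max, y_min, y_max):
    """Return the indices of the points that fall inside the box."""
    return [
        i for i, (x, y) in enumerate(layout)
        if x_min <= x <= x_max and y_min <= y <= y_max
    ]


def project_selection(indices, other_layout):
    """Look up the selected images' positions in another layout."""
    return [other_layout[i] for i in indices]


# Toy data: three tiles in tissue coordinates and in a 2D embedding.
physical = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
embedding = [(0.9, 0.1), (0.2, 0.8), (0.5, 0.5)]

picked = select_in_box(physical, 0.0, 2.0, 0.0, 2.0)
print(picked)                                # [0, 1]
print(project_selection(picked, embedding))  # [(0.9, 0.1), (0.2, 0.8)]
```

Because the selection is stored as indices rather than coordinates, it works the same way whichever layout it was made in.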

Vincent

duhaime commented 3 years ago

You should be able to change the white border color in the shader code itself. You could crank up that alpha value to 1.0 for starters, and if that still wasn't evident enough, I'd increase the borderWidth as we discussed in another thread.

a toggle between color points and images might be warranted.

This is a really interesting idea, and one I've wanted to implement for some time. Let me bring this to the team to see if we can get this rolling.

Implementing the idea above would make it much easier to show the color mixtures...

I just merged a commit that should allow you to interact with hotspots in all layouts--if you try that out we're happy to hear your thoughts!

vdet commented 3 years ago

I just merged a commit that should allow you to interact with hotspots in all layouts--if you try that out we're happy to hear your thoughts!

Thanks. It works. To make this complete, I also commented out the conditional at line 1732 to enable the creation of hotspots in all layouts:

//####    if (layout.selected == 'umap') {
      data.hotspots.setCreateHotspotVisibility(true);
//####    }

Thanks!

Vincent

duhaime commented 3 years ago

Excellent, I just added the behavior you changed to master :)

If this issue is resolved, I'd be grateful if you could close it out; else I'm happy to follow up!

duhaime commented 3 years ago

Whoops, ping @vdet