holoviz / holoviews

With Holoviews, your data visualizes itself.
https://holoviews.org
BSD 3-Clause "New" or "Revised" License

Update the inspect operation to support an index layer to skip slow spatial filtering #6018

Open droumis opened 11 months ago

droumis commented 11 months ago

Issue authored by @jlstevens

The current HoloViews inspect operation performs slow spatial filtering. With the new support for where aggregators ('selectors') combined with the summary aggregator, HoloViews can now use Datashader to collect row indices in a single pass, along with whatever aggregation is displayed (typically count).

If the inspect operation is updated to accept such images as input, the slow spatial filtering over the dataframe can be skipped, which makes inspection more performant.
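
To make the single-pass idea above concrete, here is a minimal, library-free sketch. It is not the HoloViews or Datashader implementation: the function names, the `(x, y, value)` row layout, the unit-square extent, and the "keep the row with the largest value" selector are all assumptions chosen for illustration. It builds a count grid and an index grid in one pass, then answers an inspection query with a direct lookup instead of a scan over the data.

```python
# Illustrative only: a real pipeline would obtain both grids from Datashader.
# Rows are hypothetical (x, y, value) tuples with x and y in the unit square.

def aggregate_with_index(rows, width, height):
    """One pass over the rows: per-pixel count plus the index of the
    row with the largest value falling in that pixel."""
    counts = [[0] * width for _ in range(height)]
    index = [[-1] * width for _ in range(height)]  # -1 marks an empty pixel
    for i, (x, y, value) in enumerate(rows):
        col = min(int(x * width), width - 1)
        row = min(int(y * height), height - 1)
        counts[row][col] += 1
        best = index[row][col]
        if best == -1 or value > rows[best][2]:
            index[row][col] = i
    return counts, index


def inspect_pixel(rows, index, col, row):
    """Look up the selected row for a pixel without any spatial filtering."""
    i = index[row][col]
    return None if i == -1 else rows[i]


rows = [(0.1, 0.1, 5.0), (0.15, 0.12, 9.0), (0.9, 0.8, 1.0)]
counts, index = aggregate_with_index(rows, width=4, height=4)
hit = inspect_pixel(rows, index, col=0, row=0)
```

The cost of `inspect_pixel` does not depend on the number of rows, which is the property the proposal relies on.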

@Hoxbro, please confirm if there is anything left to do here.

hoxbro commented 11 months ago

As mentioned in #6019, it is now possible to use selectors to get the indexes.

The existing inspect operation is still there but does not use selectors, and it likely needs to be rewritten to support the new features. I have made a bullet list in https://github.com/holoviz/holoviews/issues/6020#issuecomment-1853585142 of what I think is needed to add this functionality.

philippjfr commented 1 month ago

> rewritten to support the new features

I would say we need the inspect operations to detect the presence of the index layer and then take this optimized path, while still supporting the older, less efficient approach.
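
A rough sketch of that dispatch, under stated assumptions: the `"index"` layer name, the dict-of-layers container, and both function names are hypothetical and are not the HoloViews API. The point is only the control flow, which uses the index layer when it is present and otherwise falls back to scanning the rows.

```python
INDEX_KEY = "index"  # hypothetical name for the index layer


def rows_in_pixel(rows, x0, x1, y0, y1):
    """Older path: scan every row and keep those inside the pixel bounds."""
    return [r for r in rows if x0 <= r[0] < x1 and y0 <= r[1] < y1]


def inspect(layers, rows, col, row, width, height):
    """Return the rows to report for one pixel of a unit-square raster."""
    index = layers.get(INDEX_KEY)
    if index is not None:
        # Optimized path: the index layer already names the row to report.
        i = index[row][col]
        return [] if i == -1 else [rows[i]]
    # Fallback path: spatial filtering over all rows.
    x0, x1 = col / width, (col + 1) / width
    y0, y1 = row / height, (row + 1) / height
    return rows_in_pixel(rows, x0, x1, y0, y1)


rows = [(0.1, 0.1, 5.0), (0.15, 0.12, 9.0), (0.9, 0.8, 1.0)]
index = [[-1] * 4 for _ in range(4)]
index[0][0] = 1  # row 1 was selected for pixel (0, 0)
index[3][3] = 2

fast = inspect({"count": None, INDEX_KEY: index}, rows, 0, 0, 4, 4)
slow = inspect({"count": None}, rows, 0, 0, 4, 4)
```

In this sketch the two paths return different amounts of information: the fallback returns every row in the pixel, while the index path returns only the single selected row.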

jbednar commented 1 month ago

I'm having trouble tracking down all the relevant issues where this was discussed, but just so that it's recorded somewhere, I'd like to note that my original idea for an index layer was focused on cases of multi-column data, where passing in all of the data needed to support the inspect operation could be slow and consume a lot of memory.

But it has recently become clear that the indexing approach can also be important for datashading large numbers of categories, because each category becomes another plot-sized array full of numbers, quickly adding up to a very large data structure to pass from Datashader to Bokeh for rendering with HoloViews rasterize. If we can instead support an index array when Datashader renders categorical data all the way down to RGBA with HoloViews datashade, then the total amount of data sent to the browser stays constant regardless of how many categories there are, while still allowing some inspection of a subset of the points.

We do have to give up on reporting the counts per category per pixel in the inspection, though, since that information isn't recoverable from the index array. Anyway, I'm sure this info belongs somewhere else, but at least it's now written down...
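
As a back-of-the-envelope illustration of that scaling argument, the sketch below compares the two payload sizes. The byte widths (4-byte counts, 4-byte RGBA pixels, 8-byte row indices) and the function names are assumptions made for the arithmetic, not measurements of what Datashader or Bokeh actually transfer.

```python
def per_category_payload_bytes(width, height, n_categories, bytes_per_count=4):
    """One plot-sized count array per category (assumed 4-byte counts)."""
    return width * height * n_categories * bytes_per_count


def rgba_plus_index_payload_bytes(width, height, bytes_per_rgba=4, bytes_per_index=8):
    """One RGBA image plus one index array (assumed 8-byte indices)."""
    return width * height * (bytes_per_rgba + bytes_per_index)


# An 800x600 plot with 100 categories, under the assumptions above.
stacked = per_category_payload_bytes(800, 600, 100)
indexed = rgba_plus_index_payload_bytes(800, 600)
```

Under these assumptions the per-category payload is 192,000,000 bytes and grows linearly with the number of categories, while the RGBA-plus-index payload is 5,760,000 bytes regardless of how many categories there are.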