observablehq / plot

A concise API for exploratory data visualization implementing a layered grammar of graphics
https://observablehq.com/plot/
ISC License

1-d KDE transform? #1469

Status: Open. Fil opened this issue 1 year ago.

Fil commented 1 year ago

(As mentioned in a few places: #948, #943, #791.)

We can look at https://github.com/uwdata/fast-kde/ for speed, and at https://observablehq.com/@d3/kernel-density-estimation for a more straightforward implementation. A 1-d KDE could be used to create violin plots, among other things. Building on our 2-d implementation, the efficient approach might be to bin the data (using linear binning, which splits a point's weight between the two nearest grid points when it doesn't fall exactly on one) and then blur.
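As a rough illustration of the bin-then-blur idea, here is a minimal TypeScript sketch that depends on neither Plot nor d3. The function names (`linearBin`, `gaussianSmooth`, `kde1d`) are hypothetical, the grid is assumed to be n evenly spaced points spanning a fixed [lo, hi] range, and the blur step is a direct, truncated Gaussian convolution rather than the faster approximations used by fast-kde or d3.

```typescript
// Linear binning (hypothetical helper): each sample's unit weight is split
// between the two nearest grid points, in proportion to its closeness to each.
function linearBin(values: number[], lo: number, hi: number, n: number): number[] {
  const grid = new Array<number>(n).fill(0);
  const step = (hi - lo) / (n - 1); // spacing between grid points
  for (const v of values) {
    const pos = (v - lo) / step; // fractional grid index
    if (pos < 0 || pos > n - 1) continue; // samples outside [lo, hi] are dropped
    const i = Math.floor(pos);
    const t = pos - i; // how far past grid point i, in [0, 1)
    grid[i] += 1 - t;
    if (t > 0) grid[i + 1] += t;
  }
  return grid;
}

// Direct convolution with a Gaussian kernel truncated at 4 sigma.
// sigmaBins is the bandwidth expressed in grid steps (must be > 0).
function gaussianSmooth(grid: number[], sigmaBins: number): number[] {
  const radius = Math.ceil(4 * sigmaBins);
  const kernel: number[] = [];
  for (let k = -radius; k <= radius; k++) kernel.push(Math.exp(-0.5 * (k / sigmaBins) ** 2));
  const total = kernel.reduce((a, b) => a + b, 0); // normalize so weights sum to 1
  const out = new Array<number>(grid.length).fill(0);
  for (let i = 0; i < grid.length; i++) {
    if (grid[i] === 0) continue;
    for (let k = -radius; k <= radius; k++) {
      const j = i + k;
      // weight that would land outside the grid is lost (no padding here)
      if (j >= 0 && j < grid.length) out[j] += (grid[i] * kernel[k + radius]) / total;
    }
  }
  return out;
}

// Bin, blur, then rescale counts into a density that integrates to ~1.
function kde1d(
  values: number[], bandwidth: number, lo: number, hi: number, n = 100
): { x: number; density: number }[] {
  const step = (hi - lo) / (n - 1);
  const smoothed = gaussianSmooth(linearBin(values, lo, hi, n), bandwidth / step);
  return smoothed.map((w, i) => ({ x: lo + i * step, density: w / (values.length * step) }));
}
```

For example, `kde1d([0, 1, 2], 1, -10, 12, 221)` evaluates the density on a 0.1-unit grid. The range is chosen well beyond the data so the truncated kernel never reaches the edges; with a tighter range, mass near the ends would be lost.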

An (old) experimental notebook: https://observablehq.com/@observablehq/fast-kde-and-plot/2

(2-d KDE is addressed by the density mark.)

Examples would include violin plots.

(Need to add some padding.)

Hvass-Labs commented 12 months ago

I would really like KDE as well. It would be great if it also worked with brushing / selection of data: I could then show a histogram as solid bars, overlay the KDE as a thin curve, let the user brush / select part of the plot, and retrieve the selected histogram and KDE values to use elsewhere. Is that possible? Thanks!

martindaniel4 commented 11 months ago

Another upvote for KDE 👋

alex-rand commented 11 months ago

Yes please!

huw commented 1 month ago

I published a PoC at @huw/density1d. I won't open a PR until I've hardened the implementation a bit (I'm not sure I'll use it much just yet); if anyone else wants to take this and run with it, you wouldn't be getting in my way :)

Fil commented 1 day ago

Here's a sketch of the transform I have in mind: https://observablehq.com/@observablehq/plot-kde

The code is quite minimal: it applies a Gaussian kernel (approximated by d3.blur, with a radius adapted to match the desired bandwidth) to the output of a bin transform with something like 100 thresholds. It inherits the settings of the bin transform (y reducers, z partitioning of series, etc.).
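One way to make "a radius adapted to match the desired bandwidth" concrete, assuming each blur pass is a plain moving average of width 2r + 1: a single pass has variance r(r + 1)/3 (in bins²), so three passes have variance r(r + 1). Setting that equal to σ², with σ = bandwidth / bin width, gives r = (√(1 + 4σ²) − 1) / 2. The sketch below uses hypothetical helper names (`blurRadiusForBandwidth`, `boxPass`, `threePassBlur`), rounds the radius to an integer, and treats values beyond the ends as zero. It is not d3.blur itself, whose radius is documented as a non-negative number and whose edge handling may differ.

```typescript
// Hypothetical helper: radius r whose three-pass box blur has variance
// r(r + 1) equal to sigma^2, with sigma measured in bins.
function blurRadiusForBandwidth(bandwidth: number, binWidth: number): number {
  const sigma = bandwidth / binWidth;
  return (Math.sqrt(1 + 4 * sigma * sigma) - 1) / 2;
}

// One moving-average pass with integer radius r (window width 2r + 1).
// Values beyond either end of the array count as zero.
function boxPass(data: number[], r: number): number[] {
  const n = data.length;
  const w = 2 * r + 1;
  const out = new Array<number>(n).fill(0);
  let acc = 0;
  for (let j = 0; j <= r && j < n; j++) acc += data[j]; // window [-r, r] for out[0]
  for (let i = 0; i < n; i++) {
    out[i] = acc / w;
    if (i + r + 1 < n) acc += data[i + r + 1]; // slide the window right:
    if (i - r >= 0) acc -= data[i - r]; //        add the new end, drop the old start
  }
  return out;
}

// Three box passes approximate a Gaussian with variance r(r + 1).
function threePassBlur(data: number[], r: number): number[] {
  return boxPass(boxPass(boxPass(data, r), r), r);
}
```

For instance, a histogram `counts` with bins 0.5 units wide and a bandwidth of 2 gives σ = 4 bins and r ≈ 3.53, which this sketch would round to 4 via `threePassBlur(counts, Math.round(blurRadiusForBandwidth(2, 0.5)))`, slightly oversmoothing. Dividing the result by (number of samples × bin width) turns the smoothed counts into a density.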