holoviz / datashader

Quickly and accurately render even the largest data.
http://datashader.org
BSD 3-Clause "New" or "Revised" License

Point antialiasing #1010

Open nmgeek opened 3 years ago

nmgeek commented 3 years ago

My input data consists of billions of microvectors. Each is so short that it typically starts and ends inside one pixel. The microvectors connect to form a non-linear path (curve?) that extends for many pixels. While these microvectors could be rendered as multiple lines, sent in a single line rendering request, I feel like I will get the best performance by rendering them as points.

The resulting image has aliasing which I feel would best be reduced by giving each point some width. The intensity of the Z value should probably decrease in proportion to the pixel's distance from the point. This is effectively line antialiasing extended (or reduced?) to points.
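This is a minimal sketch of what that could look like for a single point. It uses a hypothetical helper `splat_point_tent`, which is not part of Datashader and works in continuous pixel coordinates. Pixels whose centers lie within `radius` of the point get a weight that falls off linearly with distance. The weights are normalized so that each point still deposits exactly `z` in total, which keeps aggregate counts meaningful.

```python
import numpy as np

def splat_point_tent(img, px, py, z, radius=1.0):
    """Hypothetical helper: spread value z around continuous pixel coords (px, py).

    Pixel (i, j) has its center at (j + 0.5, i + 0.5). Pixels whose centers
    are within `radius` of the point get weight 1 - d / radius, and the weights
    are normalized so the point adds exactly z to the image in total.
    """
    h, w = img.shape
    # Bounding window of pixels whose centers could lie within `radius`.
    j0, j1 = max(int(np.floor(px - radius)), 0), min(int(np.ceil(px + radius)), w)
    i0, i1 = max(int(np.floor(py - radius)), 0), min(int(np.ceil(py + radius)), h)
    if j0 >= j1 or i0 >= i1:
        return img  # the point's footprint lies entirely off the canvas
    jj, ii = np.meshgrid(np.arange(j0, j1), np.arange(i0, i1))
    # Distance from each pixel center in the window to the point.
    d = np.hypot(jj + 0.5 - px, ii + 0.5 - py)
    weight = np.clip(1.0 - d / radius, 0.0, None)
    total = weight.sum()
    if total > 0:
        img[i0:i1, j0:j1] += z * weight / total
    return img

img = np.zeros((5, 5))
splat_point_tent(img, 2.9, 2.5, z=1.0)  # point sits 0.4 px right of pixel (2, 2)'s center
# img[2, 2] is about 0.6 and img[2, 3] about 0.4; every other pixel stays 0
```

Calling a function like this once per point from Python would be far too slow for billions of points. The loop would have to be compiled (for example with Numba) or moved to a GPU.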

When you add width to the process of rendering billions of points, parallel processing (multiprocessing, GPU, etc.) is required to get reasonable performance.

@jbednar said: Line aliasing is a problem even for billions of points if they are all on the same curve. So I'd want to address open issues with line antialiasing.

This is effectively the problem I have, with one gotcha: the billions of microvectors are further divided into millions of curves, i.e., after, say, a thousand connected microvectors, there is a "gap" which starts the next "curve". So, for best performance, representing them as points will avoid the cost of start-of-line and end-of-line antialias treatments for each curve. But really I don't know which approach would be faster: billions of antialiased points or millions of antialiased "curves." (I did look at a datashader test case for sending multiple "curves" in a single line request but did not totally grasp the required input structures and never found the associated documentation. And I think the antialiaser divides by zero anyway.)

@jbednar said: addressing aliasing for rendering points with a specified size might also address aliasing in lines with a specified width; the problems may in fact be one and the same.

Yes, there are nuances that make the problems the same and different at the same time. I am happy to explore these nuances in the limited time I have available (which seems to amount to 1-2 hours a week).

I started playing with an override of Canvas.points. It needs the scaling from the canvas to calculate antialias interpolation. It reaches into the canvas object to get the scaling parameters. (I guess the line antialiasing must do the same: I have not looked.)
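For a linear axis, the "scaling" amounts to an affine map from data coordinates to continuous pixel coordinates. Datashader's `Canvas` is constructed with `plot_width`, `plot_height`, `x_range` and `y_range`. The sketch below assumes a hypothetical `data_to_pixel` helper that takes those same quantities. The integer part of the result selects the pixel, and the fractional part is the subpixel offset that an antialiasing interpolation would need. Axis orientation, log axes and points outside the ranges are deliberately ignored.

```python
import numpy as np

def data_to_pixel(x, y, x_range, y_range, width, height):
    """Map data coordinates to continuous pixel coordinates (linear axes only).

    Hypothetical helper: x_range/y_range are (min, max) in data units and
    width/height are the canvas size in pixels.
    """
    (x0, x1), (y0, y1) = x_range, y_range
    sx = width / (x1 - x0)   # pixels per data unit along x
    sy = height / (y1 - y0)  # pixels per data unit along y
    px = (np.asarray(x, dtype=float) - x0) * sx
    py = (np.asarray(y, dtype=float) - y0) * sy
    return px, py

px, py = data_to_pixel([0.0, 1.0625, 7.5], [1.0, 1.0, 1.0],
                       x_range=(0, 8), y_range=(0, 4), width=64, height=32)
col = np.floor(px).astype(int)  # pixel column containing each point
frac = px - col                 # subpixel offset within that column
```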

jbednar commented 3 years ago

(Follows on from discussion in https://github.com/holoviz/datashader/issues/989 and https://github.com/holoviz/datashader/issues/623).

> My input data consists of billions of microvectors. Each is so short that it typically starts and ends inside one pixel. The microvectors connect to form a non-linear path (curve?) that extends for many pixels. While these microvectors could be rendered as multiple lines, sent in a single line rendering request, I feel like I will get the best performance by rendering them as points.

I'm not sure I'm following the description in terms of microvectors, but this sounds like a typical polylines case: a sequence of connected points (a polyline), potentially with points spaced closely together so that multiple adjacent "segments" actually fall into a single pixel, possibly followed by additional polylines not connected to the previous one. E.g. "all the waterways of North America" would be one such dataset, with each tributary forming a polyline and the total set containing various disjoint pieces.

Datashader supports a variety of data structures for representing and rendering such cases efficiently, either with a gap marked explicitly by a NaN on x or y (useful when each polyline may be a different length), or as an array of polylines, each sharing the same x coordinate but different y coordinates (useful for repeated measurements of the same quantities). See https://github.com/holoviz/datashader/pull/694 for details.
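Here is a rough illustration of the first layout. The hypothetical helper `concat_with_nan_gaps` flattens a list of polylines of different lengths into two 1-D arrays, with one NaN between consecutive polylines. The commented-out Datashader call shows how such arrays might then be rendered in a single `Canvas.line` request. It is not executed here; check PR #694 and the `Canvas.line` docstring for the exact supported forms, including the shared-x layout.

```python
import numpy as np

def concat_with_nan_gaps(curves):
    """Flatten a list of (xs, ys) polylines into two 1-D float arrays.

    A single NaN is inserted between consecutive polylines so that a
    NaN-aware line renderer breaks the path there instead of drawing
    a connecting segment.
    """
    if not curves:
        return np.empty(0), np.empty(0)
    xs, ys = [], []
    for k, (cx, cy) in enumerate(curves):
        if k > 0:
            xs.append(np.array([np.nan]))
            ys.append(np.array([np.nan]))
        xs.append(np.asarray(cx, dtype=float))
        ys.append(np.asarray(cy, dtype=float))
    return np.concatenate(xs), np.concatenate(ys)

# Two short, disconnected polylines of different lengths.
x, y = concat_with_nan_gaps([([0, 1, 2], [0, 1, 0]), ([5, 6], [3, 3])])
# x -> [0, 1, 2, nan, 5, 6]; y -> [0, 1, 0, nan, 3, 3]

# With pandas and datashader installed, this could be rendered in one call:
#   import pandas as pd, datashader as ds
#   cvs = ds.Canvas(plot_width=400, plot_height=300)
#   agg = cvs.line(pd.DataFrame({'x': x, 'y': y}), 'x', 'y', agg=ds.count())
```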

> The resulting image has aliasing which I feel would best be reduced by giving each point some width.

I'm confused by what you mean by "aliasing" here. Informally, aliasing is a jagged, stairstepped appearance along the edge of a filled region that adds a spurious sharp "corner" to what is mathematically a smooth boundary. Aliasing of this type is not applicable to Datashader's current points rendering, which treats each point as an infinitesimal location that either falls in a pixel or not. Datashader's current points rendering thus has no edge, no spatial extent, and no potential for aliasing; each point simply increases the count in a pixel or does not. Under those assumptions, each pixel already represents the point as accurately as it can given the finite grid spacing.
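The "infinitesimal point" model can be made concrete with a simplified sketch. This is plain NumPy, not Datashader's actual Numba-based implementation, and `count_points` is a hypothetical name. It counts points given in continuous pixel coordinates, and each point increments only the single pixel that contains it. Moving a point anywhere within that pixel leaves the result unchanged, so there is no edge that could alias.

```python
import numpy as np

def count_points(px, py, width, height):
    """Bin points given in continuous pixel coordinates into a count grid.

    Each point is treated as infinitesimal: it increments exactly one
    pixel (the one containing it) and contributes nothing elsewhere.
    Points outside the canvas are dropped.
    """
    j = np.floor(np.asarray(px, dtype=float)).astype(int)
    i = np.floor(np.asarray(py, dtype=float)).astype(int)
    inside = (j >= 0) & (j < width) & (i >= 0) & (i < height)
    counts = np.zeros((height, width), dtype=np.int64)
    # np.add.at accumulates correctly even when several points share a pixel.
    np.add.at(counts, (i[inside], j[inside]), 1)
    return counts

counts = count_points([0.1, 0.9, 2.5, 4.0], [0.2, 0.7, 1.5, 0.0], width=4, height=3)
# The first two points share pixel (0, 0); the last one falls off the canvas.
```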

I am interested in supporting antialiasing for points, but for me that first requires supporting a spatial extent for points ("giving each point some width", and thus making it more visible), which will lead to aliasing. Only then can we support antialiasing of that filled shape so that it looks smooth and can have an arbitrarily precise location in the plane (subpixel resolution). But this two-step approach must not be what you are proposing if you are saying that the current points rendering already has aliasing?
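The two steps can be illustrated with a filled disc standing in for a point with width, using a hypothetical `render_disc` helper in plain NumPy. Testing only each pixel's center gives hard 0/1 values, which is where aliasing appears. Testing an n-by-n grid of subpixel samples and storing the covered fraction is a simple supersampling form of antialiasing. It also lets the disc's position vary smoothly at subpixel resolution. Supersampling is chosen here only for clarity; an analytic coverage computation would be cheaper.

```python
import numpy as np

def render_disc(width, height, cx, cy, r, samples=1):
    """Rasterize a filled disc of radius r (in pixels) centered at (cx, cy).

    samples=1 tests only each pixel's center, giving hard 0/1 edges (aliased).
    samples=n tests an n-by-n grid of subpixel positions per pixel and stores
    the covered fraction, approximating area coverage (supersampling).
    """
    offsets = (np.arange(samples) + 0.5) / samples  # subpixel sample positions
    rows = np.arange(height)[:, None]
    cols = np.arange(width)[None, :]
    img = np.zeros((height, width))
    for oy in offsets:
        for ox in offsets:
            inside = (cols + ox - cx) ** 2 + (rows + oy - cy) ** 2 <= r * r
            img += inside
    return img / (samples * samples)

hard = render_disc(10, 10, 5.0, 5.0, 2.0, samples=1)    # values are only 0.0 or 1.0
smooth = render_disc(10, 10, 5.3, 5.1, 2.0, samples=8)  # fractional edge pixels
# smooth.sum() is close to the disc's area, pi * 2**2
```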

I'm guessing that you mean the current points rendering looks like it has aliasing if you use it to render polylines by exploiting the fact that you have multiple samples per pixel. If so, then I don't think your needs would be addressed most efficiently by adding antialiasing for points, because once you give the point a spatial extent and then have Datashader antialias the result, Datashader has had to do as much or more work as it would have had to do to render antialiased polylines, and you still wouldn't have natural rendering for the line segments once you zoom in (as they would now have spatial gaps).
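A toy example shows the zooming problem. It samples a straight segment with a fixed number of points and bins them at two resolutions. The hypothetical `columns_hit` helper reports which pixel columns receive at least one point. Once the spacing between samples exceeds one pixel, gaps appear. A line renderer would instead draw the segments between consecutive samples and leave no gaps.

```python
import numpy as np

def columns_hit(n_samples, width):
    """Bin n_samples evenly spaced points on [0, 1] into `width` columns.

    Returns a boolean array marking which columns received any point.
    """
    xs = np.linspace(0.0, 1.0, n_samples)
    # Map to column indices; clamp x == 1.0 into the last column.
    cols = np.minimum((xs * width).astype(int), width - 1)
    hit = np.zeros(width, dtype=bool)
    hit[cols] = True
    return hit

coarse = columns_hit(1000, 100)   # about 10 samples per pixel: every column hit
fine = columns_hit(1000, 10_000)  # about 0.1 samples per pixel: mostly gaps
```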

> When you add width to the process of rendering billions of points, parallel processing (multiprocessing, GPU, etc.) is required to get reasonable performance.

It certainly helps! But I'd avoid ever saying "required" unless you also define "reasonable". Datashader's performance even on a single core surprises a lot of people.

> @jbednar said: Line aliasing is a problem even for billions of points if they are all on the same curve. So I'd want to address open issues with line antialiasing.

> This is effectively the problem I have, with one gotcha: the billions of microvectors are further divided into millions of curves, i.e., after, say, a thousand connected microvectors, there is a "gap" which starts the next "curve". So, for best performance, representing them as points will avoid the cost of start-of-line and end-of-line antialias treatments for each curve. But really I don't know which approach would be faster: billions of antialiased points or millions of antialiased "curves." (I did look at a datashader test case for sending multiple "curves" in a single line request but did not totally grasp the required input structures and never found the associated documentation. And I think the antialiaser divides by zero anyway.)

See https://github.com/holoviz/datashader/pull/694 for examples of the various supported cases, which could easily include things not otherwise documented.

> I started playing with an override of Canvas.points. It needs the scaling from the canvas to calculate antialias interpolation. It reaches into the canvas object to get the scaling parameters. (I guess the line antialiasing must do the same: I have not looked.)

I'm not sure what you mean by the scaling parameters, but the current line antialiasing is not based on having a width for the lines, and so it won't transfer to the case of points with a non-zero width.