holoviz / datashader

Quickly and accurately render even the largest data.
http://datashader.org
BSD 3-Clause "New" or "Revised" License

Support compound reductions for antialiased lines on the CPU #1146

Closed ianthomas23 closed 1 year ago

ianthomas23 commented 1 year ago

This implements compound reductions such as mean and summary for antialiased lines, on the CPU only. The only reductions not yet covered are std and var, but a possible solution for these is being investigated. There is no support for antialiased lines on the GPU, and this will not happen soon as it requires significant changes.

Demonstration of antialiased lines and various reductions:

import datashader as ds
import datashader.transfer_functions as tf
import numpy as np
import pandas as pd

# Four line segments from x=0 to x=1, each with a start/end y and an associated value.
df = pd.DataFrame(dict(ystart=[0.2, 0, 1, 0.8], yend=[0.2, 1, 0, 0.8], value=[1, 2, 3, 4]))

cvs = ds.Canvas(plot_width=200, plot_height=150, x_range=(-0.1, 1.1), y_range=(-0.1, 1.1))
# axis=1 means each DataFrame row is a separate line; a nonzero line_width enables antialiasing.
kwargs = dict(source=df, x=np.asarray([0, 1]), y=["ystart", "yend"], axis=1, line_width=15)

for i, reduction in enumerate([
    ds.any(), ds.count(), ds.sum("value"), ds.min("value"), ds.max("value"),
    ds.first("value"), ds.last("value"), ds.mean("value")
]):
    agg = cvs.line(agg=reduction, **kwargs)
    im = tf.shade(agg, how="linear")
    ds.utils.export_image(im, f"temp{i}")

Output images for any, count and sum: temp0, temp1, temp2

Output images for min and max: temp3, temp4

Output images for first, last and mean: temp5, temp6, temp7
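The compound ds.summary reduction mentioned above isn't in the demo loop; as a rough sketch (untested here, output filename made up), it can be passed to the same cvs.line call and returns an xarray Dataset with one variable per named reduction:

# Several reductions computed in a single pass; the result is an xarray Dataset.
agg = cvs.line(agg=ds.summary(count=ds.count(), mean=ds.mean("value")), **kwargs)
im = tf.shade(agg["mean"], how="linear")
ds.utils.export_image(im, "temp_summary")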

Two observations:

Closes #1133.

codecov[bot] commented 1 year ago

Codecov Report

Merging #1146 (a284042) into master (e3d2d1f) will increase coverage by 0.17%. The diff coverage is 95.74%.

@@            Coverage Diff             @@
##           master    #1146      +/-   ##
==========================================
+ Coverage   85.20%   85.37%   +0.17%     
==========================================
  Files          34       34              
  Lines        7732     7782      +50     
==========================================
+ Hits         6588     6644      +56     
+ Misses       1144     1138       -6     
Impacted Files                Coverage Δ
datashader/compiler.py        95.08% <83.33%> (+0.04%) ↑
datashader/reductions.py      87.23% <91.66%> (+1.39%) ↑
datashader/core.py            87.76% <100.00%> (ø)
datashader/enums.py           100.00% <100.00%> (ø)
datashader/glyphs/line.py     92.80% <100.00%> (+0.09%) ↑
datashader/utils.py           76.62% <100.00%> (+1.11%) ↑


ianthomas23 commented 1 year ago

> Again, I'm not sure if there is a way around this? Seems like value should be e.g. half the actual value, for the smoothed bits of the edge?

There is no workaround for this; it would need a more complicated implementation, as we don't store the information required to do it. We would need to separate out the antialias weighting (what would conventionally be called the alpha, but trying not to use that term here) from the normal aggregate numbers. For the count of a single line segment, the count agg would be 1 for every pixel in the line right up to the edges, and the weighting would be 1 along the middle of the line, dropping to just about 0 at the extreme edges. The 2D agg returned to the user would be the product of the two, giving the same number as we currently return (for a single line).
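Purely to illustrate the idea (this is not existing datashader code, and the array names are made up), a 1D cross-section through a single antialiased line under a count reduction might look like:

import numpy as np

# Cross-section across one antialiased line, 7 pixels wide.
count = np.ones(7)                                            # plain count: 1 for every pixel in the line
weighting = np.array([0.05, 0.4, 0.9, 1.0, 0.9, 0.4, 0.05])   # antialias weighting: 1 at the centre, ~0 at the edges
returned = count * weighting                                  # product of the two, matching what is currently returned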

For a compound reduction of a count and sum we can imagine 3 aggs: count, sum and weighting. For the count we'd return count*weighting to the user, for the sum we'd return sum*weighting, and for the mean we'd return (sum/count)*weighting, giving you exactly what you want.
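Continuing the same made-up sketch (self-contained again here), a compound count/sum/mean for a line carrying the value 3.0 would come out as:

import numpy as np

# Same 7-pixel cross-section; every pixel on the line receives the value 3.0.
count = np.ones(7)
weighting = np.array([0.05, 0.4, 0.9, 1.0, 0.9, 0.4, 0.05])
total = 3.0 * count                          # sum agg, stored unweighted
count_returned = count * weighting           # count*weighting
sum_returned = total * weighting             # sum*weighting
mean_returned = (total / count) * weighting  # (sum/count)*weighting: 3.0 along the centre, fading only at the edges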

In general, we couldn't just have a single weighting per canvas.line call; we'd need a weighting per agg. So maybe we'd think of changing each current agg's shape from (height, width) to (height, width, 2), so that each agg is attached to its weighting.
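As a hypothetical sketch of that layout (again, not existing datashader code):

import numpy as np

# Pack each agg with its own weighting along a third axis: (height, width, 2).
height, width = 150, 200
agg = np.zeros((height, width, 2))
values, weights = agg[..., 0], agg[..., 1]   # views onto the packed array
returned_to_user = values * weights          # the 2D array handed back from cvs.line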

We would need to decide on appropriate mathematics to combine, say, (value1, weighting1) with (value2, weighting2) for a particular pixel. Let's assume we want a linear combination, so the combined value must be value1*weighting1 + value2*weighting2. Or is it? It might be more sensible to say we want a combined value and weighting such that value*weighting = value1*weighting1 + value2*weighting2, because the value needs to be stored unweighted (the whole purpose of this approach is to keep the weighting separate from the value). This maths is not dissimilar to rendering RGB and A separately, but there we have the concept of rendering on top of existing colors in a non-commutative way. So it is more complicated than that!
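To make the ambiguity concrete with made-up numbers: suppose one pixel receives contributions (value1, weighting1) = (3, 1.0) and (value2, weighting2) = (5, 0.2).

# We want a combined (value, weighting) with value*weighting == 3*1.0 + 5*0.2 == 4.0,
# but that single constraint doesn't pin down the pair: an extra rule is needed to
# choose the combined weighting, and different rules give different stored values.
v1, w1 = 3.0, 1.0
v2, w2 = 5.0, 0.2
target = v1 * w1 + v2 * w2                          # 4.0
candidates = [(4.0, 1.0), (8.0, 0.5), (20.0, 0.2)]  # all satisfy value*weighting == target
assert all(abs(v * w - target) < 1e-12 for v, w in candidates)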

In summary, obtaining a mean that is more in line with what is expected would require:

  1. Twice as much memory for the 3D aggs.
  2. Slightly more complicated code, but only at the very highest and lowest levels as we'd still be scanning the source dataframe just once, only with 3D aggs.
  3. Thought about appropriate maths to combine values and weights.

jbednar commented 1 year ago

You say there is no workaround, and then proceed to lay out a workaround. :-) Please open a separate issue proposing that work, emphasizing what limitations antialiasing will have until it is done, but then I don't think there's a particular reason to undertake it at this time. Do mention the limitation in the docs, though, and I guess link to the issue. Thanks!

ianthomas23 commented 1 year ago

On rereading my previous comment it sounds more positive than I intended it to be! It is not a workaround because it doesn't work; I don't believe there is any maths that can combine values and weights in a way that works for anything other than the simplest situations. But yes, it is worth writing it up as an issue for further consideration.