holoviz / datashader

Quickly and accurately render even the largest data.
http://datashader.org
BSD 3-Clause "New" or "Revised" License
3.32k stars 366 forks source link

ZeroDivisionError with line antialiasing #989

Closed russss closed 2 years ago

russss commented 3 years ago

Hello,

I've been playing with the new line antialiasing feature, and it's throwing a ZeroDivisionError on my dataset, with a less than helpful stacktrace (included below).

I managed to identify the division by zero as happening here (grad = dy/dx in _xiaolinwu). I assume this means that I have repeated points in my dataset (not sure how), but the error is not very helpful.

Adding an early return to that function when dx == 0 appears to fix the error, although it is taking substantially longer to run with antialiasing enabled so I can't tell if it's actually providing a sensible result.

Versions

Python 3.8 datashader 0.12.0 numba 0.51.2 numpy 1.20.1 (+ a whole load of other stuff in my environment which I am sure is irrelevant)

Stack trace

``` --------------------------------------------------------------------------- ZeroDivisionError Traceback (most recent call last) in 9 outline_img = tf.shade(outline_agg, cmap=['black']) 10 ---> 11 agg = canvas.line(lines, 'x', 'y', ds.max('voltage'), antialias=True) 12 lines_img = tf.dynspread(tf.shade(agg, cmap=colorcet.bgyw), threshold=0.6, max_px=width//150) 13 #agg = canvas.points(substations, 'x', 'y', ds.sum('voltage')) ~/.cache/pypoetry/virtualenvs/datashader-env-0ZOB1KZS-py3.8/lib/python3.8/site-packages/datashader/core.py in line(self, source, x, y, agg, axis, geometry, antialias) 403 format(agg)) 404 warnings.warn(message) --> 405 return bypixel(source, self, glyph, agg) 406 407 def area(self, source, x, y, agg=None, axis=0, y_stack=None): ~/.cache/pypoetry/virtualenvs/datashader-env-0ZOB1KZS-py3.8/lib/python3.8/site-packages/datashader/core.py in bypixel(source, canvas, glyph, agg) 1200 with np.warnings.catch_warnings(): 1201 np.warnings.filterwarnings('ignore', r'All-NaN (slice|axis) encountered') -> 1202 return bypixel.pipeline(source, schema, canvas, glyph, agg) 1203 1204 ~/.cache/pypoetry/virtualenvs/datashader-env-0ZOB1KZS-py3.8/lib/python3.8/site-packages/datashader/utils.py in __call__(self, head, *rest, **kwargs) 107 typ = type(head) 108 if typ in lk: --> 109 return lk[typ](head, *rest, **kwargs) 110 for cls in getmro(typ)[1:]: 111 if cls in lk: ~/.cache/pypoetry/virtualenvs/datashader-env-0ZOB1KZS-py3.8/lib/python3.8/site-packages/datashader/data_libraries/pandas.py in pandas_pipeline(df, schema, canvas, glyph, summary) 15 @bypixel.pipeline.register(pd.DataFrame) 16 def pandas_pipeline(df, schema, canvas, glyph, summary): ---> 17 return glyph_dispatch(glyph, df, schema, canvas, summary) 18 19 ~/.cache/pypoetry/virtualenvs/datashader-env-0ZOB1KZS-py3.8/lib/python3.8/site-packages/datashader/utils.py in __call__(self, head, *rest, **kwargs) 110 for cls in getmro(typ)[1:]: 111 if cls in lk: --> 112 return lk[cls](head, *rest, **kwargs) 113 raise TypeError("No dispatch for {0} type".format(typ)) 114 ~/.cache/pypoetry/virtualenvs/datashader-env-0ZOB1KZS-py3.8/lib/python3.8/site-packages/datashader/data_libraries/pandas.py in default(glyph, source, schema, canvas, summary, cuda) 43 44 bases = create((height, width)) ---> 45 extend(bases, source, x_st + y_st, x_range + y_range) 46 47 return finalize(bases, ~/.cache/pypoetry/virtualenvs/datashader-env-0ZOB1KZS-py3.8/lib/python3.8/site-packages/datashader/glyphs/line.py in extend(aggs, df, vt, bounds, plot_start) 69 70 # line may be clipped, then mapped to pixels ---> 71 do_extend( 72 sx, tx, sy, ty, xmin, xmax, ymin, ymax, 73 xs, ys, plot_start, *aggs_and_cols ZeroDivisionError: division by zero ```
nmgeek commented 3 years ago

Hi @ppwadhwa,

What is up with this old bug? I started to evaluate datashader for my project but this is a showstopper. Who draws lines with aliasing turned off? The usual line render function applies anti-aliasing, so a mature rendering package would be able to anti-alias, right?

I feel like, in the end, for better performance, I may have to massage my input data to be multiple open polygons instead of a table of line segments. Although there would be significant cost to reformatting the data as polygons. Does the polygons method support such a format? Will it still crash with this divide-by-zero exception?

Here is my test case. You could add it to your test suite:

from collections import OrderedDict as odict
import pandas as pd
import datashader as ds

df = pd.DataFrame(odict([('x0', (1., 2.)),
                         ('x1', (2., 3.)),
                         ('y0', (1., 2.)),
                         ('y1', (2., 3.)),
                         ('z', (10., 20.))]))
cvs = ds.Canvas(plot_width=100,
                plot_height=100,
                x_range=(-4., 4.),
                y_range=(-4., 4.))
imbuf = cvs.line(df,
                 x=('x0', 'x1'),
                 y=('y0', 'y1'),
                 axis=1,  # Draw one line per row in source using data from the specified columns
                 agg=ds.sum('z'),
                 antialias=True)  # requires numba>=0.51.2 and 'sum' or 'max' aggregator
philippjfr commented 3 years ago

@nmgeek This is an open source project with various volunteers having contributed significant amounts of their own time to contribute so the tone of your message is not particularly appreciated. That said we could have triaged this issue earlier, right now what would help the most is a minimal reproducible example of this bug. Anti-aliasing is a relatively recent addition and clearly not all the bugs were ironed out. Datashader has always been focused on delivering the fastest rendering performance and antialiasing is relatively expensive if you're rendering millions or billions of line segments so it was not a priority.

Does the polygons method support such a format?

The polygons method only renders filled polygons, if you want to plot connected line segments I'd look into constructing a spatialpandas dataframe containing MultiLine geometries and passing that to the line method but I'd have to know a bit more about the structure of your data.

Will it still crash with this divide-by-zero exception?

polygons does not implement antialiasing afaik.

jbednar commented 3 years ago

Hi @ppwadhwa,

Hi @nmgeek,

I appreciate that you are frustrated finding this bug not being addressed soon enough for your purposes. I would ask you to please try to approach open-source projects with a healthy respect for the fact that they are volunteer efforts. No one here is being paid to provide a service. Because of this, addressing developers you do not know by name with requests can come across as rude. The fact that @ppwadhwa helped triage this bug and assign it to our next milestone doesn't make her beholden to any particular user.

What is up with this old bug? I started to evaluate datashader for my project but this is a showstopper. Who draws lines with aliasing turned off? The usual line render function applies anti-aliasing, so a mature rendering package would be able to anti-alias, right?

Here it sounds like there may be some confusion about what Datashader does and how it works. I can emphathize; it's a complicated system that's hard to appreciate at a glance. Still, it's good to keep an open mind and some humility when approaching unfamiliar projects, with an awareness that there may be factors you don't yet understand and appreciate.

In this case, the "usual line rendering functions" do not apply to Datashader. Datashader was built to address these priorities, in order:

  1. Faithfully conveying the overall distribution of extremely large datasets by aggregating data points accurately per pixel.
  2. Providing users precise control over such aggregation, e.g. deciding whether to aggregate by count, mean value, min or max, etc., as needed for the decisions they are supporting.
  3. Efficiently using CPU, GPU, and memory resources to make large datasets practical to work with.
  4. Looking nice.

Existing line rendering methods in graphics libraries would satisfy priority 4, but they do not address priorities 1, 2, or 3 at all, so we can't use them in Datashader. Those routines do not aggregate overlapping lines in a way that will convey the density of lines in each part of the display; instead they overplot and obscure previous lines as illustrated in Datashader's Plotting Pitfalls documentation. Datashader instead accurately calculates how many mathematical (infinitesimally wide) lines cross each pixel, incrementing the count per pixel so that the overall distribution of crossing lines is displayed at each point. Antialiasing effectively relaxes these assumptions, letting a line affect not just the pixels it passes through but nearby pixels to some extent, which is not as mathematically well defined but indeed does make things look nicer, so we were recently funded to to try to support smooth lines for the case where there are only a few data points in a given region. But it's actually quite painful to add such a feature to Datashader's architecture given the other priorities that mean we can't simply reuse existing renderers, and we're not quite there yet. Unfortunately, those funds have been used up, so any further fixes are on a volunteer basis. In the meantime, if your use case prioritizes 4 rather than 1, 2, and 3, just use some other tool. And if you share our priorities 1, 2, and 3, feel free to help volunteer to address 4.

Here is my test case. You could add it to your test suite:

For an open source project, the way to phrase this information to avoid coming across as rude would be: "Here is a test case that I think catches this bug. If that's useful for your test suite, please consider including it. I would also be happy to make a PR to add it to the suite, but I could use some help and advice for how to do that."

nmgeek commented 3 years ago

Sorry for coming across as rude. I specifically addressed @ppwadhwa because this issue is unassigned so I assumed anything I wrote would go to the bit bucket. (Apparently someone is watching activity for unassigned issues: it is more than I expected at a busy open source project.) I would not have submitted a test case if I was not trying to be helpful: the person who submitted this did not provide one.

I am not clear on whether you think this issue is important to your general philosophy of what datashader should do, well. For my own purposes, line antialiasing is not important: I care about points. I think aliasing is an important problem for users who need datashader aggregation for very large data sets.

I started playing with an override of Canvas.points. It needs the scaling from the canvas to calculate antialias interpolation. It reaches into the canvas object to get the scaling parameters. (I guess the line antialiasing must do the same: I have not looked.)

If the antialiased results meet my requirements I would love some help with performance optimization: ideally it would map to some primitive in the GPU. (The implementation of points uses advanced Python and numba features which would be easy for me to butcher.) Possibly, it makes sense to support antialiasing interpolation plugins: the same way datashader supports reduction plugins.

Since points has no antialias feature there is no "bug" in points and this discussion should be under a different issue, right?

jbednar commented 3 years ago

Sorry for coming across as rude. I specifically addressed @ppwadhwa because this issue is unassigned so I assumed anything I wrote would go to the bit bucket. (Apparently someone is watching activity for unassigned issues: it is more than I expected at a busy open source project.)

Thanks for the explanation, but that's precisely what is rude about the request! :-/ I.e., it's rude to assume that you have the right to commandeer the attention of some particular developer or contributor just because you don't want your concerns to be on the same heap as everyone else's. You are welcome to make requests of a project (so anyone in the project can step up and satisfy the request if they wish), but not of a person, who has no responsibility to you in particular. The fact is, your concerns are on the same heap as everyone else's, unless you are either paying us or volunteering your own efforts. Assuming otherwise is what's presumptuous!

I am not clear on whether you think this issue is important to your general philosophy of what datashader should do, well. For my own purposes, line antialiasing is not important: I care about points. I think aliasing is an important problem for users who need datashader aggregation for very large data sets.

As the current leader of the Datashader project, I can verify that I do want to avoid distracting visual aliasing, and in fact I was the one to open the issue requesting this in the first place, the one to seek funding to get it implemented, and the one to manage the implementation as it was progressing. However, my personal priority is for lines, not points, because in my own applications point antialiasing only matters for a few scattered points, for which I can always switch to a different tool. Line aliasing, in contrast, is a problem even for billions of points if they are all on the same curve. So I'd want to address open issues with line antialiasing, including this bug and the fact that antialiasing doesn't yet support lines with a specified thickness, before moving on to points.

That said, I'd be very interested to hear what your wishes and requirements for point antialiasing would be. For one thing, addressing aliasing for rendering points with a specified size might also address aliasing in lines with a specified width; the problems may in fact be one and the same. But point antialiasing is definitely a different topic from this particular bug, so please do open a separate issue outlining what happens for your use cases right now and what you'd like to happen, and we can see from there!

jbednar commented 3 years ago

Here is my test case. You could add it to your test suite:

@nmgeek , I haven't been able to reproduce any problem with your test case:

image

Can you list which Datashader and Numba versions you used?

russss commented 2 years ago

For what it's worth, it's still happening with my dataset, after upgrading to numba 0.54.1, datashader 0.13.0.

(I've been achieving similar results using oversampling: render the image with Datashader at 2x or 4x the target size, then downsize it afterwards.)

jbednar commented 2 years ago

Thanks, @russss . We are currently looking into funding for a fuller implementation of line antialiasing, and if that's successful, we'll probably be rewriting most of the existing antialiasing code anyway. Still, to make sure we don't cause a similar problem, it would still be useful to have a reproducible test case.

jbednar commented 2 years ago

Closing this issue since the antialiasing code is all now rewritten to support a specified line width, so whatever bugs may have applied to the previous version are no longer relevant. The new version can be installed with conda install -c pyviz/label/dev datashader, and can be tested by setting linewidth=<somenumberofpixels> in a call to Canvas.line.

bbudescu commented 2 years ago

I still get ZeroDivisionErrors when trying to turn on antialising in 0.14.1

ianthomas23 commented 2 years ago

@bbudescu I fixed a divide by zero bug in the new (0.14.1) antialiased line code in PR #1099. Please could you check if that works for you or if you have found another bug?

bbudescu commented 2 years ago

@ianthomas23, now it works. Thanks!

ianthomas23 commented 2 years ago

@bbudescu Thanks for checking. We will release version 0.14.2 soon as this quite a serious bug.