leeoniya / uPlot

📈 A small, fast chart for time series, lines, areas, ohlc & bars
MIT License

Rounding in draw functions #241

Closed: EmiPhil closed this 3 years ago

EmiPhil commented 3 years ago

Hey,

I'm just wondering why many of the getPos functions in the drawing part of uPlot round their values. I'm working on adding real-time data to my chart, and the rounding seems to make the data jump a lot as it hits the rounding boundaries.

With rounding:

[GIF: uPlotWithRounding]

Without rounding:

[GIF: uPlotWithNoRounding]

EmiPhil commented 3 years ago

I think my question is less "why round?" and more "is removing the rounding the only way to get rid of the choppiness, or am I missing something?"

leeoniya commented 3 years ago

there are a few reasons for rounding.

  1. canvas pixel alignment produces crisp vertical lines [1] with high density datasets and reduces the need for the browser to anti-alias across sub-pixels, which increases performance in cases of many data-dense series. it's why the perf bench looks so sharp, including the grid.
  2. data decimation [2] in the draw loop basically requires that you round, which serves as the method of downsampling, otherwise no two points squished into the same pixel would ever return the same .valToPos().
  3. probably something else i'm not thinking of where floating point errors cause unnecessary issues, such as too much pointless invalidation when the math ends up potentially comparing 4 to 4.000000000000001.

the downside of this is two-fold. you get that caterpillar effect in live charts and you get more pixelated series lines as a result of decimation vs something that used floating point coords and does 10x more naive lineTo commands.
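To make both effects concrete, here is a minimal sketch. It assumes a plain linear value-to-pixel mapping; the `valToPos` below is a hypothetical stand-in, not uPlot's own implementation, and the window sizes are made up for illustration.

```javascript
// Hypothetical linear value-to-pixel mapping (not uPlot's actual valToPos).
function valToPos(val, min, max, widthPx) {
  return ((val - min) * widthPx) / (max - min);
}

// Dense data: two nearby x values share a rounded pixel, so a draw loop
// can merge them. Their raw float positions never compare equal.
const a = valToPos(10.1, 0, 1000, 100);
const b = valToPos(10.4, 0, 1000, 100);
const sameRounded = Math.round(a) === Math.round(b);
const sameRaw = a === b;

// Live data: slide a 40-unit window to the right in small steps and measure
// the on-screen gap between two neighbouring points at x = 10 and x = 11.
function gapsWhileScrolling(round) {
  const gaps = [];
  for (let i = 0; i < 10; i++) {
    const shift = i * 0.1;
    const p1 = round(valToPos(10, shift, shift + 40, 100));
    const p2 = round(valToPos(11, shift, shift + 40, 100));
    gaps.push(p2 - p1);
  }
  return gaps;
}

const roundedGaps = gapsWhileScrolling(Math.round); // gap flips between 2px and 3px
const rawGaps = gapsWhileScrolling((v) => v);       // gap stays at about 2.5px
```

With rounding, the spacing between the same two points stretches and shrinks as the window moves, which is the caterpillar effect; without rounding the spacing is constant but no two dense points ever share a coordinate.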

it's a trade-off that i don't think is avoidable. if you don't round you'll lose static chart crispness and give up a bunch of perf, but from your gifs you probably don't have nearly enough data to feel it. i'd be open to adding an opt that swaps all internal rounding funcs for identity funcs at chart init. i don't think it should cause too many issues, though alignment of cursor & hover points with the canvas will likely be off. i don't really know how deep the rabbit hole goes or if it's worth it.

[1] https://usefulangle.com/post/17/html5-canvas-drawing-1px-crisp-straight-lines
[2] https://knowledge.ni.com/KnowledgeArticleDetails?id=kA00Z0000019YLKSA2&l=en-US

leeoniya commented 3 years ago

maybe always rounding via Math.floor or Math.ceil would reduce the effect vs Math.round, but that doesn't seem likely :\

EmiPhil commented 3 years ago

Yeah, .ceil and .floor were my first (albeit lazy/quick) attempt and I didn't notice a difference.

Obviously this is just my use case, but when the user pans around our charts we pre-downsample the data server side so that there are never more data points loaded at once than chart pixels. For that use case the identity function option would be perfect.

As for how deep, yeah I had to remove the rounding from nearly everything. I think that I might give the identity function a shot after we close up #216

leeoniya commented 3 years ago

> Obviously this is just my use case, but when the user pans around our charts we pre-downsample the data server side so that there are never more data points loaded at once than chart pixels.

this is really the ideal case but i've never seen it done since this means the downsampling groups count has to be dictated by the client's chart width & pixel density which will vary by device & screen or window dims.

you actually need up to 4x the pixels' worth of data: for each pixel's timestamp, an entry, min, max and exit value. that's what the decimation process effectively accumulates and draws.

i can see this being really tough to get exactly right between server-side aggregation and client-side lineTo commands.
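The per-pixel accumulation described above can be sketched roughly as follows. This is an illustration only, not uPlot's draw loop: `decimate` and its `valToPx` argument are hypothetical names, and the sketch assumes the x values are sorted in ascending order.

```javascript
// Hypothetical helper: collapse points that share a rounded x pixel into one
// record holding the entry (first), min, max and exit (last) y values.
// Assumes xs is sorted ascending so each pixel column is visited once.
function decimate(xs, ys, valToPx) {
  const cols = [];
  let cur = null;
  for (let i = 0; i < xs.length; i++) {
    const px = Math.round(valToPx(xs[i]));
    if (cur === null || cur.px !== px) {
      // first point in a new pixel column
      cur = { px, entry: ys[i], min: ys[i], max: ys[i], exit: ys[i] };
      cols.push(cur);
    } else {
      cur.min = Math.min(cur.min, ys[i]);
      cur.max = Math.max(cur.max, ys[i]);
      cur.exit = ys[i];
    }
  }
  return cols;
}

// 8 points squeezed into 3 pixel columns (4 data units per pixel).
const xs = [0, 1, 2, 3, 4, 5, 6, 7];
const ys = [5, 9, 1, 4, 7, 2, 8, 3];
const cols = decimate(xs, ys, (x) => x / 4);
```

Each record carries enough to draw the column faithfully: move to `entry`, sweep through `min` and `max`, and leave at `exit`.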

EmiPhil commented 3 years ago

The way we do it is a bit simpler. Since the intention when you are zoomed out enough to need aggregation is to show the high/low of the time bucket we just follow an algorithm that goes:

  1. available plotting points / 2 = bucket count ; time length / bucket count = time interval
  2. for each bucket find the min and max
  3. order the bucket min and max by their real timestamps, giving the first a timestamp of the left edge of the bucket and the second a timestamp of the middle of the bucket

Then on display if the data is from a bucket we round off the time accuracy displayed on the legend to reflect that these are just approximations of the data. If you want to see more, zoom in more (eg don't show the seconds on the legend for a bucket that is larger than seconds accurate).

It makes it pretty easy to keep the server/client in sync because on pan we just send the size of the chart + the left/right timestamps. I guess the difference is that rather than try to squeeze the max/min into one pixel we use two pixels (and no attempt is made at decimation because it isn't required with this downsampling method)
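The bucketing steps in this comment can be sketched as below. This is a reading of the description, not the actual server code: the function name, argument names, and the handling of empty buckets are all assumptions.

```javascript
// Hypothetical sketch of the min/max bucketing described above.
// ts/vals: raw samples; [tStart, tEnd): visible range; plotPoints: available pixels.
function downsampleMinMax(ts, vals, tStart, tEnd, plotPoints) {
  const bucketCount = Math.floor(plotPoints / 2); // two output points per bucket
  const interval = (tEnd - tStart) / bucketCount;
  const buckets = new Array(bucketCount).fill(null);

  for (let i = 0; i < ts.length; i++) {
    if (ts[i] < tStart || ts[i] >= tEnd) continue;
    // clamp guards against float error pushing the index past the last bucket
    const b = Math.min(bucketCount - 1, Math.floor((ts[i] - tStart) / interval));
    const p = { t: ts[i], v: vals[i] };
    if (buckets[b] === null) {
      buckets[b] = { min: p, max: p };
    } else {
      if (p.v < buckets[b].min.v) buckets[b].min = p;
      if (p.v > buckets[b].max.v) buckets[b].max = p;
    }
  }

  const out = [];
  buckets.forEach((bk, b) => {
    if (bk === null) return; // assumption: empty buckets emit nothing
    const left = tStart + b * interval;
    // keep min and max in their real chronological order
    const [first, second] = bk.min.t <= bk.max.t ? [bk.min, bk.max] : [bk.max, bk.min];
    out.push([left, first.v], [left + interval / 2, second.v]);
  });
  return out;
}

const sampled = downsampleMinMax(
  [0, 1, 2, 3, 4, 5, 6, 7],
  [5, 9, 1, 4, 7, 2, 8, 3],
  0, 8, 4
);
```

In this example the first bucket's max (9 at t=1) occurs before its min (1 at t=2), so the max is placed at the bucket's left edge and the min at its midpoint.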

EmiPhil commented 3 years ago

Here's how that looks in practice:

[image]

I haven't gotten around to downsampling the real time values on the brush yet so you can really tell the difference between the buckets (which do the wavy pattern because of the min/max strat) and the real time where the decimation is happening.

But there is enough detail there that if a client wanted to know more about what was going on during the day they would know to maybe zoom in on that big peak at 12:30

EmiPhil commented 3 years ago

(That data is at a 1 second sample rate)

leeoniya commented 3 years ago

hehe, poor man's downsampling - simple but effective :D

leeoniya commented 3 years ago

you know, i haven't thought of this until just now, but actually the other style of OHLC display (non-candlestick) is exactly what the decimation thing does. it may even be a great way to bucket and display this type of data rather than using a trendline.

[image]

most of the code for this is already here: https://github.com/leeoniya/uPlot/blob/master/demos/candlestick-ohlc.html, and would need just minor tweaks to swap the candlesticks for the in/min/max/out markers...

EmiPhil commented 3 years ago

Huh interesting.

This is getting off topic of my original goal of squashing caterpillars but I was thinking about it during dinner and I think returning the 4 points per pixel like you suggested wouldn't be such a big deal for our use case.

We already have the logic for min max, we separately have the logic for first and last values of a bucket, bam that's all we need. Even gets rid of the pita that sorting the min/max is. Added benefit of doubling the granularity of the data. I'll have to bring this up as a possibility for us so thanks for the idea!

I'm definitely interested in all of these stacked data set things, but my knowledge of the charting world is still pretty basic. After 216 I'll jump back to some of the other issues to get a better idea of what you've been thinking about for this stuff. Especially somewhere you mentioned the problem of many data sets overlapping with the same value - that's gonna be something I have to deal with for my production chart.

GlennMatthys commented 3 years ago

Same boat here with live data. Is there already a way to specify the rounding behaviour?

leeoniya commented 3 years ago

> Is there already a way to specify the rounding behaviour?

no, but if you'd like to try adding an opts.pxSnap: 1 setting, i can help guide you to a mergeable PR. it should be pretty easy since all the rounding is done at the last stage during canvas drawing in the pathBuilders or the axis & grid builder function.

GlennMatthys commented 3 years ago

> > Is there already a way to specify the rounding behaviour?
>
> no, but if you'd like to try adding an opts.pxSnap: 1 setting, i can help guide you to a mergeable PR. it should be pretty easy since all the rounding is done at the last stage during canvas drawing in the pathBuilders or the axis & grid builder function.

Why a value of 1 and not boolean true/false?

@EmiPhil can you point out all the places where you removed the rounding?

leeoniya commented 3 years ago

> Why a value of 1 and not boolean true/false?

i think you'd still get a lot of the downsampling benefits by rounding to 0.5px or 0.2px while reducing the jitter.
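A numeric snap increment could work along these lines. This is only a sketch of the idea: `makePxRound` is a hypothetical name, and nothing here is taken from uPlot's source.

```javascript
// Hypothetical factory: returns a rounding function for a given snap increment.
// 0 disables snapping, 1 is whole-pixel rounding, fractions snap to sub-pixel steps.
function makePxRound(snap) {
  if (snap === 0) return (v) => v; // identity: no rounding at all
  if (snap === 1) return Math.round;
  return (v) => Math.round(v / snap) * snap;
}

const none = makePxRound(0)(12.34);   // left untouched
const whole = makePxRound(1)(12.34);  // snapped to a whole pixel
const half = makePxRound(0.5)(12.34); // snapped to the nearest half pixel

// Nearby points still collapse onto the same half-pixel coordinate,
// so some decimation survives while the jump size shrinks.
const stillMerges = makePxRound(0.5)(12.3) === makePxRound(0.5)(12.34);
```

A number rather than a boolean leaves room for these in-between settings, where each jump is at most one snap increment instead of a full pixel.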

GlennMatthys commented 3 years ago

> > Why a value of 1 and not boolean true/false?
>
> i think you'd still get a lot of the downsampling benefits by rounding to 0.5px or 0.2px while reducing the jitter.

Is that how it was done in the screenshots from EmiPhil? I was thinking of removing the rounding altogether.

leeoniya commented 3 years ago

opts.pxSnap: 0 would disable it. you asked why not a boolean.

leeoniya commented 3 years ago

actually, there's already an opts.pxAlign: true/false which has a related purpose. it would be better to reuse that. true = 1, false = 0 maps well, too.

leeoniya commented 3 years ago

this should now be configurable via opts.pxAlign and series.pxAlign. if you have <= 1 datapoint per pixel, you can set these to 0 without any perf impact. if you have > 1 datapoint per pixel, i would suggest setting this to the largest non-0 decimal, that still adequately reduces the caterpillar effect (such as 0.5, 0.2).

let me know how it works out. i haven't tested it extensively, but it appears to function as intended.
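For reference, a configuration along these lines might look like the sketch below. Only the `pxAlign` keys come from this thread; the sizes, labels, and colours are made-up values, and how the chart-level and series-level settings interact is not spelled out here.

```javascript
// Sketch of a uPlot options object. Only pxAlign is taken from this thread;
// every other value is an illustrative placeholder.
const opts = {
  width: 800,
  height: 300,
  pxAlign: 0, // chart-level: disable pixel snapping
  series: [
    {}, // x series
    { label: "live", stroke: "red", pxAlign: 0 },     // <= 1 point per pixel
    { label: "dense", stroke: "blue", pxAlign: 0.5 }, // > 1 point per pixel
  ],
};

// In a browser with uPlot loaded, the chart would then be created with:
// const chart = new uPlot(opts, data, document.body);
```

The fractional value on the dense series follows the suggestion above: pick the largest non-zero increment that still reduces the caterpillar effect enough.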