leeoniya / uPlot

📈 A small, fast chart for time series, lines, areas, ohlc & bars
MIT License

Update README.md exponential -> linear #925

Closed unphased closed 1 month ago

unphased commented 2 months ago

let's say you have 10 datasets with 100k entries each.

this is 1.1M input data points spread out across 11 arrays holding 100k entries each (1 for x, 10 for y's)

if the situation changes from sharing x values to all-disjoint x values, there will be 1M unique x values, and each y array will be padded with 9x as many nulls as it has actual values. At all times the x and y arrays must have the same length.

This results in 11M array values to represent the data. That is linearly proportional to the number of independent x-values here, not a quadratic or exponential increase.
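To make the null padding concrete, here is a minimal sketch of joining disjoint per-series data into one aligned table. `joinAligned` is a hypothetical helper written for illustration, not uPlot's own API, and it assumes each series has numeric x values with no duplicates.

```javascript
// Hypothetical helper (not part of uPlot): merge per-series [xs, ys] pairs
// into one aligned table [xs, ys1, ys2, ...], padding gaps with null.
// Assumes numeric x values with no duplicates within a series.
function joinAligned(tables) {
  // Collect every distinct x across all series, sorted ascending.
  const xs = [...new Set(tables.flatMap(([sx]) => sx))].sort((a, b) => a - b);
  const indexOfX = new Map(xs.map((x, i) => [x, i]));

  // One y array per series, as long as the merged x array.
  const ys = tables.map(([sx, sy]) => {
    const out = new Array(xs.length).fill(null);
    sx.forEach((x, i) => {
      out[indexOfX.get(x)] = sy[i];
    });
    return out;
  });

  return [xs, ...ys];
}

const joined = joinAligned([
  [[0, 1, 2], [5, 0, 5]], // series 1: xs, ys
  [[7, 8, 9], [3, 3, 3]], // series 2: xs, ys
]);
// joined[0] -> [0, 1, 2, 7, 8, 9]
// joined[1] -> [5, 0, 5, null, null, null]
// joined[2] -> [null, null, null, 3, 3, 3]
```

With two fully disjoint series of 3 points each, the aligned table holds 18 values, 6 of them nulls, versus 12 values in the unjoined form.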

If we cared about making the representation more memory-efficient, it should not be hard to add a separate convention for putting multiple datasets into one graph, so that you could group together the x and y values that are shared.

In this example there would then be 10 datasets, each with its own x and y values, making 20 arrays of 100k elements = 2M scalars. In this case none of the data is redundant, so the example above only wastes 11M - 2M = 9M values (all of them inside the y arrays), which isn't even all that bad.
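The arithmetic above can be checked with a short sketch. The function names are made up for illustration. One thing the general formula shows: with fully disjoint x values the aligned size is (1 + n) * n * m for n series of m points each, so it is linear in points per series but grows with the square of the series count.

```javascript
// Values stored in the aligned (single shared x array) layout.
// sharedX = true: every series uses the same x values.
// sharedX = false: no x value is shared between any two series.
function alignedCount(seriesCount, pointsPerSeries, sharedX) {
  const uniqueX = sharedX ? pointsPerSeries : seriesCount * pointsPerSeries;
  return (1 + seriesCount) * uniqueX; // 1 x array + one y array per series
}

// Values stored when every series keeps its own x and y arrays.
function perSeriesCount(seriesCount, pointsPerSeries) {
  return 2 * seriesCount * pointsPerSeries;
}

const sharedTotal = alignedCount(10, 100_000, true);    // 1,100,000
const disjointTotal = alignedCount(10, 100_000, false); // 11,000,000
const groupedTotal = perSeriesCount(10, 100_000);       // 2,000,000
const wasted = disjointTotal - groupedTotal;            // 9,000,000 nulls
```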

leeoniya commented 2 months ago

only wastes 11M - 2M = 9M values (all of them inside the Y arrays), which isn't even all that bad.

lol "only". the resulting dataset is mostly waste. but yes, quadratic is probably more accurate than exponential, though "polynomial" might be more accurate than quadratic.

since this was written, uPlot has gained a second mode, called mode: 2, which does not require joining, and all existing paths support it. the data structure looks like this:

let data = [
  null,
  [
    [0,1,2], // x1
    [5,0,5], // y1
  ],
  [
    [7,8,9], // x2
    [3,3,3], // y2
  ],
];

and the setup for the series is with series.facets:

https://github.com/leeoniya/uPlot/blob/5756e3e9b91270b303157e14bd0174311047d983/demos/scatter.html#L560-L595
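For readers who don't follow the link, here is a rough sketch of how the options might pair with the data above. The facet fields (`scale`, `auto`) and the `null` placeholder at `series[0]` are assumptions based on this thread and the linked scatter demo, so check them against the demo before relying on them. The demo also appears to supply its own point-drawing `paths` function, which is omitted here.

```javascript
// Mode 2 data: data[0] is null, each later entry holds one series' own arrays.
const data = [
  null,
  [[0, 1, 2], [5, 0, 5]], // x1, y1
  [[7, 8, 9], [3, 3, 3]], // x2, y2
];

// Assumed facet shape: one facet per array in the series, each naming the
// scale that array is plotted against.
const xyFacets = [
  { scale: "x", auto: true },
  { scale: "y", auto: true },
];

const opts = {
  mode: 2, // must be set explicitly
  width: 800,
  height: 400,
  scales: { x: { time: false }, y: {} },
  series: [
    null, // placeholder so the "y" series start at index 1, matching data[0]
    { label: "series 1", stroke: "red", facets: xyFacets },
    { label: "series 2", stroke: "blue", facets: xyFacets },
  ],
};

// In a browser with uPlot loaded, this would be passed along the lines of:
//   new uPlot(opts, data, document.body);
```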

unphased commented 2 months ago

Thank you for explaining mode 2. I was looking around and found some hints about what the modes are but haven't really made heads or tails of it yet. This will be helpful to know for when i get to leveraging this lib more deeply.

Yeah I mean all I was saying really was "exponential" just makes readers (like me) assume the worst, e.g. if you had 500 x values or something that would mean the page won't load until beyond the heat death of the universe or whatever. I'm just saying it ain't that bad haha.

leeoniya commented 2 months ago

mode 2 will be the default and only mode in uPlot v2, but it won't have the weirdness of series[0] and data[0] being null. that exists today to keep the internals consistent with mode 1, where all the "y" series are at indices 1+.

unphased commented 2 months ago

great.

To use it now... do i have to specify mode: 2 in the opts or is it implicitly based on the shape of the data?

leeoniya commented 2 months ago

you have to specify opts.mode: 2.

mode 2 was originally meant for scatter type datasets with no requirement for x ordering and where each series can have arbitrary x/y points as well as arrays for extra dimensions, like size, color, etc.

because of this, there are a few things that mode 2 changes in terms of behavior.

since we no longer assume an ordered, aligned, single-x dataset, it's left up to the user to implement hover behavior. e.g. the scatter demo uses a quadtree, though i plan to change this to a gutted version of Flatbush.js (a packed Hilbert R-tree). in v2 this spatial index will be integrated into uPlot so it won't be necessary to completely diy this.
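As a rough idea of what "diy hover" involves, here is a brute-force nearest-point lookup over mode 2 data. This is a deliberately naive linear scan, not the quadtree the demo uses and not Flatbush. `nearestPoint` is a hypothetical name, and it measures distance in data units, whereas a real hover handler would more likely compare pixel positions.

```javascript
// Hypothetical helper: find the point closest to (px, py) across all series
// of a mode 2 dataset. O(total points) per lookup, so only suitable for
// small datasets; a spatial index avoids scanning every point.
function nearestPoint(data, px, py) {
  let best = null;
  // data[0] is null in mode 2, so series start at index 1.
  for (let s = 1; s < data.length; s++) {
    const [xs, ys] = data[s];
    for (let i = 0; i < xs.length; i++) {
      const dist2 = (xs[i] - px) ** 2 + (ys[i] - py) ** 2;
      if (best === null || dist2 < best.dist2) {
        best = { seriesIdx: s, dataIdx: i, dist2 };
      }
    }
  }
  return best; // null when there are no points
}

const scatterData = [
  null,
  [[0, 1, 2], [5, 0, 5]],
  [[7, 8, 9], [3, 3, 3]],
];

const hit = nearestPoint(scatterData, 8, 2);
// hit -> { seriesIdx: 2, dataIdx: 1, dist2: 1 }, i.e. the point (8, 3)
```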

the default zoom behavior in mode 2 is both x and y (rectangular selection). there isn't really y-auto-scaling in this mode since there's nothing special about x in scatter plots. this will have to change in v2 to be explicitly specified by the user, and i'll add support for the same x-zoom behavior as the current mode 1.

unphased commented 2 months ago

Sweet. Lack of cursor collision in that paradigm makes sense, this really highlights your dedication to efficiency with this lib and I appreciate that a lot.

I'll also ask, do you imagine you could make a timeline plot with rectangles (like the first one on this demo page) more of a built in capability with v2? Having to draw the rectangles ourselves is a bit painful. I think given how efficient this lib is, it is a great starting point to efficiently display large program traces for visualizing and making ad hoc tooling.

leeoniya commented 2 months ago

I'll also ask, do you imagine you could make a timeline plot with rectangles (like the first one on this demo page) more of a built in capability with v2?

i've been thinking about how to make this better. the integrated spatial index will help. but also something like a floating bars pathbuilder, as well as a scale type that uses the visible series count in either the x or y direction and does the justified layout (which will also help the grouped bars case). i want the core to provide the building blocks rather than full solutions for all viz types, which can just be done in a wrapper lib with a reduced / higher-level option set.

unphased commented 1 month ago

I'll just note that I've been very happy with my initial integration of uPlot into my unit test framework (I haven't done much work on the readme... sooon...). It's really more of a general test and benchmarking framework now since I am leaning into features for process management and generation of plots.

My experience so far after having come up with 5 or so use cases for plotting:

I'm eager to know if uplot v2 is approaching readiness. Is development happening in a branch? Do you have any demo pages similar to the giant demo listing you've got for v1?

I'm trying to decide whether to start building integration targeting v1 mode 2 or just jump to v2. Probably the latter is practical because tbh the x-centric data model isn't even really limiting me much at all today.