leeoniya / uPlot

📈 A small, fast chart for time series, lines, areas, ohlc & bars
MIT License
8.51k stars 371 forks source link

valToIdx: PEBKAC or bug? #311

Closed noobymcnoob closed 3 years ago

noobymcnoob commented 3 years ago

This may betray a total misunderstanding of the API but, shouldn't the following always return 0 even if distr = 2?

u.valToIdx(u.data[0][0])

For me, it always seems to return the right-most index (as u.data[0][0] is a datetime and is usually going to be larger than the highest index). I believe there is an interaction with distr = 2 that is causing this, but I'm not sure how to inverse it.

https://github.com/leeoniya/uPlot/blob/master/dist/uPlot.d.ts#L97

Any suggestions on how can I otherwise find the nearest x index through the API/

Thanks.

noobymcnoob commented 3 years ago

I believe this function would need to do something different in the case where xDistr == 2

https://github.com/leeoniya/uPlot/blob/master/src/uPlot.js#L150

leeoniya commented 3 years ago

yes, there may very well be a bug/oversight here since the internal array for distr: 2 is positional and not the same one passed to uPlot.

there are a bunch of places in the code that check for distr:2 and swap which array must be used (as you've discovered). i'm not terribly happy with the whole thing since it leads to other oddities like https://github.com/leeoniya/uPlot/issues/212.

the fix should be simple though by branching in additional places, as you point out.

leeoniya commented 3 years ago

i'm on holiday currently and not near a computer, so it'll be a few days before i can verify and fix or review any PRs.

the correct fix for this might be to adjust

https://github.com/leeoniya/uPlot/blob/master/src/uPlot.js#L1611

with xScaleDistr == 2 ? data0 : data[0]

noobymcnoob commented 3 years ago

Enjoy your vacation, thanks for verifying! You could eventually want to refactor the whole distr thing once it gets too unweildy so for now I think it's probably fine to make it a little more unweildy with the extra branching. ;-)

leeoniya commented 3 years ago

i think this is fixed by the referenced commit (along with a few related api methods!). let me know how this works.

thanks!

code i used to test:

<!doctype html>
<html>
    <head>
        <meta charset="utf-8">
        <title>u.valToIdx()</title>
        <meta name="viewport" content="width=device-width, initial-scale=1">

        <link rel="stylesheet" href="../dist/uPlot.min.css">
    </head>
    <body>
        <script src="../dist/uPlot.iife.js"></script>
        <script>
            let xs = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30];
            let vals = [-10,-9,-8,-7,-6,-5,-4,-3,-2,-1,0,1,2,3,4,5,6,7,8,9,10];

            let data = [
                xs.map((t, i) => t * 3),
                xs.map((t, i) => vals[Math.floor(Math.random() * vals.length)]),
                xs.map((t, i) => vals[Math.floor(Math.random() * vals.length)]),
                xs.map((t, i) => vals[Math.floor(Math.random() * vals.length)]),
            ];

            const opts = {
                width: 1920,
                height: 600,
                title: "u.valToIdx()",
                scales: {
                    x: {
                        time: false,
                        distr: 2,
                    },
                },
                series: [
                    {},
                    {
                        stroke: "red",
                        fill: "rgba(255,0,0,0.1)",
                    },
                    {
                        stroke: "green",
                        fill: "rgba(0,255,0,0.1)",
                    },
                    {
                        stroke: "blue",
                        fill: "rgba(0,0,255,0.1)",
                    },
                ],
            };

            let u = new uPlot(opts, data, document.body);

            data[0].forEach((v, i) => {
                console.log({
                    val: v,
                    realIdx: i,
                    valtoIdx: u.valToIdx(v)
                });
            });
        </script>
    </body>
</html>
leeoniya commented 3 years ago

wait, this caused some unintended side-effects

leeoniya commented 3 years ago

looks like i'll need to bite the bullet and fix this holistically together with #212. otherwise it'll become an even more confusing mess.

noobymcnoob commented 3 years ago

Sounds like a normal vacation ;-) Do you still want me to test the latest?

Also, why is distr:2 (from comments in #212) considered a niche case? My incomplete understanding is that I want to use it to avoid gaps in the x index for when there is no data in between dates.

leeoniya commented 3 years ago

after some more pondering, both this issue and #212 stem from the way that uPlot masks the internals of how distr: 2 works - a design that has turned out to be a rather leaky and confusing abstraction over something which would be much more clear (though much more verbose) if done in userland.

the thing to understand for distr: 2 is that u.data[0] is not the data which you've given to uPlot, but simply an array of indices from that data. so if you gave uPlot [36,250,251,900] for x, distr: 2 will turn that into [0,1,2,3] and internally store your original data[0] array simply for display and formatting purposes. meanwhile, all the internal math, scales, axes, path drawing, etc. will use the index array to provide equal/ordinal spacing along x and discontinuous scales (such as "skipping" weekends, etc.). effectively your u.valToIdx(), u.setScale() and all other API functions in distr: 2 mode expect to be operating on that index array rather than your original array (which is now simply labels with no special meaning for the scales or api functions).

if you were to do this in userland, it would be more verbose, but more obvious. distr: 2 is essentially just sugar over:

const labels = [36,250,251,900];

const data = [
  labels.map((v, i) => i),  // x values (ordinal positions, spaced linearly)
  [87,29,75,13],  // y values
];

const opts = {
  axes: [
    {
      incrs: [1,2,5,10,20,50,100,200,500,1000...],
      values: (u, splits) => splits.map(i => labels[i]),
    },
  ],
  series: [
    {
      value: (u, i) => labels[i]
    },
  ],
};

(plus a bit more)...but that's really all that distr:2 does behind your back. so, .valToIdx(251) for an exact value from the original x array would be equivalent to labels.indexOf(251). or if you want to go faster and also handle searches for inexact values, then uPlot's closestIdx():

https://github.com/leeoniya/uPlot/blob/e796aa62cd4129da820aedf8777818d5115fd116/src/utils.js#L15-L35

long story short, i think it's working as designed if you deal/think in x-indices-as-values whenever in distr: 2 mode. that design is unfortunately not friendly/intuitive and therefore seems inconsistent, but it's not immediately clear to me how it can be made consistent and still deal properly with interpolation along discontinuous, evenly-spaced scales (especially time: true) - it's a problem i have not seen solved in a charting lib before, so if you see one, please point me to it.

noobymcnoob commented 3 years ago

I had assumed that distr:2 did something for performance but given your explanation, it probably doesn't. The less sugar uPlot has, the better (IMO).

So I completely agree with you. I might even suggest that distr:2 be removed or become unsupported sometime soon. I'll be working on removing it for my own code. A library of snippets in Github might even be the way to go...

Thanks!

leeoniya commented 3 years ago

I might even suggest that distr:2 be removed or become unsupported sometime soon.

as tempting as that sounds, i think it's super ugly to have distr:2's labels outside of opts or data and not have it tightly integrated with .setData(), etc.

i really would like to figure out a concise way to abstract the implementation details of distr:2 and make it behave according to expectations; there would be too much convenience lost by simply dropping support, imo.

noobymcnoob commented 3 years ago

It seems then that you've got the starts for a more general abstraction above. One that can apply to multiple values of distr. Data -> labels and indexes.

The convenience is fine and appreciated but it seems to me that it is not intrinsic to the library and thinking about it as separate might help. So perhaps when opts encounter a specific distr value, then a specific transformation is executed and from there on distr is meaningless internally, similar to what you've done above.

On Sat., Sep. 26, 2020, 8:33 a.m. Leon Sorokin, notifications@github.com wrote:

I might even suggest that distr:2 be removed or become unsupported sometime soon.

as tempting as that sounds, i think it's super ugly to have distr:2's labels outside of opts and not have it tightly integrated with .setData(), etc.

i really would like to figure out a concise way to abstract the implementation details of distr:2 and make it behave according to expectations; there would be too much convenience lost by simply dropping support, imo.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/leeoniya/uPlot/issues/311#issuecomment-699489767, or unsubscribe https://github.com/notifications/unsubscribe-auth/AJDE4VGCKIPSYEWIYJDJMVDSHXNQJANCNFSM4RX2JVCA .

leeoniya commented 3 years ago

i've made a couple incomplete attempts at concealing the internals but basically ended up in a similar place to https://github.com/leeoniya/uPlot/issues/77: it's a lot of extra internal bookkeeping for not a lot of added value. i could not come up with a generic userland wrapper that can gracefully integrate .setData() calls with label updates and does not clobber other customized settings. of course there's nothing stopping you from avoiding distr: 2 and DIYing it via the default distr: 1 (as shown above), though i'm not certain what major benefits that route offers over simply setting distr: 2 and adjusting your API calls accordingly.

for the time being, since the explanation is pretty straightforward [1] and the boilerplate reduction is significant, i think the current distr: 2 implementation is a net-positive compromise. i actually mis-remembered this in a prior comment: in fact i do retain the original data in u.data[0], which makes the labels accessible to various hooks and functions (the actual generated ordinal array is strictly internal).

i'm marking this as a duplicate of #212, since it's essentially the same root issue.

[1] "API methods involving ordinal distr: 2 scales deal in indices-as-values; the raw values provided to .setData() become dumb labels with no semantic meaning and are simply passed through to the axes' and series' value formatters".