Closed HoraceShmorace closed 8 years ago
Order of object keys are not defined, so there is no consistent way to loop through them, unless you do a sort
on them.
The
Object.keys()
method returns an array of a given object's own enumerable properties, in the same order as that provided by afor...in
loop (the difference being that a for-in loop enumerates properties in the prototype chain as well).The
for...in
statement iterates over the enumerable properties of an object, in arbitrary order. For each distinct property, statements can be executed.
Also you can directly address any element of an array.
data[798].foo = 'baz'
grid.datum(data)
There doesn't need to be a 'consistent way' to loop through object keys. Just do it in the order they appear, top to bottom.
And addressing an array via index is not realistic for a real world implementation. Currently, we get a collection of items (rows), and then we get updates for each individual row every 10 seconds. So if we have a 1000 items, every 10 seconds we're looping through them 1000 times.
Or get the keys via Object.keys
, and sort them.
Wait, I'm too dumb to get this – how is getting array items via index any different from getting object properties via key? They are the same thing, no?
I'll point out that other grids do this. This would make any real-time application with a large dataset significantly more performant.
@mstade - Because if you're getting real-time data updates for individual rows, you don't know the index of the row.
@HoraceShmorace I don't see why setting the datum as object would be faster from the grid's perspective. It would still have to build an array to loop through it.
Oh wait I think I understand you better now @HoraceShmorace. Let me see if I can break it down.
You have a reference to the row itself – by reference I presume not an actual reference to the row object, but an id of some kind – but don't know its index in the dataset? So what you have to do then is to loop through the array, find the needle by checking id values, update the row when you find it, then render the grid again?
Whereas if you had the dataset being a map instead, you could just index by id directly, thereby not needing the loop to find the row in question?
@Poetro - The performance gain in implementation for real-time data applications would significantly outweigh the performance lost to the grid doing that conversion.
Again, imagine thousands of rows, all individually updating every 5-10 seconds. It's not even realistic to do with an array. The looping required is incredibly wasteful.
Oh yeah for sure. Let's make an example of this, it'll make a nice test case. A question, are there insertions and deletions as well, or just updates?
Yeah, all three.
Edit: Actually, not really. Here's the flow (it's a securities trading application):
data
variable.grid.datum(data)
.data
array, and subscribe to real-time updates for each individual security, every 10 seconds (or less).data
array to find the affected row (by a security id
).grid.datum(data)
again.Looping through thousands of rows to find one and update it is much faster, then rendering the grid itself. And you can keep references to rows in a Map
by id.
var map = new Map()
data.forEach(function (datum) { map.set(datum.id, datum) })
// ...
map.get('row799').row = 'baz'
grid.datum(data)
@HoraceShmorace thanks for that detail. I've got an example running right now with 1000 rows, each individual row updated every (1000 + 100 * Math.random())
ms with some random data. It's pretty snappy to be honest. I'll see if I can make it work in a block and show you, it may very well be a bad example so it'd be good to get your feedback.
@mstade - It's not only about performance (and sure, a stand-alone example might be snappy outside of a larger application), but also about implementation complexity.
No I get that @HoraceShmorace, but I want to get the example up so we have something to work with. If we were to implement support for objects as datasets, we'd have a concrete use case to test it against.
Alright, apologies for the delay. Here's the demo I mentioned, where the grid is populated with a 1000 rows that are all individually updated on a separate timer. Hit the start updates button to kick it into gear once loaded. For every tick, there's a loop to find the row assigned to that tick, update it with some random data in the username column, render the grid and duck out of the loop. For good measure the dataset is shuffled, although I doubt this makes much of a difference.
Granted this is a contrived example, but it fares much better than I thought it would and the code is hardly complex. It looks like it stutters a fair bit, but I think that's more the perceived performance than actual performance – shortening the interval duration makes the stutter effect go away. Scrolling the grid while it's updating is pretty good too at least on my machine, better so in Safari than Chrome but both are pretty good. I expected severe jank but can't say I'm experiencing that – milage may certainly vary. I did not test in Firefox.
Now I guess the million dollar question: is this a representative example of your use case? I think so by reading your explanation. Note however that I only ever do grid.call(table)
since the dataset doesn't grow or shrink. I did a quick test with grid.datum(rows).call(table)
as well and it seemed to be equally snappy, but I may be lying to make friends. :o)
I'm sure there's plenty of room for performance improvements across the board in this component, but I've got to say this isn't half bad. Upping the rows to 5000 definitely starts to add some very noticeable lag, but still workable as a naive solution. I tried removing the render call from the row update loops and instead have it just run naively every 1000ms – surprisingly this seemed to have quite little effect on at least perceived performance. I also took a quick gander at the memory profile and saw a nice sawtooth pattern all the way, no obvious leaks at least.
A penny for your thoughts @HoraceShmorace. (I'm cheap!)
(For bonus fun: sort by the username column.)
Hi, thanks for the suggestion. I understand that you'd like to keep an index of your rows, and that's perfectly cool -- makes a lot of sense for your use case. But I don't think we should implement this at the grid level, because the first thing we'll do on the grid side is to explode the object into rows and forget about your map.
The grid itself doesn't need the rows to be constant or even indexed or unique. Many times the grid is used where completely fresh data sets that might even "stand" for the data that was already there -- merging and keeping the data client being only one use case -- so object references would be meaningless then. The only "id" the grid cares about is the index in the array you pass it.
I'd suggest, as has been suggested above, that you implement a simple layer for your model, perhaps driven by something as simple as _.indexBy
, something along these lines:
I know this would be not hard to implement on the grid side, but it would stretch the API -- and will make it less D3-idiomatic and more confusing (as it'd somehow hint that row ids are "understood" by the grid -- which they are not). It will not make the grid itself more performant even though it'd make it simpler for you in your use case to consume.
Another thing to keep in mind is that the grid does care very much about row order, and even though we "kind of know" rows on certain browsers will be expanded in the same order they were added, as @Poetro mentioned above, the order is not defined and we shouldn't trust it.
Thanks for that @gabrielmontagne. Looking at my demo which is admittedly (and purposefully) quite naive – can you you see any obvious ways to improve performance or the general pattern? Not so much in terms of loop or no loop, but things like, minimize calls to the table
function in this example, or other sort of generic performance considerations like that?
Hi guys. We implemented what @gabrielmontagne was writing about before.
The implementation is in this block, and it provides direct modification via id
on change without looping, as requested. Thanks to the fact that we're modifying references, we don't need to provide an object, just index the array before creating the table.
@mstade , regarding your question about your rig, we've exposed a couple of optimization hooks for row updates and we've used your rig to test them. Please see the live block here https://bl.ocks.org/gabrielmontagne/d9bca200af7ee7330ed82fa804551565 and the proposed PR #22.
I'd argue that there's real value to the consumers of this component to take a more declarative approach to implementation than this highly imperative approach. It's trivial to accept an object as the argument, even if it comes with the caveat that row order can't be guaranteed, so the implementer should then explicitly sort the grid themselves (this actually seems like a best practice either way).
I'd also argue that Map
isn't well enough supported, and "indexing" a JavaScript array is not actually a normal thing, because that's precisely what objects are for.
However, for now we'll just convert our data object to an array at the last minute.
@HoraceShmorace did you check the _.indexBy
demo? It's pretty clean and definitely less mechanical than the demo I did. The update code is just what you asked for and the prep-work is trivial. Even if you don't use underscore, the prep work is really just:
var rowById = indexBy(rows, 'id')
function indexBy(rows, key) {
return rows.reduce(function(map, row) {
map[row[key]] = row
return map
}, {})
}
The only thing you have to be careful to do is that if you add, remove, or replace a row, the corresponding change has to also happen in the dataset. The naive approach is to just regenerate the rowById
map after having changed the dataset, I doubt it'd be a severe performance hit if it's done as rarely as you say.
I'm keen to agree with @gabrielmontagne's stance on this and say this isn't really something that should be implemented in the grid. I'll be closing this issue now. Happy to reconsider if you push a PR with the changes you envision. A lot of good commentary and follow-on work came out of this thread, so thank you very much for starting it!
When working with large datasets, it would be more performant and convenient if
grid.datum()
could also accept an object, not just an array. This would allow for updating of specific datapoints in specific rows without having to loop through the array to find the correct row (evenArray.find
andArray.filter
are just looping behind the scenes).So in addition to this usage:
... we could also do:
That way, rather than having to loop through to find the row we want to update, we could directly address it.