yotamberk / timeline-plus

Timeline - chronological visualization of your data
MIT License
137 stars 36 forks source link

Fix initially hidden large items spanning the whole visible range (extended by 25% to the … #157

Open sbusch opened 5 years ago

sbusch commented 5 years ago

Large items spanning the whole visible range (extended by 25% to the left and right as a "buffer zone") are initially hidden until their start or end scrolls into that range.

This commit fixes this by increasing the buffer zone depending on the items, so that its large enough to fit all potentially visible (not all!) items.

The current implementation has a performance-related drawback:

As the whole "scanning range" is enlarged, more items need to be processed and more items will be visible. I see no problems for light datasets, but for large datasets with many items - especially item types without end date - this could become a problem. Possible solutions:

  1. add an option to choose: have this fix with the performance drawback, or tolerate the bug but with higher performance for large, mixed datasets
  2. two scanning passes:
    • first pass using the extended buffer zone (as implemented in this patch) just for ranged items (items with start and end)
    • second pass with fixed buffer zone (as before) for items without end
  3. more than two scanning passes: similar to 2., but items with start and end could be depending on their duration grouped into scanning passes with different scanning ranges (different buffer zones). Fixed buffer zone (as before) for items without end.

This needs to be discussed. @yotamberk, @taavi-halkola, others: you opinion?

The implementation should additionally be optimised by caching maximum item duration per group:

Maximum duration is calculated per group at rendering level, which happens more often than needed. Possible solution: calculate on item changes and cache it. Should be easy to fix for someone who knows the location where this could be implemented without missing any changes.

This should be implemented. @yotamberk Do you know a suitable location?

I think the initial buffer zone size should also be configurable, currently at 1/2 of visible range (1/4 to the left, 1/4 to the right).

@yotamberk Are you OK with this?

taavi-halkola commented 5 years ago

I think anything affecting performance negatively should be thought through extremely carefully. Atleast proper performance comparisons should be made..

Without actually having time to currently test this, I would think this implementations performance drawbacks would increase together with group amounts also and not only with the amount of items.. simply correct me if this assumptipn is wrong.

I was wondering if instead of growing the scannable zone it was possible to implement two new properties for range items. These would be visible_end and visible_start. These could be calculated when needed from the currently scannable range when initializing and updating items and timelines range changes. Range changes do happen alot and this would decrease their performance(would these even be needed after the initial draw?). These properties would either be calculated to all range items or only those with overflowing ranges and be used instead of their real start and end dates when finding visible items during draws.

This is a very raw idea and propably would need changes in multiple places. Thoughts?

taavi-halkola commented 5 years ago

Managed to get some time to check this.

Turns up we might not need any of the things mentioned above. The real problem is the binarySearchCustom and _traceVisible functions. Does anyone know why visibleItem lookups are done with these? Lack of proper filter functions in the past?

The proposed commit by @sbusch works well and probably has no huge perf drawbacks as the interval is calculated from the highest overlaps. (Any reason we couldn't start with even a smaller default interval if this gets merged?)

The loop of orderedItems.byEnd in the beginning of every call to _updateItemsInRange() might also be unnecessary as the problem only seems to exist when doing the call to util.binarySearchCustom(orderedItems.byEnd, searchFunction, 'data', 'end');.

binarySearchCustom returns an index where the _traceVisible starts scanning for visible items. The byEnd causes the overlapping items with late end dates to be ordered after items which correctly should be hidden. The first hidden item breaks the traceVisible thus leaving the overlapping ranges hidden.

Is there a reason we can't set visible items with a simple filter?

visibleItems = orderedItems.byEnd.filter(item => (item.data.start < upperBound) && (item.data.end > lowerBound))

instead of:

// we do a binary search for the items that have defined end times.
const initialPosByEnd = util.binarySearchCustom(orderedItems.byEnd, searchFunction, 'data','end');

// trace the visible items from the inital start pos both ways until an invisible item is found, we only look at the end values.
this._traceVisible(initialPosByEnd, orderedItems.byEnd, visibleItems, visibleItemsLookup, item => item.data.end < lowerBound || item.data.end > upperBound);

I didn't have much time to test this but the filter as above seemed to return the same items plus the long overlapping items. In my environment the filter also performs better.

Thoughts?

taavi-halkola commented 5 years ago

Forgot to mention that handling merging the visible items into to current set instead of assigning them directly as in my previous comment would still need to be handled. Also visibleitemlookup needs to be handled.

sbusch commented 5 years ago

Currently no time for this, but I'll try to post some comments during the day. Just want to give quick feedback to @taavi-halkola:

It took some time for me to understand the binarySearchCustom / _traceVisible logic. At first I had the same thoughts as you, but it really makes sense for large datasets: the binary search finds some date between the visible bounds, then two calls to _traceVisible traverse from that point in both directions until it crosses the boundary. All items found are considered visible. When crossing the boundary, the algorithm stops with a break and no more CPU cycles are wasted.

While it's OK there are IMO better optimisations. 3D engines solve that for millions of points in the 3D space, we have the same problem in one dimension. While I'm no expert, Timeline could eventually use the one-dimensional equivalent of a Quadtree, I think this is called a Segment tree. It has the tradeoff of requiring memory for the index (the current implementation has only an index of sorted start and end times). Further optimisations e.g. for incremental changes of the visible range could be applied on top of that.

BUT the current implementation is better than nothing, and until someone has the time to implement such a new logic, it should remain.

taavi-halkola commented 5 years ago

Yeah binarySearchCustoms and_traceVisibles fucntionalities are quite clear and i can see how it might save cpu cycles in big datasets. Though, wouldn't it only benefit of those with big datasets per group and when lots of those items are outside the visible range?

The more the items are split into groups and outside the visible scope the less it benefits from the binary search. Also the current implementation causes the need to do the binarySearchCustom / _traceVisible pair twice; first to the items ordered by start and then ordered by end. This in many cases causes more work to be done. By no means am i saying a simple filter against all items would be the best/most performant way to handle this, but by looking at the current implementation it sure would fix the problem of overflowing ranges and be much more readable.

Performance wise it would probably be slower on mentioned big datasets which holds big portion of the items outside of the visible range. Also the suggested commit is adding another loop to the ranged items leaving us with 5 loops to potentially the same set of items and moving the traceVisible breakup point further to both directions(Does this also cause every item within the range of big overflowing items be drawn to dom?). There is simply a lot factors to take into account when considering the performance of possible implementations. Maybe someone would have time to do some comparisons on different scenarios.

taavi-halkola commented 5 years ago

One more suggestion to comment about:

It should be possible to do all of this (showing all types of items and including range overflows) within a single loop:

 for (let i = 0; i < itemCount; i++) {
    if (orderedItems.byStart[i].data.start > upperBound) {
      break;
    }

    if (visibleItemsLookup[orderedItems.byStart[i].data.id] === undefined) {
      let itemEnd = orderedItems.byStart[i].data.end === undefined ? orderedItems.byStart[i].data.start : orderedItems.byStart[i].data.end

      if (orderedItems.byStart[i].data.start < upperBound && itemEnd > lowerBound) {
        visibleItems.push(orderedItems.byStart[i])
      } 
    }
  }

This way there would be no need to go through the byEnd array at all. The loop breaks out when first item with a start date later than range end is encountered. Basically we get a part of binarySearchCustom / _traceVisible pairs breakCondition that way.

It is only possible to break the loop from one end or otherwise the overflowing RangeItems are lost. This should eliminate the need of both binarySearchCustom / _traceVisible pairs and also the checkRangedItems == true block.

jarnovanrhijn commented 5 years ago

@sbusch, @taavi-halkola & @yotamberk. any chance of fixing this any time soon??