krassowski opened this issue 2 years ago
friendly ping to @martinRenou and @ibdafna
Thanks for the ping.
@krassowski do you need text wrapping (line return for long text in cells) in your case? I see there is one measureText call that we could probably get rid of if you don't need text wrapping.
In the case of text wrapping being enabled, we can perhaps come up with a better algorithm that uses fewer measureText calls.
We could also use some tricks to work around measureText slowness in the case of monospace fonts.
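For monospace fonts, one such trick could be to measure a single glyph once and multiply by the string length. This is a hedged sketch, not an existing Lumino API; `makeMonoWidth` is a hypothetical name, and `measure` stands in for `(s) => gc.measureText(s).width`:

```typescript
// Monospace shortcut: in a true monospace font every glyph has the same
// advance width, so a single measureText call suffices for any string.
function makeMonoWidth(
  measure: (s: string) => number
): (s: string) => number {
  let charWidth: number | null = null;
  return (text: string): number => {
    if (charWidth === null) {
      charWidth = measure('M'); // any single glyph works for monospace
    }
    return charWidth * text.length;
  };
}
```

This reduces every subsequent width query to a multiplication, at the cost of being wrong for proportional fonts, so it would have to be gated on the grid's font configuration.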
Thanks for reporting! I'd argue this is an extreme case, when each string has a length of 10240 characters - not justifying the performance issue, but just putting it out there. One simple fix would be to put a cap on the maximum string length we apply wrapping to, and just disable the functionality if the string is longer than the max.
EDIT: I don't think there's text wrapping enabled in this case - just eliding (which also measures the text).
We measure the text length in all cases, whether eliding or wrapping. textWrapping is optional, but eliding isn't. We should probably add a boolean for eliding; if both wrapping and eliding are set to false, then no text measurement calls need to be made. That's a useful but not sufficient approach. We can also apply some caching for {string: length}, although if the n × m strings are all unique, we would end up with n × m entries, which is not good for the memory footprint. https://github.com/jupyterlab/lumino/blob/15fb5d14136b9d465d744c3373bd347fe1f40bf0/packages/datagrid/src/textrenderer.ts#L217
I see there is one measureText call that we could probably get rid of if you don't need text wrapping.
Indeed, this one. No, maybe not? Sorry, I don't see it anymore!
EDIT: I don't think there's text wrapping enabled in this case - just eliding (which also measures the text).
I agree, this looks like an issue in eliding, but when I skimmed the code I saw that wrapping is implemented with a naive linear search; it could use binary search to reduce the cost to log(n):
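A binary search over the prefix length could look roughly like this. It is a sketch under the assumption that measured width grows monotonically with prefix length; `fittingPrefixLength` is a hypothetical name and `measure` stands in for `(s) => gc.measureText(s).width`:

```typescript
// Find the longest prefix of `text` that fits in `boxWidth`, using
// O(log n) width measurements instead of the linear scan.
function fittingPrefixLength(
  text: string,
  boxWidth: number,
  measure: (s: string) => number
): number {
  let lo = 0;           // longest prefix known to fit
  let hi = text.length; // upper bound on candidates
  while (lo < hi) {
    // Round up so the loop makes progress when hi === lo + 1.
    const mid = Math.ceil((lo + hi) / 2);
    if (measure(text.slice(0, mid)) <= boxWidth) {
      lo = mid;
    } else {
      hi = mid - 1;
    }
  }
  return lo;
}
```

For a 1024-character line this is about 10 measurements per wrap point rather than up to 1024.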
We should look to improve the wrapping algorithm for sure, but I guess my point here is that we should eliminate or significantly speed up the measureText call which happens regardless of wrapping/eliding.
Not sure how much we can speed up measureText as it is pretty close to the metal.
Eliding a sentence with 1024 characters down to a one-character-wide column with the current binary search takes 10 measureText calls. We could reduce it to 1 call (on the happy path) with:
```ts
// If dealing with a very large string
if (textWidth > 4 * boxWidth) {
  const approximateLengthToKeep = Math.ceil((text.length * boxWidth) / textWidth);
  // It is probably wise to add a small margin (increasing the number of calls
  // on the happy path to two) in order to reduce the chance of falling into
  // the worst-case scenario of the old algorithm.
  const margin = 1;
  const candidateText = text.substring(0, approximateLengthToKeep + margin) + elide;
  const candidateWidth = gc.measureText(candidateText).width;
  // Keep the result if we did not overshoot because of unfavourable string
  // composition; the old algorithm will fine-tune it to remove the few
  // remaining extra characters.
  if (candidateWidth >= boxWidth) {
    text = candidateText;
    textWidth = candidateWidth;
  } else {
    // no-op
    // We would end up here for strings like `iiiiimmmmmm` if we are unlucky.
    // Of course we could try to continue with a new best guess, but we should
    // bail after a few trials to avoid particularly nasty strings (the worst
    // case is alternating between overshoot and undershoot).
  }
}
// Keep the old algorithm below as it was
while (textWidth > boxWidth && text.length > 1) {
  if (text.length > 4 && textWidth >= 2 * boxWidth) {
    // If text width is substantially bigger, take half the string
    text = text.substring(0, text.length / 2 + 1) + elide;
  } else {
    // Otherwise incrementally remove the last character
    text = text.substring(0, text.length - 2) + elide;
  }
  textWidth = gc.measureText(text).width;
}
```
The bigger issue is a wide column, say one equivalent to 200 characters of text (this is what I saw locally). Then, if you start with 399 characters, you would call measureText 199 times. This looks like a realistic scenario for many users: you add a column with one paragraph but want to display only about one sentence of it.
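The 199-call figure can be reproduced by simulating the current eliding loop under a monospace width model; `elideCalls` is a hypothetical helper that counts the width measurements made:

```typescript
// Simulate the eliding loop with a monospace model (every character is
// `cw` pixels wide) and count how many width measurements are made.
function elideCalls(textLength: number, boxChars: number): number {
  const cw = 8;                   // hypothetical glyph width in px
  const boxWidth = boxChars * cw; // column fits `boxChars` characters
  const elide = '\u2026';         // the ellipsis, one UTF-16 code unit
  let calls = 0;
  const measure = (s: string): number => {
    calls++;
    return s.length * cw;
  };
  let text = 'x'.repeat(textLength);
  let textWidth = measure(text);  // initial measurement
  while (textWidth > boxWidth && text.length > 1) {
    if (text.length > 4 && textWidth >= 2 * boxWidth) {
      // Halving branch: only taken while the text is at least twice too wide
      text = text.substring(0, Math.floor(text.length / 2) + 1) + elide;
    } else {
      // Otherwise remove one character per iteration (re-appending the elide)
      text = text.substring(0, text.length - 2) + elide;
    }
    textWidth = measure(text);
  }
  return calls;
}

// 399 characters into a 200-character column: 399 * cw is below 2 * boxWidth,
// so the halving branch never fires and the loop shrinks the text by one
// character per call, 399 -> 200, i.e. 199 loop measurements plus the initial one.
```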
Example in detail:
measureText can be sped up via a shim which does some caching or other internal accounting to reduce the CPU time. Just to reiterate, the issue raised above is separate from wrapText, which is another topic. The performance bottleneck you are experiencing above is not caused by the wrapText algorithm. We should look at fixing this issue first before moving to wrapText. wrapText is disabled by default for now.
Yes, I agree that the issue I saw is not in wrapText (I never said it was).
As for caching, it might help a bit with scrolling (at the cost of doubling the RAM usage, if not much worse), but as soon as the user resizes the column we are back to square one.
Just to spell it out: I was referring to eliding in the last few comments, and eliding is enabled by default and inefficient in the worst-case scenario (as described above).
Okay, understood 😊 I think I have some ideas about how to fix this. Shouldn't be anything difficult. Will report back!
@krassowski @fcollonval I've done more profiling to see what the performance is like without eliding or wrapping, and unfortunately it is still quite laggy even with zero calls to measureText. The paint call seems to take a long time. I am wondering why we're calling paint so much - we should be able to blit the content.
This is for resize, but it's even worse for scrolling.
Digging deeper, I think this may have to do with our rendering optimizations. When resizing column/row headers or scrolling, we trigger a full on-screen buffer paint, which is extremely slow. We should be writing to the off-screen buffer and copying only the changed region from that buffer to the on-screen one.
Description
Rendering and scrolling within DataGrid are very laggy when large text is present in cells. This is due to CanvasRenderingContext2D.measureText invocations.
Reproduce
https://github.com/jupyterlab/jupyterlab-plugin-playground with:
This is a bit pathological, but I got worse results with just one column of long strings (CSS content of lab, one sheet per row).
Expected behavior
DataGrid loads fast (<2s for a dataset of this size) and scrolls smoothly. For scrolling, the text size should be cached or similar.
Context
Only tested with lumino 1.x.
Firefox:
Chrome:
Chrome, zoom-in: