The core listener for text editors to react to content changes is:

These drive a ton of things on top, such as:

`onDidChangeDirty` (and friends)

I wonder how an async emitter would help here: yes, it would take away the lag from the first character typed when the editor transitions into being dirty, but eventually we have to pay the price, so would the lag just happen later? Or is the idea to literally delay the event to idle time?
@bpasero oh I missed the question there. The idea is to delay it until shortly after via `setTimeout`, so the text change appears ASAP and the dirty indicator (as an example) appears 1 or 2 frames later. i.e.:
Current:
Desired:
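In code, the desired behavior is roughly this (a minimal sketch using a hypothetical `defer` helper; this is not VS Code's actual `Event.defer` implementation):

```typescript
// Wrap a listener so its work runs in a later macrotask: the keypress task
// finishes quickly and the text change paints before the deferred work runs.
type Listener<T> = (e: T) => void;

function defer<T>(listener: Listener<T>): Listener<T> {
    const pending: T[] = [];
    let scheduled = false;
    return e => {
        pending.push(e);
        if (!scheduled) {
            scheduled = true;
            setTimeout(() => {
                scheduled = false;
                // Deliver all buffered events 1-2 frames after the keypress.
                for (const ev of pending.splice(0)) {
                    listener(ev);
                }
            }, 0);
        }
    };
}

// Hypothetical usage: the dirty indicator updates shortly after the text renders.
// model.onDidChangeContent(defer(() => updateDirtyIndicator()));
```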
There's some nuance here in what should be handled in the keypress task. For example, the suggest widget is currently moved and updated in the keypress task. What we probably want is for the suggest widget to move in the keypress task but defer updating, as updating can be very expensive. We'll need to experiment with things like this to see whether splitting the work up results in a worse UX and should stay on the critical path.
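As a rough sketch of that split (the widget interface here is hypothetical, not the actual suggest widget API):

```typescript
// Cheap positional work stays in the keypress task; the expensive re-render of
// the completion list is pushed to a later task so the typed text paints first.
interface SuggestWidgetLike {
    move(): void;   // reposition next to the cursor (cheap)
    update(): void; // re-filter and re-render the list (expensive)
}

function onKeypress(widget: SuggestWidgetLike): void {
    widget.move();                     // runs before the next paint
    setTimeout(() => widget.update()); // runs 1-2 frames later
}
```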
We may be able to optimize the list's `splice` method as well to help here. I haven't looked at the implementation yet, but it seems to do a lot of work, and I saw it affect search performance/UI responsiveness last week.
Realized one of my laptops has a CPU similar to the average user's (though a better GPU). Here's a screenshot of typing `this` into `TerminalInstance.ctor`, which validates my assumption that latency is pretty terrible on lower-end hardware (up to 100ms in this case):
This is with an i7-8750H @ 2.2 GHz; I didn't turn off turbo boost, which can push it up to 4.10 GHz. Not entirely sure how that works, but I think I can disable it in the BIOS if needed.
Though I haven't tested thoroughly on the laptop, I'm quite surprised that it actually seems to perform much worse than a 4x CPU throttle on my primary machine (i7-12700KF @ 3.61 GHz, boost 5.00 GHz). I was expecting it to be the other way around.
I drilled into a profile on my MacBook to understand some parts a little more. Here are the details:
TL;DR: an enormous amount of time seems to be spent just scheduling things; `Event.defer` will probably be an easy solution for those.
15.43ms (51% of critical path)
Legend:
- 💡 We can probably improve or defer this fairly easily
- ❓ We can maybe improve this; needs more investigation/validation
`_readVisibleRangesFromCodeEditor`, which calls `codeEditor.getVisibleRanges()`, is the most expensive thing here.

The methodology of using the "Performance" tab was flawed: it exaggerates timeouts (even when async stacks are disabled) and its overhead slows things down by around 4x. A profile in the JavaScript Profiler tab is much more accurate, and you can also perform the operation many times and then look at the slow parts in terms of the percentage of time they take up. Looking at percentages instead of actual milliseconds is a much more reliable way to get a good sample, as it consolidates all calls.
For example, I recorded typing a bunch of characters; in the top-down view you can now see a more accurate picture of the function. This function is the keypress handler, and it took up 17.85% of the CPU time:
Now focusing on that function shows the breakdown of the call stack as a percentage of the total function time. One of the suspected functions contributing to the slowdown is the bracket pair parsing, which you can see takes up 1% of the time after cheap tokenization:

And 8.39% of the time after the model content changes:
Looking into this some more now.
Here's a breakdown of the CPU profile when typing a bunch of characters (~50, random); the suggest widget didn't show most of the time. This tree isn't complete: a lot is omitted here to remove noise versus just looking at the profile in devtools. The children of a node are not ordered and are not necessarily directly below the parent or siblings of other children.
❌ = Too risky to touch
This is a living document for now:
Profile when typing `.` after `this` and backspacing repeatedly to measure the high-level impact of the suggest widget.
- `_type` - codeEditorWidget.ts
- `checkTriggerCharacter` - suggestModel.ts
- `_insertSuggestion` - suggestWidget.ts
- `insert` - snippetController2.ts
- `insert` - snippetSession.ts
- `executeEdits` - codeEditorWidget.ts
- `endEmitViewEvents`
- `_refilterCompletionItems` - suggestModel.ts
- `showSuggestions` - suggestWidget.ts
- `_layout` - suggestWidget.ts
- `getScrolledVisiblePosition` - codeEditorWidget.ts
- `_renderNow` - view.ts
- `splice` - listWidget.ts
- `(anonymous)` - viewModelImpl.ts
- `pushEditOperations` - TextModel.ts
- `memorize` - suggestMemory.ts
- `cancel` - suggestModel.ts
- `hideWidget` - suggestWidget.ts
- `splice` - listWidget.ts
- `type` - viewModelImpl.ts
- `typeWithInterceptors` - cursorTypeOperations.ts
- `_runAutoIndentType` - cursorTypeOperations.ts
- `executeEditOperation`
The big revelation above is that there's a full render in both the keypress and the animation frame events, triggered by the suggest widget 🤯:
Here's a breakdown of the list `splice` calls made when filtering suggest. This is a general area I've noticed is a little slow elsewhere too, like in search. This particular profile ("c", backspace, repeated to refilter on both) showed `ListWidget.splice` taking up 11.28% of the total keypress task.
After digging through how `List.splice`, and in particular `HighlightedLabel.render`, work, nothing sticks out in the list's `splice` as an obvious way to improve performance, unfortunately.
After pulling all the proposed changes into a single branch (tyriar/all_latency_changes), this is what pressing `.` after a `this` looks like. The red will mostly go away if we implement the GPU-based renderer (https://github.com/microsoft/vscode/issues/162445); the blue part is minimap rendering, which could be significantly sped up by moving to WebGL.
Current main:
After the completion response comes back from the exthost, this happens:
Current main:
I couldn't find many more wins here; most of it is bound by the list `splice` and fetching the position of elements from the DOM (maybe something there could be improved?). There are wins from moving to WebGL as well, but it would probably be more effort than it's worth to maintain.
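On the DOM position reads, one possible direction (a sketch of the generic read/write batching technique; I haven't verified this applies to the suggest widget code) is avoiding forced synchronous layouts:

```typescript
// Reading layout (e.g. getBoundingClientRect) after a style write forces a
// synchronous reflow ("layout thrash"); batching reads before writes avoids it.
function thrashingUpdate(el: HTMLElement): void {
    el.style.width = '100px';                   // write: invalidates layout
    const top = el.getBoundingClientRect().top; // read: forces synchronous layout
    el.style.top = `${top + 5}px`;              // write again
}

function batchedUpdate(el: HTMLElement): void {
    const top = el.getBoundingClientRect().top; // read first, while layout is clean
    el.style.width = '100px';                   // then do all writes together
    el.style.top = `${top + 5}px`;
}
```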
Here is a look at re-filtering the already visible suggest widget:
Current main:
Just discovered this issue: https://github.com/Microsoft/vscode/issues/27378
Using typometer, here are the results for xterm.js, with a few changes to get it closer to what we could expect as the upper bound of adopting the WebGL renderer:
Those are some pretty nice numbers 🙂. Of course, the model updates/events in bare xterm.js aren't nearly as costly as in VS Code's editor. The viewport is also quite small in xterm.js' demo, which affects the time it takes to fill the WebGL buffers.
Interestingly, the two runs that are very low, with 8.2-8.3ms averages, are from when I was taking a devtools performance profile. My theory is that it's related to performance vs. efficiency cores in my 12th-gen Intel CPU. They are also far more consistent during the performance profile:
Compared to the other 3:
Another thing I noticed is the big spikes that sometimes occur. This is actually why I did the performance profile: to have a look at them. Here's an example (I edited the screenshot, as the iterations part was giant and couldn't be collapsed):
I can't explain this, as the work appears to be done right at the start, like in all the other frames. Perhaps it's another application eating more CPU time? Or Chromium doing something and deciding to drop those frames? I tried setting process affinity/priority, which didn't change the results. Also, the processes are not marked as "efficiency mode" in Task Manager; regardless of what Task Manager says, I don't think that maps to efficiency cores anyway.
~~OK, I'm pretty sure it is efficiency vs. performance cores. I disabled all efficiency cores in the BIOS and I get the good numbers of about 8ms average~~ - I tried again later, and even with efficiency cores disabled I got the different set of numbers.
@Tyriar I don't want to add noise to this issue, but I'm excited to see you working on this. I've kept an eye on latency and thought I could give some comparisons for context:
I found VS Code quite good (a lower min/max/mean than emacs & vim), at least on a high-spec MacBook Pro. Specifically, from a couple of years ago, typometer got 9.7/27.1/16.8 for min/max/mean, relative to 48/77/60 for vim and 29/48/38 for emacs.
I just reran it a couple of times now and got higher maxes & means in VS Code 1.72.2: 9.7/66.2/23.1 & 7.4/49.6/21.9. So maybe something has gotten slightly worse, or maybe the setups are different. Min is still impressively low!
To the extent we get a few ms of savings, that's appreciated; I think I can notice a difference when there's more immediate feedback, and even small amounts of latency or jitter feel worse (maybe that needs a blind test though :) )
@max-sixty good to hear, I've been trying to squeeze out ms here and there for the past month or so, a lot of the changes are still in review though 🙂
For the unexplained xterm.js numbers, it appears to be related to extensions; incognito fixes the problem. Perhaps Chromium waits on some extension APIs when not profiling?
Looking at the spikes in the profiles again, I think I can explain them now:

There is a task that does `Compute Intersections` where this happens:

These are for intersection observers, which xterm.js does indeed use. https://developer.chrome.com/en/blog/new-in-devtools-92/#computed-intersections

There are other places where `Compute Intersections` happens that don't have a spike, but all spikes contain a `Compute Intersections`.
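For illustration, here's the general pattern involved (a hypothetical sketch, not xterm.js' actual code): Chromium re-runs `Compute Intersections` when layout changes affect observed elements, so per-keypress DOM writes near the terminal can trigger it.

```typescript
// An IntersectionObserver watching the terminal element (xterm.js uses
// intersection observers, per the link above).
const observer = new IntersectionObserver(entries => {
    for (const entry of entries) {
        console.log(entry.target, entry.isIntersecting);
    }
});
observer.observe(document.querySelector('.terminal')!);

// A layout-invalidating write on every keypress, like syncing the hidden
// textarea's position to the cursor, forces intersections to be recomputed:
// textarea.style.left = `${cursorX}px`;
```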
I disabled the textarea syncing to avoid the intersections needing to be recomputed, and the results seem better. There are still occasional spikes:
Zooming into this one, it's happening because the `requestAnimationFrame` doesn't fire for a long time; normally these 2 tasks are very close together:

It's not clear why, but if Chromium doesn't fire it, there isn't much we can do about it. It does appear to be a longer interaction, but I'm guessing that's because something is blocking, so it delays processing the keyup event:
I also tried a JavaScript profile and it didn't reveal anything for the spikes.
The impact of disabling v-sync and unlocking the frame limit is that it's slightly slower on average but much more variable, as you would expect. It doesn't lower the minimum though.
Here are some results of measuring latency in VS Code and other similar projects:
Some thoughts:

- The test only types `.` characters into a text file, so it ignores autocomplete, for example, and actual coding in general (varying glyphs, scrolling when typing, wrapping, files with a lot of content, etc.)

Hardware used:
I'm going to call this done for now; here are the outcomes:
I recently did an exploration into input latency and found it can get pretty bad on slower machines. A lot of the problem relates to how our synchronous event emitters/listeners work, performing their work after the keypress but before the animation frame.
My proposal to improve this is:
Tentatively assigning to October