Open dmaluka opened 3 months ago
Most probably it is this code in `CanDisplay()`:

```go
if enc := t.encoder; enc != nil {
	nb := make([]byte, 6)
	ob := make([]byte, 6)
	// ...
```
It would be best to optimize tcell in that case, since the API...
```go
// CanDisplay returns true if the given rune can be displayed on
// this screen. Note that this is a best guess effort -- whether
// your fonts support the character or not may be questionable.
// Mostly this is for folks who work outside of Unicode.
//
// If checkFallbacks is true, then if any (possibly imperfect)
// fallbacks are registered, this will return true. This will
// also return true if the terminal can replace the glyph with
// one that is visually indistinguishable from the one requested.
CanDisplay(r rune, checkFallbacks bool) bool
```
...doesn't forbid using it this way or suggest avoiding it. Maybe there is some room for improvement in micro too, but currently tcell seems to be the low-hanging fruit: reserve these 12..16 bytes within the `tScreen` (plus `simscreen`, etc.) structure, initialize them in `Init()`, and clear them at the point of use in `CanDisplay()`. No further allocations...
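A rough sketch of that idea (purely illustrative: the package, struct, field and method names below are made up and are not tcell's actual internals):

```go
package tcellsketch // hypothetical package; not real tcell code

// Sketch of the proposed change: the two 6-byte scratch buffers become
// fields of the screen struct (reserved once, e.g. from Init()), and
// CanDisplay() clears and reslices them instead of calling make().
type tScreenSketch struct {
	canDisplayNB [6]byte // would replace nb := make([]byte, 6)
	canDisplayOB [6]byte // would replace ob := make([]byte, 6)
}

// scratch clears the reserved bytes at the point of use and returns slice
// headers backed by the struct, so the hot path performs no heap allocation.
func (t *tScreenSketch) scratch() (nb, ob []byte) {
	t.canDisplayNB = [6]byte{}
	t.canDisplayOB = [6]byte{}
	return t.canDisplayNB[:], t.canDisplayOB[:]
}
```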
Actually we can try just disabling its usage in micro (at least as a test hack, to see its actual impact on performance):
```diff
diff --git a/internal/screen/screen.go b/internal/screen/screen.go
index 16c011e6..ad369320 100644
--- a/internal/screen/screen.go
+++ b/internal/screen/screen.go
@@ -109,10 +109,6 @@ func ShowCursor(x, y int) {
 // SetContent sets a cell at a point on the screen and makes sure that it is
 // synced with the last cursor location
 func SetContent(x, y int, mainc rune, combc []rune, style tcell.Style) {
-	if !Screen.CanDisplay(mainc, true) {
-		mainc = '�'
-	}
-
 	Screen.SetContent(x, y, mainc, combc, style)
 	if UseFake() && lastCursor.x == x && lastCursor.y == y {
 		lastCursor.r = mainc
```
I've tried it, and haven't really noticed any impressive improvement so far. The CPU usage when continuously scrolling the buffer (e.g. by pressing and holding the Down key) is maybe 22% instead of 25%, and that's it (still an order of magnitude higher than in vim). That doesn't mean there is no significant improvement in some cases (profiling suggests there should be); I just haven't found any yet.
If we manage to reduce/eliminate map accesses (per #3228), it would be really interesting to see how much improvement that gives us, both compared to the removal of `CanDisplay()` and together with it.
Also, it would be interesting to see micro's performance on a really low-end CPU (like the Raspberry Pi in #2970) and how it improves after removing `CanDisplay()`... Do you happen to have a low-end ARM board that can boot Linux (to try running micro on it)?
In fact I have an old Raspberry Pi 1B in productive use as a Pi-hole. :wink: I will do some tests when there is some time for that.
With `CanDisplay()` disabled: still very slow, but already slightly better.
CPU profiling via `micro -profile` shows that, among the CPU time consumed by micro, a large fraction is usually spent inside tcell's `CanDisplay()` function (which is called from `screen.SetContent()`, which is called from `displayBuffer()`).

In particular, most of the time inside `CanDisplay()` is spent in `runtime.makeslice()` -> `runtime.mallocgc()`, i.e. allocating memory when creating slices. Most probably it is the code in `CanDisplay()` quoted at the top of this page.

Micro calls `screen.SetContent()` for virtually every character on the screen (including blank space) every time the screen is updated, so it ends up repeatedly allocating memory for these 2 small temporary slices thousands of times per update.

Commit hash: 828871ac
OS: Unix systems
Terminal: any
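Not from the issue, but to illustrate the claim, here is a minimal benchmark sketch comparing the current pattern (fresh `make([]byte, 6)` scratch slices per call, as in tcell's `CanDisplay()`) with buffers reused from a long-lived struct. All names are stand-ins, not tcell code; saved as a `_test.go` file and run with `go test -bench=.`, the first variant should report at least one heap allocation per operation and the second none.

```go
package screenbench // hypothetical; not part of micro or tcell

import (
	"testing"
	"unicode/utf8"
)

// sink keeps results alive so the compiler cannot discard the work or keep
// the per-call slices on the stack.
var sink []byte

// allocPerCall mimics the current tcell pattern: fresh scratch slices are
// created with make() on every call.
func allocPerCall(r rune) {
	nb := make([]byte, 6)
	ob := make([]byte, 6)
	n := utf8.EncodeRune(ob, r)
	copy(nb, ob[:n])
	sink = nb
}

// reusedBuffers mimics the proposed pattern: the scratch space lives in a
// long-lived struct (the screen), so the hot path allocates nothing.
type reusedBuffers struct {
	nb [6]byte
	ob [6]byte
}

func (s *reusedBuffers) canDisplay(r rune) {
	n := utf8.EncodeRune(s.ob[:], r)
	copy(s.nb[:], s.ob[:n])
	sink = s.nb[:]
}

func BenchmarkAllocPerCall(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		allocPerCall('µ')
	}
}

func BenchmarkReusedBuffers(b *testing.B) {
	b.ReportAllocs()
	s := &reusedBuffers{}
	for i := 0; i < b.N; i++ {
		s.canDisplay('µ')
	}
}
```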