Large output makes terminal unusable

dolik-rce commented 8 months ago

When application produces large output quickly, Bobcat uses 100% CPU and practically freezes. The text is rendered later, but veeeeery slowly.

To reproduce, just run curl -L https://github.com/json-iterator/test-data/raw/master/large-file.json. This produces ~30 Mb of JSON.

Sakura terminal renders this in about 4s. Bobcat takes minutes and the whole time the CPU goes full throttle :-(

ismail-yilmaz commented 8 months ago

curl -L https://github.com/json-iterator/test-data/raw/master/large-file.json

Is linkifier enabled (hyperlinks toggle is checked)? I suspect that is the culprit here because the file contains huge amount of URL. Although linkifier only scans the visible portion of the display, it will likely be expensive. My results on a very poor (temporarily) network connection is as follows (linkifier disabled):

Gnome terminal:

real    0m33,368s
user 0m0,165s
sys 0m0,480s

Bobcat:

real  0m31,079s
user 0m0,190s
sys   0m0,398s

Alacritty:

real  0m31,165s
user 0m0,125s
sys   0m0,500s

Sakura

real  0m31,491s
user 0m0,176s
sys   0m0,471s

All terminals were up-to-date.

ismail-yilmaz commented 8 months ago

Also, could you check if delayed (buffered) display refresh is enabled. Because if it isn't, the display will be redrawn immediately, and it will be very slow.

ismail-yilmaz commented 8 months ago

I downloaded the file and then catted in Bobcat (page sizes are same across the terminals (274x56).

Bobcat:

real    0m0,448s
user    0m0,000s
sys     0m0,056s

Sakura:

real    0m0,940s
user    0m0,000s
sys 0m0,053s

Alacritty:

real    0m0,382s
user    0m0,000s
sys 0m0,072s

Gnome termina:

real    0m1,050s
user    0m0,004s
sys 0m0,049s

dolik-rce commented 8 months ago

Ok, that was not very good example I guess. So I have downloaded the file and tried to measure just printing it with time cat large-file.json. And yes, you're right again :-) It is definitely caused by linkifier.

Sakura:

real    0m1,998s
user    0m0,001s
sys 0m0,152s

Bobcat, brand new profile without linkifier:

real    0m1,998s
user    0m0,001s
sys 0m0,152s

Bobcat, with linkifier: I killed it after 3 minutes... The regexp was rather complex: https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*).

Then I tried to remove the urls from the file (simply by sed 's/http/____/g' -i large-file.json). This runs reasonably fast:

real    0m7,452s
user    0m0,002s
sys     0m0,196s

I think it might make sense to process the links "asynchronously". In this extreme case, there is no reason to process most of the links anyway, because they scroll out of the buffer very fast. The linkification might be a background job, running only when the terminal is not busy with something more important (like printing output).

Or, there is the trick that Sakura uses. I just noticed that it ignores the links completely when printing output to the screen. It only highlights them on mouse move, which saves it a ton of useless text processing. It is not so nice to the user, because the links are not visually marked, but the pay off is huge :-)

ismail-yilmaz commented 8 months ago

Or, there is the trick that Sakura uses. I just noticed that it ignores the links completely when printing output to the screen. It only highlights them on mouse move, which saves it a ton of useless text processing. It is not so nice to the user, because the links are not visually marked, but the pay off is huge :-)

Yes, I am think along that line (Sakura's). But I believe we don't have to sacrifice the visual cues:

A configurable hot key, while pressed, can run the linkifier and highlight (draw dots) the inactive links.
Mouse and/or a cursor (as we did in the finder) can be used to select and highlight the active link.

Basically, it will run on-demand.

dolik-rce commented 8 months ago

Also, could you check if delayed (buffered) display refresh is enabled.

Oh, I overlooked this post...

Both "Buffered refresh" option in emulation menu and "Delayed display refresh" in profile settings were on when I was testing.

ismail-yilmaz commented 7 months ago

Ok, I have refactored and tried to optimize the linkifier.

It should be pretty fast now. And -good news- with no change to its behavior. I have also added a safeguard against possibly spamming input data, such as the above test file. Same test configuration, page size, cat result is now (linkifier enabled):

real    0m0,435s
user    0m0,000s
sys     0m0,057s

There might be some rough edges, but I'll smooth them out in in the following days.

Please check.

dolik-rce commented 7 months ago

I've tested the latest version (2c1f862295c5e38887e1f56be31578c37c6bb392) and it works great.

For the ugly big json with lots of links, I get ~18s in profile with two ugly regexes in Linkifier, ~11s in profile with only one regex and ~8s for profile without linkification. So it scales quite linearly.

Interestingly, sakura is still much faster, about 2s. I have also tried with and without "Delayed display refresh", but there is no noticeable difference.

Anyway, I'm really happy with the current state, so feel free to close this issue. The terminal is now definitely usable for me.

ismail-yilmaz commented 7 months ago

I've tested the latest version (2c1f862) and it works great.

For the ugly big json with lots of links, I get ~18s in profile with two ugly regexes in Linkifier, ~11s in profile with only one regex and ~8s for profile without linkification. So it scales quite linearly.

Interestingly, sakura is still much faster, about 2s. I have also tried with and without "Delayed display refresh", but there is no noticeable difference.

Anyway, I'm really happy with the current state, so feel free to close this issue. The terminal is now definitely usable for me.

Interesting...

While there is definitely a room for performance improvement in linkifier, on all of my machines (all AMD, 1 ryzen 5, 2 ryzen 7, one threadripper 32 core, min 16 GB & max128 GB ram, all up-to-date arch linux) the resuls appear to be consistent. In fact, on my modest test-bed machine (ryzen 5600g, 16 GB), the final results are as follows (cat'ed the json file locally, page size 274x56 - window maximized on a 1080p screen, and the regex pattern you provided):

Bobcat

real    0m0,460s
user    0m0,000s
sys     0m0,057s

Sakura (VTE-based)

real    0m1,040s
user    0m0,000s
sys 0m0,060s

Gnome terminal (VTE-based)

real    0m1,025s
user    0m0,000s
sys 0m0,061s

Kitty

real    0m0,484s
user    0m0,000s
sys 0m0,083s

Alacritty

real    0m0,367s
user    0m0,000s
sys 0m0,095s

Wezterm

real    0m4,989s
user    0m0,004s
sys     0m0,112s

Sakura and Gnome Terminal are quite close to each other and this is expected, as they both use VTE. However, sakura (up-to-date) never managed to surpass Bobcat with the new linkifier code on my machines (neither did Gnome Terminal). Possibly a HW/SW configuration is responsible for the difference in your and my results. Are you using Bobcat witk GTK or NOGTK?

dolik-rce commented 7 months ago

I'm compiling Bobcat with umk, via the provided makefile. So the flags are GUI, SHARED.

My laptop is bit older, than your machines, Intel i5, 8GB ram. I'm also running latest arch linux. Do you use X by any chance? Because I'm on wayland (with sway compositor). That might explain the difference since Bobcat runs through XWayland, which might not be as fast as Sakura, which uses native wayland.

ismail-yilmaz commented 7 months ago

I'm compiling Bobcat with umk, via the provided makefile. So the flags are GUI, SHARED.

My laptop is bit older, than your machines, Intel i5, 8GB ram. I'm also running latest arch linux. Do you use X by any chance? Because I'm on wayland (with sway compositor). That might explain the difference since Bobcat runs through XWayland, which might not be as fast as Sakura, which uses native wayland.

And that suprises me all the more. I am using wayland with gnome, so xwayland is present. Nevertheless, thanks for the info. I'll have the chance to check the performance against a i5 8GB Dell inspiron laptop (with arch linux) tomorrow. I'll share my findings then.

dolik-rce commented 7 months ago

If you can't reproduce it, I can also send you my config files, and perhaps more info about my system (installed package versions etc).

ismail-yilmaz commented 7 months ago

If you can't reproduce it, I can also send you my config files, and perhaps more info about my system (installed package versions etc).

I have asked a friend of mine to test Bobcat and Sakura on his Dell Inspiron 13 7359 (6th gen intel i5-6200u CPU, 8GB 1600 Mhz DDR3 RAM, Intel HD graphics, 720p screen, 500 Gib SSD). A fairly old, but very nice device. Test setup was, again, an up-to-date arch linux with GNOME 45, wayland. The screen size of all terminals were 170x38, same data file and regexp pattern was used. Bobcat is compiled with the provided makefile, sakura is compiled via AUR (using yay). They all have vanilla config. Results are as follows:

Bobcat:

real    0m1,497s
user    0m0,004s
sys 0m0,200s

Sakura:

real    0m2,355s
user    0m0,001s
sys 0m0,158s

Gnome Terminal:

real    0m2,272s
user    0m0,000s
sys 0m0,180s

The above are the average results of 10 rounds. This was consistent throughout the rounds. In none of them Sakura surpassed Bobcat.

Now, I can only speculate about the difference, but on all test machines I've gnome-shell and mutter as compositor, could you be using something different?

And yes, I would be grateful if you could provide basic and non-sensitive env, bash, packag configuration, so that I can try to replicate it this weekend.

dolik-rce commented 7 months ago

Ok, my testing profile is here: https://pastebin.com/nPHw5LS0

I have freshly compiled a5204ba722d47c24f1f956259b8db4f2c817714b, using 3p/umk/umk ./,3p/uppsrc Bobcat 3p/umk/CLANG_CPP17.bm -bv +GUI,SHARED build/Bobcat.

Executing time cat large-file.json in Bobcat consistently takes 8-9s. Executing the same command in sakura takes 2-2.3s. I have also tried with gnome-terminal, which runs just as fast as sakura. Not surprising, since it uses the same library.

I'm not using gnome-shell, that might be a good clue. I use sway (a tiling window manager, inspired by i3). Most relevanta package versions:

bash 5.2.026-2
sakura 3.8.7-1
gnome-terminal 3.50.1-1
wayland 1.22.0-1
sway 1:1.8.1-5
fontconfig 2:2.15.0-2

Full list can be found here: https://pastebin.com/BdvLdhcB

If I'm not mistaken, VTE doesn't use HW acceleration, so there should not be any advantage coming from hardware. I don't have dedicated GPU anyway, only the integrated Intel chipset.

dolik-rce commented 7 months ago

Oh and here is a flamegraph (created using this wonderful tool) of bobcat running cat several times: bobcat

ismail-yilmaz commented 7 months ago

I have installed & run sway and checked the apps there, but nothing's changed :/ (HW accelaration is not really a problem here, as you can see in my previous posts Alacritty, Kitty and WezTerm are HW accelarated and there isn't much difference in this regard. [of course, on some other situations they are way faster.] )

Apparently this needs further investigation, I'll have time this weekend. Thank you very much for the info and the tool, it really looks handy. I'll try it asap.

dolik-rce commented 7 months ago

Actually the graph is even better when displayed correctly. Clicking a function should zoom it to show more details, and you can even search by function name to get an idea how it is called and how much time it takes. Unfortunately, github probably strips the javascript, for security reasons. You can view it in all its glory here: https://gistcdn.githack.com/dolik-rce/e351c087bb2e28f5e54110c65c98d8d9/raw/b5f984ff8ae30b84c55a4c7f3efdfaa67656f011/bobcat.svg

dolik-rce commented 7 months ago

Oooooh man, I am stupid... Finally found the problem. And of course it is between the chair and keyboard :man_facepalming:

I've been compiling it in debug mode the whole time. Once I added to r to umk command it easily beats sakura: real 0m1,334s user 0m0,002s sys 0m0,174s

I'm truly sorry to waste your and your friends time like this...

BTW: It was actually the flamegraph that made me realize it :-) It clicked in my head once I zoomed in and noticed Upp::BlkHeap::DbgFreeFill and similar function which should not be present in release build.

ismail-yilmaz commented 7 months ago

Oooooh man, I am stupid... Finally found the problem. And of course it is between the chair and keyboard 🤦‍♂️

I've been compiling it in debug mode the whole time. Once I added to r to umk command it easily beats sakura: real 0m1,334s user 0m0,002s sys 0m0,174s

I'm truly sorry to waste your and your friends time like this...

BTW: It was actually the flamegraph that made me realize it :-) It clicked in my head once I zoomed in and noticed Upp::BlkHeap::DbgFreeFill and similar function which should not be present in release build.

Lol. No worries, I am relieved (didn't want this to be a bottleneck deep in TerminalCtrl, which was thorougly battle- and stress-tested in the last five years). This mistake actually proved fruitful. I have implemented the CoFind method in TerminalCtrl, which makes the search operations in large buffer ranges or non-trivial buffer scans (e.g. regex) up to 4-5 times faster. I'll add that as an option to finder. Thanks for the feedback, and the flamegraph, it indeed helped us tonight. :)

ismail-yilmaz / Bobcat

Large output makes terminal unusable #19