microsoft / terminal

The new Windows Terminal and the original Windows console host, all in the same place!
MIT License

An attempt to improve performance of terminal/parser #17336

Status: Open. flyingcat opened this issue 1 month ago.

flyingcat commented 1 month ago

Description of the new feature/enhancement

Currently, StateMachine uses many if-else branches, which may not be an efficient way to implement a state machine. It could be improved by using a table-driven approach.
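To illustrate what a table-driven parser means, here is a minimal Python sketch of the general technique. It is not the code on the vtparser branch. The three states (Ground, Escape, CSI), the byte classes and the action names are hypothetical and loosely modelled on the DEC-compatible parser state diagram published at vt100.net. The sketch also skips OSC/DCS strings, intermediates and UTF-8. What it does show is that each byte costs two array lookups (byte to class, then (state, class) to action and next state) instead of a chain of comparisons.

```python
from enum import IntEnum


class State(IntEnum):
    GROUND = 0
    ESCAPE = 1
    CSI_PARAM = 2


class Action(IntEnum):
    NONE = 0
    PRINT = 1
    EXECUTE = 2
    CLEAR = 3
    COLLECT = 4
    ESC_DISPATCH = 5
    CSI_DISPATCH = 6


class Cls(IntEnum):
    C0 = 0            # 0x00-0x1F except ESC
    ESC = 1           # 0x1B
    INTERMEDIATE = 2  # 0x20-0x2F: printed in Ground, ignored elsewhere in this sketch
    PARAM = 3         # 0x30-0x3F: digits, ';', '?', ...
    LBRACKET = 4      # 0x5B '['
    FINAL = 5         # 0x40-0x7E except '['
    IGNORE = 6        # 0x7F-0xFF (no UTF-8 handling in this sketch)


def classify(b: int) -> Cls:
    if b == 0x1B:
        return Cls.ESC
    if b < 0x20:
        return Cls.C0
    if b < 0x30:
        return Cls.INTERMEDIATE
    if b < 0x40:
        return Cls.PARAM
    if b == 0x5B:
        return Cls.LBRACKET
    if b < 0x7F:
        return Cls.FINAL
    return Cls.IGNORE


# The branches above run once, when the table is built; the hot loop only indexes.
BYTE_CLASS = [classify(b) for b in range(256)]

A, S = Action, State
# TRANSITIONS[state][byte class] -> (action, next state)
# Columns in each row: C0, ESC, INTERMEDIATE, PARAM, LBRACKET, FINAL, IGNORE
TRANSITIONS = [
    [  # GROUND
        (A.EXECUTE, S.GROUND), (A.CLEAR, S.ESCAPE), (A.PRINT, S.GROUND),
        (A.PRINT, S.GROUND), (A.PRINT, S.GROUND), (A.PRINT, S.GROUND),
        (A.NONE, S.GROUND),
    ],
    [  # ESCAPE
        (A.EXECUTE, S.ESCAPE), (A.CLEAR, S.ESCAPE), (A.NONE, S.ESCAPE),
        (A.ESC_DISPATCH, S.GROUND), (A.CLEAR, S.CSI_PARAM), (A.ESC_DISPATCH, S.GROUND),
        (A.NONE, S.ESCAPE),
    ],
    [  # CSI_PARAM
        (A.EXECUTE, S.CSI_PARAM), (A.CLEAR, S.ESCAPE), (A.NONE, S.CSI_PARAM),
        (A.COLLECT, S.CSI_PARAM), (A.CSI_DISPATCH, S.GROUND), (A.CSI_DISPATCH, S.GROUND),
        (A.NONE, S.CSI_PARAM),
    ],
]


def parse(data: bytes) -> list:
    """Turn a byte stream into simple (kind, ...) event tuples."""
    state = S.GROUND
    params = bytearray()
    events = []
    for b in data:
        # One table lookup replaces the per-state if/else chain.
        action, state = TRANSITIONS[state][BYTE_CLASS[b]]
        if action == A.PRINT:
            events.append(("print", chr(b)))
        elif action == A.EXECUTE:
            events.append(("execute", b))
        elif action == A.CLEAR:
            params.clear()
        elif action == A.COLLECT:
            params.append(b)
        elif action == A.ESC_DISPATCH:
            events.append(("esc", chr(b)))
        elif action == A.CSI_DISPATCH:
            events.append(("csi", params.decode("ascii"), chr(b)))
    return events
```

For example, parse(b"hi\x1b[1;31mX") yields two print events, one ("csi", "1;31", "m") event and a final print of "X". A C++ version would typically keep both tables as constexpr arrays and switch on the action. Either way, the per-byte work stays at two indexed loads.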

Proposed technical implementation details (optional)

A quick-and-dirty implementation is available at https://github.com/flyingcat/terminal/tree/vtparser

Some surprising results:

Benchmarks (_P = plain text, _V = VT output captured from nvim):

Test (size)          Parser  x64                     ARM64                   ARM64, no heavy tracing
-------------------- ------  ----------------------  ----------------------  -----------------------
VT_EN_P (2.56 MB)    0       4907.81 us              5061.98 us              5052.18 us
                     1       4869.68 us (+0.8%)      5110.27 us (-0.9%)      4376.73 us (+15.4%)
                     2       4450.93 us (+10.3%)     4330.88 us (+16.9%)     3803.58 us (+32.8%)
VT_EN_V (11.36 MB)   0       150354.27 us            167648.47 us            165199.61 us
                     1       104640.22 us (+43.7%)   122205.31 us (+37.2%)   88889.78 us (+85.8%)
                     2       98848.33 us (+52.1%)    106223.27 us (+57.8%)   83710.12 us (+97.3%)
VT_CN_P (2.12 MB)    0       2746.15 us              3009.11 us              2982.51 us
                     1       2735.91 us (+0.4%)      3085.54 us (-2.5%)      2829.58 us (+5.4%)
                     2       2613.93 us (+5.1%)      2759.70 us (+9.0%)      2793.15 us (+6.8%)
VT_CN_V (11.29 MB)   0       214276.04 us            238064.93 us            236772.92 us
                     1       146150.74 us (+46.6%)   175675.84 us (+35.5%)   127224.79 us (+86.1%)
                     2       140112.63 us (+52.9%)   154611.41 us (+54.0%)   121005.83 us (+95.7%)

Every test reported "Passed." on all three configurations.
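The benchmark output does not state how the percentages are derived. Recomputing a few rows suggests they are speedups relative to parser 0, i.e. t0 / tn - 1, rather than reductions in run time. A time reduction would give much smaller numbers, for example about 49% instead of 97.3% for the last row below. A minimal Python sketch of that check, assuming this reading, with values copied from the benchmark output:

```python
def speedup_pct(baseline_us: float, candidate_us: float) -> float:
    """Speedup of a candidate parser over the baseline, in percent: t0 / tn - 1."""
    return (baseline_us / candidate_us - 1.0) * 100.0


# (label, parser 0 time in us, parser N time in us, percentage as reported)
rows = [
    ("x64 VT_EN_V parser 1", 150354.27, 104640.22, 43.7),
    ("ARM64 VT_EN_P parser 1", 5061.98, 5110.27, -0.9),
    ("ARM64 no heavy tracing VT_EN_V parser 2", 165199.61, 83710.12, 97.3),
]

for label, t0, tn, reported in rows:
    computed = speedup_pct(t0, tn)
    print(f"{label}: computed {computed:+.1f}%, reported {reported:+.1f}%")
```

Under this reading, +97.3% means parser 2 finished the run in roughly half the time of parser 0 (83.7 ms vs 165.2 ms). It does not mean the run time dropped by 97.3%.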
Running .\bc.exe -v -vc .\bc_data.txt:

          x64                              ARM64
          -----------------------------    -----------------------------
before    129.715MB, 9.118s, 14.227MB/s    129.713MB, 9.246s, 14.029MB/s
after     129.713MB, 8.384s, 15.471MB/s    129.711MB, 8.327s, 15.578MB/s

Sorry for the bad English and messy code. Hope the idea is clear enough.

lhecker commented 1 month ago

This is some of the most impressive stuff I've seen in a while, lol. You're using a vim script to turn the TypeScript compiler into test VT output. The entire branch has so many cool ideas and helpful bits! It'll take me a while to go through it and test it out. 🙂

> Hope the idea is clear enough.

Very clear, actually! Using lookup tables for the parser is something I've wanted to do for a very long time. I'm extremely happy that you did it.

I feel like tables would still be superior even if they weren't faster, because they're easier to verify for correctness and easier to debug. It may take a little while before I get a chance to look at your code in more detail, as I'm currently working on some difficult changes to ConPTY which will also improve performance a lot (it'll make Windows Terminal >10x faster!).
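One concrete way tables are easier to verify is that their properties can be checked mechanically. Below is a minimal sketch, assuming a hypothetical transition table keyed by (state, byte class). The state and class names are made up for illustration and are not Windows Terminal's. The checker reports missing entries, transitions to unknown states, an ESC that does not lead to the escape state, and states that cannot be reached from ground.

```python
from itertools import product

# Hypothetical names for illustration only; not taken from Windows Terminal.
STATES = ("ground", "escape", "csi")
CLASSES = ("c0", "esc", "lbracket", "param", "final", "other")

# (state, byte class) -> (action, next state)
TABLE = {
    ("ground", "c0"): ("execute", "ground"),
    ("ground", "esc"): ("clear", "escape"),
    ("ground", "lbracket"): ("print", "ground"),
    ("ground", "param"): ("print", "ground"),
    ("ground", "final"): ("print", "ground"),
    ("ground", "other"): ("print", "ground"),
    ("escape", "c0"): ("execute", "escape"),
    ("escape", "esc"): ("clear", "escape"),
    ("escape", "lbracket"): ("clear", "csi"),
    ("escape", "param"): ("esc_dispatch", "ground"),
    ("escape", "final"): ("esc_dispatch", "ground"),
    ("escape", "other"): ("ignore", "escape"),
    ("csi", "c0"): ("execute", "csi"),
    ("csi", "esc"): ("clear", "escape"),
    ("csi", "lbracket"): ("csi_dispatch", "ground"),
    ("csi", "param"): ("collect", "csi"),
    ("csi", "final"): ("csi_dispatch", "ground"),
    ("csi", "other"): ("ignore", "csi"),
}


def check_table(table, states, classes):
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    # 1. Totality: every (state, class) pair has an entry.
    for key in product(states, classes):
        if key not in table:
            problems.append(f"missing entry for {key}")
    # 2. Every transition targets a known state.
    for key, (_action, nxt) in table.items():
        if nxt not in states:
            problems.append(f"{key} -> unknown state {nxt!r}")
    # 3. Invariant: ESC always (re)starts an escape sequence.
    for state in states:
        entry = table.get((state, "esc"))
        if entry is not None and entry[1] != "escape":
            problems.append(f"ESC in {state!r} does not enter 'escape'")
    # 4. Every state is reachable from "ground".
    seen, todo = {"ground"}, ["ground"]
    while todo:
        current = todo.pop()
        for cls in classes:
            entry = table.get((current, cls))
            if entry is not None and entry[1] in states and entry[1] not in seen:
                seen.add(entry[1])
                todo.append(entry[1])
    problems.extend(f"state {s!r} is unreachable" for s in states if s not in seen)
    return problems


print(check_table(TABLE, STATES, CLASSES))  # []
```

Checks like these are hard to express against a hand-written if-else chain, but with a table they reduce to plain loops over data.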

That aside, these performance numbers look wrong:

> 129.715MB, 9.118s, 14.227MB/s

You should get at least 100MB/s with our current parser (without lookup tables). If those numbers are from OpenConsole, please ensure that you built it in Release mode and that you don't have Node.js running in the background. Node.js subscribes to console events, which slows down all console applications by 10x (even those that have nothing to do with Node.js!).
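The arithmetic behind "these numbers look wrong" is short. Throughput is size divided by time, so at the 100 MB/s lhecker gives as a lower bound, the 129.715 MB run quoted above should finish in about 1.3 s rather than 9.118 s. That is a gap of roughly 7x. A minimal Python sketch of the calculation (the 100 MB/s figure is the expectation stated in the comment, not a measurement):

```python
def throughput_mb_s(size_mb: float, seconds: float) -> float:
    """Throughput in MB/s for a run of size_mb megabytes taking `seconds`."""
    return size_mb / seconds


def seconds_at(size_mb: float, rate_mb_s: float) -> float:
    """Time a run of size_mb megabytes would take at rate_mb_s."""
    return size_mb / rate_mb_s


size_mb, observed_s = 129.715, 9.118  # x64 "before" run quoted in the comment
expected_rate = 100.0                 # MB/s, lower bound stated in the comment

observed_rate = throughput_mb_s(size_mb, observed_s)
expected_s = seconds_at(size_mb, expected_rate)
print(f"observed {observed_rate:.2f} MB/s; at {expected_rate:.0f} MB/s the run "
      f"would take {expected_s:.2f} s, {observed_s / expected_s:.1f}x faster")
```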