eldipa opened 2 years ago
@superbobry, the PR is quite long, but I'm happy to review it with you piece by piece. Despite its length, I think the changes are a solid improvement (ok, I'm a little biased here).
A possible first round for the review process could be:

- `Screen.display` optimizations
- `Screen.draw` optimizations

This walkthrough covers ~30% of the PR. From there we can plan the next rounds of review.

Thanks for your time!
What is this PR about?
Optimization. My goal was to make `pyte` faster and lighter, especially for large geometries (think of a screen of 240x800 or 2400x8000).

Results (overview of the results)
For large geometries (240x800, 2400x8000), `Screen.display` runs several orders of magnitude faster and consumes between 1.10 and 50.0 times less memory. For smaller geometries the minimum improvement was 2 times faster.

`Stream.feed` is now between 1.10 and 7.30 times faster, and if `Screen` is tuned (that is, with `track_dirty_lines` set to `False` and `disable_display_graphic` set to `True`), the speedup is between 1.14 and 12.0. However, there is a regression for the `mc.input` test of up to 4 times slower. For memory usage, `Stream.feed` is between 1.10 and 17.0 times lighter, and up to 44.0 times lighter if `Screen` is tuned.

`Screen.reset` is between 1.10 and 1.50 times slower, but several cases (not all) improve if the `Screen` is tuned.

Context (background)
`byexample` executes snippets of code using real interpreters (Python, Ruby, Java) and captures the output so that `byexample` can check whether it is the expected one. While most of the interpreters are "terminal-naive", some are "terminal-aware": they output escape and control sequences which obviously are not of interest. `byexample` uses `pyte` to handle those cases (thanks for such a great lib!!). Unfortunately, using a screen introduces artifacts due to the hard boundaries of the screen (24x80). Think of very long lines that are "unexpectedly" cut into two lines (a cut that happens because the screen has a finite width).

A simple and elegant solution could be to create screens with larger geometries, where these artifacts are much rarer.

Sadly, while `pyte` implements a sparse buffer, most of its algorithms are not aware of it and don't take advantage of it, making the terminal emulation really slow and memory hungry. This is the motivation for this PR: make it perform better!!
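To make the sparsity problem concrete, here is a minimal sketch assuming a buffer shaped like `pyte`'s: a mapping from row index to a mapping from column index to character (the variable names are illustrative only). With a `defaultdict`, merely reading a missing row inserts it, so a loop that visits every row of a 2400-line screen silently makes the buffer dense. A plain `dict` read through `.get` stays sparse, and a loop over the stored rows only does work proportional to what is actually on screen.

```python
from collections import defaultdict

ROWS = 2400  # one of the large geometries discussed in this PR

# Same content in two containers: only rows 0 and 10 hold characters.
dd_buffer = defaultdict(dict)  # row -> {column: char}
dd_buffer[0][0] = "a"
dd_buffer[10][5] = "b"

plain_buffer = {0: {0: "a"}, 10: {5: "b"}}

# Sparsity-unaware loops: they visit every row of the screen.
dd_total = sum(len(dd_buffer[y]) for y in range(ROWS))                # reading inserts empty rows
plain_total = sum(len(plain_buffer.get(y, {})) for y in range(ROWS))  # reading has no side effect

# Sparsity-aware loop: it only visits the rows that exist.
aware_total = sum(len(row) for row in plain_buffer.values())

print(dd_total, len(dd_buffer))        # 2 2400
print(plain_total, len(plain_buffer))  # 2 2
print(aware_total)                     # 2
```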
Note for the reviewers
This PR is not trivial. The commits are simple to understand (as much as I could make them), but it is still quite a large PR.
So I will be available to discuss and explain each commit and guide the review process.
Contributions
- Benchmark `Screen.display`, `Screen.resize` and `Screen.reset` under different geometries (24x80, 240x800, 2400x8000, 24x8000, 2400x80). With these the benchmark takes much more time (sorry!) but it gives a deeper view of how `pyte` works.
- The benchmark was using `Stream` instead of `ByteStream`. The use of the former led to an incorrect interpretation of the new lines; using `ByteStream` fixed that and is aligned with the `test_input_output.py` tests.
- Optimized `Screen.display` to work (approximately) linearly with the input and not with the size of the screen (quadratic). This improved both runtime and memory by a lot (especially for large geometries).
- Added `Screen.compressed_display`, which works like `Screen.display` but can "strip" empty space on the left or right and "filter" empty lines at the top and bottom of the screen, reducing time and memory.
- Optimized `Screen.draw` by caching attributes and methods (the same optimizations already present in `Stream._parser_fsm`).
- Moved `Char`'s foreground, bold, blink (...) into a separate `namedtuple`, `CharStyle`. When possible, the same style is reused for multiple characters, reducing memory usage at the expense of an additional lookup (instead of `char.fg` you have `char.style.fg`).
- Made `Char` a mutable object so that changes to the `data` and `width` fields happen in place instead of creating a new `Char` object.
- Optimized `Screen.index` and `Screen.reverse_index`, which indirectly improved `Screen.draw` and `Stream.feed`.
- Optimized `Screen.resize`, `Screen.tabstop`, `ScreenHistory.prev_page` and `ScreenHistory.next_page`.
- Optimized `Screen.insert_characters`, `Screen.delete_characters`, `Screen.insert_lines` and `Screen.delete_lines`, which improved the performance of "terminal-aware" programs.
- Added statistics about `Screen`'s buffer and lines to give insight into the sparsity and usage of these elements (the API is non-standard, like `DebugScreen`).
- Made `Screen.buffer` return a `BufferView`. Retrieving lines from it yields `LineView` instead of `Line` objects. This adds an overhead to user code but separates the public part from the internals. Iterating over a `LineView` still yields `Char` objects as usual (the penalty would be too high otherwise).
- `Screen.history`'s `top` and `bottom` queues return `LineView` and not `Line` objects.
- Made `Screen._buffer` a `dict` and not a `defaultdict`. This prevents adding entries unintentionally, which would make the buffer less sparse and therefore slower.
- Optimized the erase methods (`Screen.erase_characters`, `Screen.erase_in_line` and `Screen.erase_in_display`) so that, when possible, instead of writing blank characters they just remove the chars from the buffer. This speeds up other algorithms and keeps the sparsity high (and consumes less memory).
- If `disable_display_graphic` is `True`, `Screen.select_graphic_rendition` is prevented from changing the cursor attributes (style). If the cursor attrs don't change, we can optimize the erase methods. The flag is `False` by default.
- If `track_dirty_lines` is `False`, a `NullSet` is used for the `Screen.dirty` attribute; it consumes no memory and discards any element, effectively disabling the dirty-tracking functionality. This saves time and memory for large geometries. The flag is `True` by default.
- Made `Screen.margin` always a `Margin` object so we can avoid checking whether it is `None`.

Compatibility changes
The following are changes in the API that may break user code. Special care was taken to avoid this situation.
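The first three changes below concern `Char`. As a minimal sketch (not the PR's actual classes; the `CharStyle` fields shown and the use of `__slots__` are assumptions), this is how a mutable `Char` can share one immutable `CharStyle` among many characters while still exposing `fg`, `bg` and `bold` as read-only properties, so code written against the old `namedtuple` keeps reading them as before.

```python
from typing import NamedTuple


class CharStyle(NamedTuple):
    # Assumed subset of the style fields; the real class may have more.
    fg: str = "default"
    bg: str = "default"
    bold: bool = False


class Char:
    """Mutable character whose style lives in a shared, immutable CharStyle."""

    __slots__ = ("data", "width", "style")

    def __init__(self, data, width=1, style=CharStyle()):
        self.data = data
        self.width = width
        self.style = style  # many Char objects may point to the same instance

    # Read-only properties keep the old `char.fg`-style access working.
    @property
    def fg(self):
        return self.style.fg

    @property
    def bg(self):
        return self.style.bg

    @property
    def bold(self):
        return self.style.bold


red_bold = CharStyle(fg="red", bold=True)
line = [Char(c, style=red_bold) for c in "hello"]

line[0].data = "H"  # updated in place: no new Char object is created
print("".join(ch.data for ch in line), line[0].fg, line[0].bold)  # Hello red True
print(all(ch.style is red_bold for ch in line))                    # True
```

The trade-off matches the one described for the PR: one extra attribute hop per style lookup in exchange for storing each distinct style only once.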
- `Char` is no longer a `namedtuple`, so things like `_replace` are gone. If necessary we could reimplement the `namedtuple` API, but I don't think users rely on it.
- `Char` is mutable, but the user must not rely on this: changing a character has undefined behaviour. The user must always use the API provided by `Screen`.
- `Char` no longer has attributes for `fg`, `bg`, `bold`. Instead, it has a single read-only `CharStyle`. The `Char` class implements `fg`, `bg`, `bold` as properties that do the lookup on the style behind the scenes, so user code should not break.
- `Screen.buffer` is now a property that returns a `BufferView` with an API similar to a dictionary. It yields `LineView` objects instead of `Line` objects. These in turn yield `Char` objects (not views). Users can still iterate over the lines and chars as if the buffer were a dense array and not the sparse array it really is. Like any view, these are valid only until the next modification of the screen. This change may break user code if it uses `buffer` in another way.
- `top` and `bottom` of `ScreenHistory.history` contain `LineView` and not `Line` objects. This may break user code.
- `Screen.margin` is always a `Margin` object: the `None` value is no longer supported.

TL;DR - Numbers overview
The following is an overview of the numbers obtained. To keep this post as short as possible, some results were omitted (omitted rows are marked with `:::`).

The full benchmark results are attached to this commit. People are encouraged to run their own benchmarks for cross-validation.
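For anyone cross-validating, runtime and peak memory of a single call can be measured with the standard library alone: `time.perf_counter` for time and `tracemalloc` (which the PR mentions for the `Screen.display` memory numbers) for allocations. The `measure` helper below is hypothetical, not part of `pyte`'s benchmark. The demo compares appending spaces one by one against building them as a single chunk, which is the kind of change described for `Screen.display`.

```python
import time
import tracemalloc


def measure(fn, *args):
    """Return (result, seconds, peak_bytes) for one call of fn(*args)."""
    tracemalloc.start()
    try:
        start = time.perf_counter()
        result = fn(*args)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


def spaces_one_by_one(n):
    parts = []
    for _ in range(n):
        parts.append(" ")  # one list slot per space character
    return "".join(parts)


def spaces_in_one_chunk(n):
    return " " * n  # a single string allocation


n = 2400 * 8000 // 100  # a slice of a 2400x8000 screen, kept small for the demo
for fn in (spaces_one_by_one, spaces_in_one_chunk):
    text, secs, peak = measure(fn, n)
    assert text == " " * n
    print(f"{fn.__name__:>20}: {secs * 1e3:8.2f} ms, peak {peak / 1024:8.1f} KiB")
```

Note that `tracemalloc` slows the traced code down, so time and memory are best measured in separate runs when precise timings matter.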
Screen.display
`Screen.display` was optimized to generate large chunks of spaces very quickly. For large geometries, this has a huge impact on performance:

`Screen.display` takes advantage of the sparsity of the screen, so it indirectly benefited from the optimizations made across `Screen` to avoid filling it with false entries. `Screen.display` was also optimized for memory (measured with `tracemalloc`), appending runs of spaces as a single chunk instead of one space character at a time.

The only two regressions are:

Not sure why this happens.
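The idea behind the `Screen.display` optimization can be sketched as follows, assuming a sparse buffer of the form `{row: {column: str}}` where missing entries mean blank cells. This is a simplification: real `pyte` cells are `Char` objects with a width, and wide characters are ignored here. `render_row`, `display` and `compressed_display` are illustrative functions, not the PR's code. Each row costs one step per stored character plus one padding chunk per gap, instead of one lookup per cell; the compressed variant only mimics the idea of stripping trailing space and skipping empty lines at the top and bottom.

```python
def render_row(row, columns):
    """Render one sparse row {col: str}, padding each gap with one chunk of spaces."""
    parts = []
    x = 0
    for col in sorted(row):  # visit only the columns that hold a character
        if col >= columns:
            break
        if col > x:
            parts.append(" " * (col - x))  # one chunk for the whole gap
        parts.append(row[col])
        x = col + 1
    parts.append(" " * (columns - x))  # trailing padding as one chunk
    return "".join(parts)


def display(buffer, lines, columns):
    """Render a sparse {row: {col: str}} buffer as `lines` fixed-width strings."""
    blank = " " * columns  # built once and shared by every empty row
    return [render_row(buffer[y], columns) if buffer.get(y) else blank
            for y in range(lines)]


def compressed_display(buffer, lines, columns):
    """Right-strip each line and skip rows above/below the rows with stored cells."""
    used = [y for y, row in buffer.items() if row and 0 <= y < lines]
    if not used:
        return []
    return [render_row(buffer.get(y, {}), columns).rstrip()
            for y in range(min(used), max(used) + 1)]


screen = {1: {0: "h", 1: "i"}, 3: {4: "!"}}  # 5x8 screen, only 3 cells stored
print(display(screen, 5, 8))
# ['        ', 'hi      ', '        ', '    !   ', '        ']
print(compressed_display(screen, 5, 8))
# ['hi', '', '    !']
```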
Stream.feed
`Stream.feed` was not modified, but its runtime depends on `Screen`'s performance. For terminal programs that just write into the terminal, like `cat-gpl3` and `find-etc`, `Stream.feed` merely sends the input to `Screen.draw` for rendering.

The method `Screen.draw` was optimized to avoid modifying the cursor internally and to update it only on exit. This saved a few lookups.

While not frequently called, `Screen.index` was the next bottleneck for `Screen.draw`: when the cursor is at the bottom margin, it moves all the lines of the screen, which means that all the entries of the buffer are rewritten. `Screen.index` and `Screen.reverse_index` were optimized to take advantage of the sparsity and to avoid adding false entries. This resulted in a speedup across the tests:
The `mc.input` test, however, took much more time. When `track_dirty_lines` is `False` and `disable_display_graphic` is `True`, the overall performance increases even further.

On memory there is an improvement too:

The following are the tests that show a regression in memory usage.

When `track_dirty_lines` is `False` and `disable_display_graphic` is `True`, this is even better:

However, we still have some regressions:
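Two of the `Stream.feed`-related ideas above can be sketched on a toy model. All names here are hypothetical and this is not `pyte`'s `Screen`: margins, line wrapping and character widths are left out. `index` scrolls a sparse `{row: {col: str}}` buffer up by one line by re-keying only the rows that exist, so blank rows cost nothing and no false entries are added. `draw` binds the attributes it uses to locals once and writes the cursor back only on exit.

```python
class ToyScreen:
    """Hypothetical minimal screen: a sparse buffer plus a cursor (no margins, no wrapping)."""

    def __init__(self, lines, columns):
        self.lines = lines
        self.columns = columns
        self.buffer = {}  # row -> {col: str}; a missing row or column means blank
        self.cursor_x = 0
        self.cursor_y = 0

    def carriage_return(self):
        self.cursor_x = 0

    def index(self):
        """Move the cursor down one line, scrolling up when it is on the last row."""
        if self.cursor_y < self.lines - 1:
            self.cursor_y += 1
            return
        # Scroll: drop row 0 and shift every *stored* row up by one.
        # The cost depends on the stored rows, not on `self.lines`,
        # and no empty rows are created along the way.
        self.buffer = {y - 1: row for y, row in self.buffer.items() if y > 0}

    def draw(self, text):
        """Write text at the cursor; characters past the right edge are dropped."""
        # Bind attributes to locals once instead of looking them up per character.
        buffer = self.buffer
        columns = self.columns
        x, y = self.cursor_x, self.cursor_y
        row = buffer.get(y)
        for ch in text:
            if x >= columns:
                break
            if row is None:  # create the row only when something is written to it
                row = buffer[y] = {}
            row[x] = ch
            x += 1
        # Write the cursor back once, on exit.
        self.cursor_x = x


s = ToyScreen(lines=3, columns=10)
for word in ("one", "two", "three"):
    s.carriage_return()
    s.draw(word)
    s.index()  # on the last row this scrolls instead of moving down
print({y: "".join(row.values()) for y, row in s.buffer.items()})  # {0: 'two', 1: 'three'}
```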
Screen.reset
For `Screen.reset` we have regressions, some minor, some not so minor:

However, when `track_dirty_lines` is `False` and `disable_display_graphic` is `True`, things improve (but we still have regressions):
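For reference, a `NullSet` in the spirit of the one used for `Screen.dirty` when `track_dirty_lines` is `False` can be sketched as a set that accepts the usual mutating calls and discards everything, so marking lines as dirty costs almost nothing and the set never grows. This is an illustrative sketch built on `collections.abc.MutableSet`; the PR's class may implement a different subset of the set API.

```python
from collections.abc import MutableSet


class NullSet(MutableSet):
    """A set that is always empty: every element added to it is discarded."""

    def __contains__(self, item):
        return False

    def __iter__(self):
        return iter(())

    def __len__(self):
        return 0

    def add(self, item):
        pass  # drop the element: nothing is stored, so no memory is used

    def discard(self, item):
        pass

    def update(self, *iterables):
        pass  # `update` is not part of the MutableSet mixins, so define it explicitly


dirty = NullSet()
dirty.add(5)
dirty.update(range(2400))
dirty |= {1, 2, 3}  # MutableSet.__ior__ calls add() for each element
print(len(dirty), 5 in dirty, set(dirty))  # 0 False set()
```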