Context
While the runtime of a typical application using pyte is dominated by stream.feed at the standard geometry (24x80), screen.display becomes the dominant cost at larger geometries (240x800, 2400x80, 24x8000).
This is because screen.display does not exploit the fact that screen.buffer is sparse: it iterates over the whole range of possible coordinates (x, y) on the screen and wastes time looking up entries that do not exist in screen.buffer.
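To make the cost difference concrete, here is a minimal sketch of the two iteration strategies. It is not pyte's actual code: the buffer is modelled as a plain dict of dicts (row -> column -> character), and the names display_dense and display_sparse are hypothetical.

```python
from typing import Dict, List

# Simplified model of a sparse buffer: row -> column -> character.
# This is an assumption for illustration, not pyte's real data structure.
Buffer = Dict[int, Dict[int, str]]


def display_dense(buffer: Buffer, lines: int, columns: int) -> List[str]:
    """Visit every (x, y) coordinate, even those with no stored char."""
    rendered = []
    for y in range(lines):
        row = buffer.get(y, {})
        # One Python-level lookup per column, whether or not a char exists.
        rendered.append("".join(row.get(x, " ") for x in range(columns)))
    return rendered


def display_sparse(buffer: Buffer, lines: int, columns: int) -> List[str]:
    """Start from blank lines and visit only the entries that exist."""
    rendered = []
    for y in range(lines):
        row = buffer.get(y)
        if not row:
            # Empty line: no per-column work in Python at all.
            rendered.append(" " * columns)
            continue
        cells = [" "] * columns
        for x, char in row.items():
            if 0 <= x < columns:
                cells[x] = char
        rendered.append("".join(cells))
    return rendered


buffer = {0: {0: "h", 1: "i"}, 2: {5: "!"}}
assert display_dense(buffer, 3, 8) == display_sparse(buffer, 3, 8)
```

Both functions return the same lines. The sparse variant does Python-level work proportional to the number of stored chars plus the number of lines, which is why the gap grows with the geometry.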
Proposal
This PR makes screen.display faster through four changes:
make screen.display aware that screen.buffer is sparse, so it iterates over the chars that actually exist instead of over the full range of coordinates (bfeab39c2)
inline the generator into a for-loop: generators written in Python (not in C) are slower than a plain for-loop, so this change is an easy win (5b32e257b)
remove an assert that ran for every single char: the corresponding check was moved to the tests so we don't lose coverage (13ee784ac)
cache wcwidth on each char: although wcwidth is already cached (thanks to functools), calling it still costs a function call. We can avoid that by storing the result of wcwidth on the char during screen.draw and reusing it later in screen.display (c298bd358)
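The last change can be sketched roughly as follows. This is an illustration under assumptions, not the code in the commits: the Char tuple, its width field, and the draw and display_row helpers are hypothetical, and char_width is a stdlib stand-in for wcwidth so the snippet has no third-party dependency.

```python
import unicodedata
from typing import Dict, NamedTuple


class Char(NamedTuple):
    data: str
    width: int  # hypothetical field: display width cached at draw time


def char_width(ch: str) -> int:
    """Stand-in for wcwidth: 2 for East Asian wide/fullwidth chars, else 1."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def draw(row: Dict[int, Char], x: int, text: str) -> int:
    """Store each char with its width computed once; return the new column."""
    for ch in text:
        width = char_width(ch)
        row[x] = Char(ch, width)
        x += width
    return x


def display_row(row: Dict[int, Char], columns: int) -> str:
    """Render one line from the cached widths, without any width lookups."""
    out = []
    x = 0
    for col in sorted(row):
        if col >= columns:
            break
        char = row[col]
        out.append(" " * (col - x))  # pad over cells that hold no char
        out.append(char.data)
        x = col + char.width  # a wide char also covers the next column
    out.append(" " * max(0, columns - x))
    return "".join(out)


row: Dict[int, Char] = {}
end = draw(row, 0, "a\u6f22b")  # "\u6f22" is a wide CJK character
line = display_row(row, 6)
```

The width is paid for once per drawn char instead of once per char on every display call, which matters because display is typically called far more often than any given cell is redrawn.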
Results
For the standard geometry of 24x80 we got the following improvement on screen.display:
For larger geometries we made screen.display 10x, 100x and almost 1000x faster.
For stream.feed we got a minimal improvement and a minimal regression (*)
(*) I don't think the stream.feed results are meaningful; the discrepancies look more like noise. A separate analysis of pyperf (the tool we use for the benchmarks) suggests that it uses the average of the samples instead of the minimum, which makes the results slightly unstable.
Full results are in benchmark_results/: one file has the performance for 0.8.1 while the other includes the optimizations. These benchmarks were executed with the auxiliary script fullbenchmark.
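The footnote's point about averaging versus taking the minimum can be illustrated with a small simulation. It does not use pyperf and says nothing about its internals; it only assumes that measurement noise is one-sided (interference can make a run slower, never faster). The function simulate_samples and the numbers are made up for the illustration.

```python
import random
import statistics
from typing import List


def simulate_samples(true_cost: float, n: int, seed: int) -> List[float]:
    """Return n fake timings: the true cost plus non-negative random noise."""
    rng = random.Random(seed)
    # expovariate(5.0) yields non-negative noise with a mean of 0.2.
    return [true_cost + rng.expovariate(5.0) for _ in range(n)]


TRUE_COST = 1.0
runs = [simulate_samples(TRUE_COST, 20, seed) for seed in range(5)]
mins = [min(run) for run in runs]
means = [statistics.mean(run) for run in runs]

for lo, avg in zip(mins, means):
    print(f"min={lo:.3f}  mean={avg:.3f}")
```

Under this assumption the minimum of each run is never further from the true cost than the mean, so small differences between two averaged results (as with stream.feed here) are hard to tell apart from noise.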