Closed jabdoa2 closed 5 years ago
Yeah - I can see that this will improve the rendering path, so is probably worth taking. Just want to check one thing first...
My experience of performance tuning the apps I've created with asciimatics is that the biggest gains are to be had by not re-drawing unless needed. I had a similar CPU usage issue on early versions and then spent a fair bit of time plumbing logic into Screen.draw_next_frame()
such that an idle UI generates almost zero CPU usage.
The way it works is that each Effect reports when the next update is required (and the Widgets all report this into the Frame for UIs). The Screen update logic then checks that counter as well as any overrides (if there has been any user input, or the app has forced an update), so an app can typically just present the UI and let asciimatics run. However, if there are important state changes that need to be reflected, just calling Screen.force_update()
can push them through even if the Frame thinks nothing has changed.
Sounds like your UI is hitting the constant refresh issues I hit in the past... Would this slective use of force_update be a better solution?
We already do that. We selectively update only when something changes (max 100Hz). The worst case for MPF is a bouncing switch which would change a label on every frame. Or an internal variable changing it's state very frequently. Unfortunately, those things happen in pinball machines.
Good. Just checking.
BTW, 100Hz is pretty fast for the human eye to register anything. You would probably notice no difference if you halved that refresh rate and could possibly even go to 25 Hz without any meaningful impact on what people can actually process.
The problem with that is that switch hits can be quite short. Like a ball hitting a target is usually 10-20ms only. We change the background color of the label during the time the switch is active. If we only update max 33Hz we might miss the hit. People seems to notice color changes faster. You will not really see it but notice it.
Yeah - people will be able to see that flicker (assuming your monitor is capable of displaying it). I'd still be tempted to drop to 50 Hz, but if you have the CPU to spare for 100Hz, there's no reason not to do that.
Not quite sure about this caching... I get that you want to minimize the use of split_text, but this seems to go a bit further than that and tries to optimize the colour decisions too. This cache is beginning to feel a liitle fragile to me and so I'd like to cut it down if possible. Does it make a meaningful difference to you if we just cache the result of the call to split_text?
That would also work. However, this still needs the integration with the widget because setting the layout needs to invalidate.
Yeah - I agree that it still needs to monitor layout changes, but I can live with that. Another option might be to encapsulate that logic into split_text with something like lru_cache
. How many Labels do you have (roughly speaking)?
I guess that would also work. I probably got like 100-1000 widgets. Only some are visible.
OK - I think I'd like to explore that a bit as a way of reducing the footprint for this caching.
Do you need this change for the v1.11 release?
I guess we would be fine without it. If we need it we can still overload the widget in our code.
Looks like pylru's lrudecorator
could work for all the environments we need. Does this patch work for you too?
diff --git a/asciimatics/widgets.py b/asciimatics/widgets.py
index 53405dd..ca94a32 100644
--- a/asciimatics/widgets.py
+++ b/asciimatics/widgets.py
@@ -29,6 +29,7 @@ from asciimatics.exceptions import Highlander, InvalidFields
from asciimatics.screen import Screen, Canvas
from asciimatics.utilities import readable_timestamp, readable_mem, _DotDict
from wcwidth import wcswidth, wcwidth
+from pylru import lrudecorator
# Logging
from logging import getLogger
@@ -211,6 +212,7 @@ def _get_offset(text, visible_width, unicode_aware=True):
return result
+@lrudecorator(100)
def _split_text(text, width, height, unicode_aware=True):
"""
Split text to required dimensions.
I will give it a try later
With @lrudecorator(100)
it actually gets worse in my benchmark (42% instead of 39%; 8-9% when idle). With @lrudecorator(1024)
or @lrudecorator(2048)
CPU load considerably drops (20% and 1% when idle).
I also tried functools.lru_cache
and it looks like the performance is even slightly better (18% vs 20%). It would also not require additional dependencies in Python 3 and there is a backport for the old versions (https://pypi.org/project/backports.functools_lru_cache/).
OK, so that says just caching the text calculation is a good gain. I'm a bit nervous about the occupancy of 1000 entries, though. Let me think some more about this.
Do you have a lot of widgets that aren't visible (e.g. because you need to scroll to see them)? If so, we could optimize the widget update logic instead...
Yes I got a lot of invisible widgets. I don't see where a cache with 1024 entries would hurt. It will be free when empty and reduce GC pressure in almost any other case.
OK - I've just merged a patch to stop drawing invisible widgets (https://github.com/peterbrittain/asciimatics/commit/bb8eeed71dae1259689de4e66deb70ee81adb4ad). What is the performance difference for you without the lru_cache?
With that fix the load drops to ~30% (instead of 39%). With lru_cache I get it down to 18% (instead of 20%). Seems like I need at least lru_cache(256). 128 is definitely not enough (30% load).
OK - I think we're about there, then. I'll happily add the LRU cache for 256 entries. Feel free to update the PR to use that instead. I might get a chance in the next week or so, otherwise....
I think that change does what we agreed. Please shout if there's anything else you needed...
Awesome. Thanks!
Issues fixed by this PR
We had ~40% worst case CPU load without the fix in MPF. With this fix worst case CPU reduces to ~30%.
What does this implement/fix?
Cache the result of split_text for labels. This hardly changes and we invalidate the cache if it would.
Any other comments?
I guess it would make sense in some other widgets too. Text for example.
PR is on top of #213 to prevent merge conflicts.