peterbrittain / asciimatics

A cross platform package to do curses-like operations, plus higher level APIs and widgets to create text UIs and ASCII art animations
Apache License 2.0
3.64k stars 238 forks source link

Add caching to labels #214

Closed jabdoa2 closed 5 years ago

jabdoa2 commented 5 years ago

Issues fixed by this PR

We had ~40% worst case CPU load without the fix in MPF. With this fix worst case CPU reduces to ~30%.

What does this implement/fix?

Cache the result of split_text for labels. This hardly changes and we invalidate the cache if it would.

Any other comments?

I guess it would make sense in some other widgets too. Text for example.

PR is on top of #213 to prevent merge conflicts.

coveralls commented 5 years ago

Coverage Status

Coverage increased (+0.03%) to 96.316% when pulling fd1795c714cbeba06d17f920bd31be9010338f40 on jabdoa2:add_caching_to_labels into 3c3267a76cd57bbe8df711c86743c96f819ac47a on peterbrittain:master.

peterbrittain commented 5 years ago

Yeah - I can see that this will improve the rendering path, so is probably worth taking. Just want to check one thing first...

My experience of performance tuning the apps I've created with asciimatics is that the biggest gains are to be had by not re-drawing unless needed. I had a similar CPU usage issue on early versions and then spent a fair bit of time plumbing logic into Screen.draw_next_frame() such that an idle UI generates almost zero CPU usage.

The way it works is that each Effect reports when the next update is required (and the Widgets all report this into the Frame for UIs). The Screen update logic then checks that counter as well as any overrides (if there has been any user input, or the app has forced an update), so an app can typically just present the UI and let asciimatics run. However, if there are important state changes that need to be reflected, just calling Screen.force_update() can push them through even if the Frame thinks nothing has changed.

Sounds like your UI is hitting the constant refresh issues I hit in the past... Would this slective use of force_update be a better solution?

jabdoa2 commented 5 years ago

We already do that. We selectively update only when something changes (max 100Hz). The worst case for MPF is a bouncing switch which would change a label on every frame. Or an internal variable changing it's state very frequently. Unfortunately, those things happen in pinball machines.

peterbrittain commented 5 years ago

Good. Just checking.

BTW, 100Hz is pretty fast for the human eye to register anything. You would probably notice no difference if you halved that refresh rate and could possibly even go to 25 Hz without any meaningful impact on what people can actually process.

jabdoa2 commented 5 years ago

The problem with that is that switch hits can be quite short. Like a ball hitting a target is usually 10-20ms only. We change the background color of the label during the time the switch is active. If we only update max 33Hz we might miss the hit. People seems to notice color changes faster. You will not really see it but notice it.

peterbrittain commented 5 years ago

Yeah - people will be able to see that flicker (assuming your monitor is capable of displaying it). I'd still be tempted to drop to 50 Hz, but if you have the CPU to spare for 100Hz, there's no reason not to do that.

peterbrittain commented 5 years ago

Not quite sure about this caching... I get that you want to minimize the use of split_text, but this seems to go a bit further than that and tries to optimize the colour decisions too. This cache is beginning to feel a liitle fragile to me and so I'd like to cut it down if possible. Does it make a meaningful difference to you if we just cache the result of the call to split_text?

jabdoa2 commented 5 years ago

That would also work. However, this still needs the integration with the widget because setting the layout needs to invalidate.

peterbrittain commented 5 years ago

Yeah - I agree that it still needs to monitor layout changes, but I can live with that. Another option might be to encapsulate that logic into split_text with something like lru_cache. How many Labels do you have (roughly speaking)?

jabdoa2 commented 5 years ago

I guess that would also work. I probably got like 100-1000 widgets. Only some are visible.

peterbrittain commented 5 years ago

OK - I think I'd like to explore that a bit as a way of reducing the footprint for this caching.

peterbrittain commented 5 years ago

Do you need this change for the v1.11 release?

jabdoa2 commented 5 years ago

I guess we would be fine without it. If we need it we can still overload the widget in our code.

peterbrittain commented 5 years ago

Looks like pylru's lrudecorator could work for all the environments we need. Does this patch work for you too?

diff --git a/asciimatics/widgets.py b/asciimatics/widgets.py
index 53405dd..ca94a32 100644
--- a/asciimatics/widgets.py
+++ b/asciimatics/widgets.py
@@ -29,6 +29,7 @@ from asciimatics.exceptions import Highlander, InvalidFields
 from asciimatics.screen import Screen, Canvas
 from asciimatics.utilities import readable_timestamp, readable_mem, _DotDict
 from wcwidth import wcswidth, wcwidth
+from pylru import lrudecorator

 # Logging
 from logging import getLogger
@@ -211,6 +212,7 @@ def _get_offset(text, visible_width, unicode_aware=True):
     return result

+@lrudecorator(100)
 def _split_text(text, width, height, unicode_aware=True):
     """
     Split text to required dimensions.
jabdoa2 commented 5 years ago

I will give it a try later

jabdoa2 commented 5 years ago

With @lrudecorator(100) it actually gets worse in my benchmark (42% instead of 39%; 8-9% when idle). With @lrudecorator(1024) or @lrudecorator(2048) CPU load considerably drops (20% and 1% when idle).

jabdoa2 commented 5 years ago

I also tried functools.lru_cache and it looks like the performance is even slightly better (18% vs 20%). It would also not require additional dependencies in Python 3 and there is a backport for the old versions (https://pypi.org/project/backports.functools_lru_cache/).

peterbrittain commented 5 years ago

OK, so that says just caching the text calculation is a good gain. I'm a bit nervous about the occupancy of 1000 entries, though. Let me think some more about this.

peterbrittain commented 5 years ago

Do you have a lot of widgets that aren't visible (e.g. because you need to scroll to see them)? If so, we could optimize the widget update logic instead...

jabdoa2 commented 5 years ago

Yes I got a lot of invisible widgets. I don't see where a cache with 1024 entries would hurt. It will be free when empty and reduce GC pressure in almost any other case.

peterbrittain commented 5 years ago

OK - I've just merged a patch to stop drawing invisible widgets (https://github.com/peterbrittain/asciimatics/commit/bb8eeed71dae1259689de4e66deb70ee81adb4ad). What is the performance difference for you without the lru_cache?

jabdoa2 commented 5 years ago

With that fix the load drops to ~30% (instead of 39%). With lru_cache I get it down to 18% (instead of 20%). Seems like I need at least lru_cache(256). 128 is definitely not enough (30% load).

peterbrittain commented 5 years ago

OK - I think we're about there, then. I'll happily add the LRU cache for 256 entries. Feel free to update the PR to use that instead. I might get a chance in the next week or so, otherwise....

peterbrittain commented 5 years ago

I think that change does what we agreed. Please shout if there's anything else you needed...

jabdoa2 commented 5 years ago

Awesome. Thanks!