openstenoproject / plover

Open source stenotype engine
http://opensteno.org/plover
GNU General Public License v2.0

Speed up Plover start responsiveness by delaying auxiliary data structure construction #1232

Open user202729 opened 3 years ago

user202729 commented 3 years ago

Currently, the dictionaries take about 1-2 seconds to load.

Describe the solution you'd like

Most of the time is spent constructing the reverse-lookup and case-insensitive reverse-lookup mappings. That construction can be done lazily (on first request), or in a background thread.

It's already possible to look up entries in the dictionary without these structures.
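
A minimal sketch of the lazy approach (the class and attribute names here are hypothetical, not Plover's actual API): build only the forward mapping at load time, and construct the reverse index on the first reverse lookup.

from collections import defaultdict

class LazyReverseDict:
    # Steno dictionary sketch that defers building the reverse mapping.

    def __init__(self, entries):
        # The forward mapping is all that's needed for normal lookups.
        self._dict = dict(entries)
        self._reverse = None  # built on first reverse_lookup() call

    def lookup(self, strokes):
        # Fast path: no auxiliary structures required.
        return self._dict.get(strokes)

    def reverse_lookup(self, translation):
        if self._reverse is None:
            # Pay the construction cost only when a reverse lookup
            # is actually requested (e.g. by the lookup tool).
            self._reverse = defaultdict(list)
            for strokes, text in self._dict.items():
                self._reverse[text].append(strokes)
        return self._reverse.get(translation, [])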

normalize_steno is harder to eliminate... I have some possible ideas.

Describe alternatives you've considered


(FWIW, I made a local change to remove normalize_steno on load. It is somewhat faster.)

benoit-pierre commented 3 years ago

I'd like for normalize_steno to be optional, maybe only run when the dictionary has changed.
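
One way to implement "only when changed" is to cache the normalized entries alongside the source file, keyed by its mtime and size. Everything in this sketch (the helper name, the cache path scheme) is hypothetical, not Plover's loader:

import json
import os

def load_normalized(path, normalize):
    # Hypothetical helper: reuse a pre-normalized cache when the
    # source dictionary hasn't changed, skipping normalize_steno.
    cache_path = path + '.normalized.json'
    stat = os.stat(path)
    stamp = [stat.st_mtime, stat.st_size]
    if os.path.exists(cache_path):
        with open(cache_path, encoding='utf-8') as f:
            cached = json.load(f)
        if cached.get('stamp') == stamp:
            return cached['entries']
    with open(path, encoding='utf-8') as f:
        raw = json.load(f)
    # normalize() is assumed to return the canonical string key.
    entries = {normalize(k): v for k, v in raw.items()}
    with open(cache_path, 'w', encoding='utf-8') as f:
        json.dump({'stamp': stamp, 'entries': entries}, f)
    return entries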

Another possible way to improve performance I've experimented with: cythonizing the Plover package.

user202729 commented 3 years ago

Prototype (actually works):

https://github.com/user202729/plover-json-lazy

Overrides the built-in JSON dictionary.

There is quite a lot of copy-and-paste from Plover's code.


Currently, when two plugins register the same name, I think Plover picks an arbitrary one (for example, the RTF dictionary plugin can't override Plover's built-in RTF plugin: https://github.com/sammdot/plover-better-rtf).

"If different distributions provide the same name, the consumer decides how to handle such conflicts." [source: https://packaging.python.org/specifications/entry-points/]

At the moment, it happens to work on my machine.

Perhaps prioritizing the user's dictionary plugin in case of a conflict would be better.
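
For what it's worth, a sketch of how such conflicts could at least be detected with importlib.metadata (Python 3.10+; I'm assuming the 'plover.dictionary' entry point group here):

from collections import defaultdict
from importlib.metadata import entry_points

# Group entry points by name to spot plugins that register
# the same dictionary format (assumed group name).
by_name = defaultdict(list)
for ep in entry_points(group='plover.dictionary'):
    by_name[ep.name].append(ep)

for name, eps in by_name.items():
    if len(eps) > 1:
        dists = ', '.join(ep.dist.name for ep in eps)
        print(f'conflict for {name!r}: provided by {dists}')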


Performance: 1.4 s -> 0.6 s, saving 0.8 s, although I also include a hack to remove normalize_steno; otherwise it would have taken 2.2 s.

There are some more performance discussions in issue #1243. (In my case, I'm quite annoyed that I have to restart Plover frequently during development. By the way, my Plover restart stroke is defined as {PLOVER:SHELL:xterm -e bash -c "sleep 0.1s; plover; bash" &}{PLOVER:QUIT}.)


Is this a desirable feature?

fourshade commented 3 years ago

One way to speed up normalization is caching. Dropping @functools.lru_cache(maxsize=None) in front of normalize_stroke will speed things up a bit since some strokes are much more common than others. The stats I got for my dictionary loadout were:

CacheInfo(hits=310276, misses=35377, maxsize=None, currsize=35377)
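
To illustrate (a toy stand-in for Plover's real normalize_stroke, just to show where the decorator goes and how to read the stats):

import functools

@functools.lru_cache(maxsize=None)
def normalize_stroke(stroke):
    # Toy normalization standing in for Plover's real function;
    # the cache means each distinct stroke string is processed once.
    return stroke.replace('#', '').upper()

def normalize_steno(strokes_string):
    return tuple(normalize_stroke(s) for s in strokes_string.split('/'))

# After loading all dictionaries, inspect cache effectiveness:
print(normalize_stroke.cache_info())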

Somehow, the Plover process on my machine ends up using less total memory after loading completes with the cache than it does without it, even with an explicit garbage collection pass at the end. I have no idea what Python is doing with its heap to cause this, but I don't expect the actual memory cost of the cache to be very much anyway since most of the string objects in it are shared with the dictionaries.

user202729 commented 3 years ago

About the memory usage part: try sys.intern?

Besides, Python doesn't always give free memory back to the operating system; it uses its own allocator.
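
A sketch of where interning could go during load (a hypothetical parse helper, not Plover code); duplicate stroke and translation strings would then share one object each:

import sys

def parse_entry(steno_string, translation):
    # Hypothetical loader helper: interning makes duplicate strings
    # across entries share a single object, similar to what the
    # lru_cache achieves as a side effect.
    strokes = tuple(sys.intern(s) for s in steno_string.split('/'))
    return strokes, sys.intern(translation)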

fourshade commented 3 years ago

I remember trying sys.intern in a few places long ago and had no luck getting improvements of either type (performance or memory). If you can find a clever spot to put it that helps, please share!

No time tonight to dig into heap profiling tools, but it does appear that clearing the cache manually at the end releases 1.3 MB back to the operating system. That seems reasonable enough.

fourshade commented 3 years ago

I have an idea of what could be happening: there could be a net savings of memory after garbage collection if the cache is reusing string objects. In a dictionary created using the cache, the keys will only contain references to one string object for each unique stroke read from the file. Without it, str.split could fill the dictionary with distinct copies of identical strings. Observe:

Python 3.8.5 (tags/v3.8.5:580fbb0, Jul 20 2020, 15:57:54) [MSC v.1924 64 bit (AMD64)] on win32
>>> strings = ["this is a test", "this is a test", "a test this is"]
>>> words = [word for s in strings for word in s.split()]
>>> print({id(word): word for word in words})
{54240432: 'this', 54241072: 'is', 31515440: 'a', 54240560: 'test', 54241136: 'this', 54240496: 'is', 54241008: 'test', 54241264: 'test', 54241328: 'this', 54241392: 'is'}

CPython does have an internal string cache, but it only extends to single characters by default (note the string 'a' has only one instance). My OS reported a total savings of around 18 MB; I'll see if I can get more detail on the memory structure of the completed steno dictionaries.

UPDATE: The memory savings are confirmed. Calling sys.getsizeof recursively over the contents of the raw dictionaries (memoized so shared objects are counted once) gives me a total of 41,098,908 bytes before caching and 24,773,536 bytes after. This wasn't really the purpose of the cache, but it's a nice side effect.
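
A sketch of how such a recursive/memoized measurement can be done (my reconstruction, not necessarily the exact code used; the seen-ids set is what makes shared strings count only once):

import sys

def total_size(obj, seen=None):
    # Recursively sum sys.getsizeof, counting each object at most
    # once so strings shared between entries aren't double-counted.
    seen = set() if seen is None else seen
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(total_size(k, seen) + total_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(total_size(item, seen) for item in obj)
    return size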