algebra caching idea - Githubissues

arsenovic commented 7 years ago

For performance reasons is might be nice to have a set of cached algebras available to the user. These could be generated at install or manually after install, depending on the users needs.

https://gist.github.com/arsenovic/4639ab4144d91376edf07a4842fa910c

@moble @hugohadfield thoughts?

moble commented 7 years ago

Seems like a good idea, as long as the code still uses these big tables. A few questions / comments:

I can't quite follow the logic. Are the big tables stored, or just the sparse ones if you pull in Hugo's modifications? Is there anything else that's really slow about constructing the layout? If not, would it be better to just store the sparse objects and hook them in to the layout initializer?
I'm a little fuzzy on pickle, but I think one of its advantages/problems is that it reconstitutes the object as it was at the time it was stored. So if you change the Layout class between the time the pickle is created and the time it is read, there will be an inconsistency — it might be missing some feature, or have incompatible data format, etc. This might not be a problem when people are just installing once and using the code as is, but I imagine it would be a huge pain when you're trying to develop the package.
I do something similar in one of my projects. I store and retrieve the data using numpy's own save and load functions because I found them to be the fastest option. As I recall, the compression could have been a bit better, but it wasn't really worth fussing about. In particular, the loading part is very fast, and versioning isn't a problem. The data are then distributed and installed with my code. Especially if you're using the sparse matrices, this shouldn't be a problem. My steps were these:

a. I generated the data once on my own machine b. I then copy them over during installation. They wind up in the installation directory with the rest of the __init__.py files and such. c. They then get loaded during import. As you mentioned, you'll want to do this dynamically only when a particular data set is needed, rather than load them all during import. But the key point here is that I use the __file__ constant to get the path to the __init__.py file, which I then use to figure out where the data files are.

arsenovic commented 7 years ago

ok, my usage scenario was a little different.

on my machine init'ing a 6D GA is slow, and 8D really slow. also, i like having fixed ga's, like from clifford import g300 for euclidean 3-space. currently these are implemented via hand-written sub-module stubs. perhaps this method is good enough, but the idea behind a cache was to

dynamically generate predefined ga submodules
speed up [user defined] big algebra init's

i am now working on a config file, so that a user could add their own predifined signtures to a list of fixed ones. kind of like matplotlib.styles. so, there will be

static : g200, g300, g400, g310, g130, g520
user defined : g600,g800, [whatever]

reloading the cache should take seconds to minutes, so reloading is no big deal (making pickle ok) , but you dont want to do it in every notebook.

all that being said, if 99% of people just want G2,G3 and their CGA's this effort is a waste of time. however, i think supporting arbitrary GA's will be very beneficial in the future.

hugohadfield commented 7 years ago

@arsenovic I've just pushed another change into hugo/performance that jits some of the generation code, on my laptop i get 33.7s from master to 7.5s for clifford.Cl(8). The overhead of jitting might cause it to be slower for low dimension algebras though..

arsenovic commented 6 years ago

this feature has stalled due to lack of need, and thus low priority. i think its a decent idea and worth keeping open for the future.

hugohadfield commented 6 years ago

I think this is a high priority issue, lots of people are using high dimension algebras and the slow initialisation really hits hard. The slowness really comes in on generating the sparse multiplication tables and storing these in memory. I think we need to do a couple of things here: Remove all references to the multiplication tables themselves Pre compute a wide range of algebras sparse tables and filter to get only the non zero elements, store these, probably with the numpy file format or something similar Load the saved sparse objects on algebra creation

arsenovic commented 6 years ago

i accidentally added this to the oo PR, but this was my first attempt at the caching.

https://github.com/arsenovic/clifford/blob/oo_cga/clifford/caching.py

pygae / clifford

algebra caching idea #17