Closed timpalpant closed 8 years ago
I want to learn about Pickling, I'll write this if you don't have it done already!
Go for it! I haven't started anything and it would be super helpful.
My first attempt at pickling a ClueDB failed: apparently neither pickle nor cPickle can serialize lambda functions, and ClueDB._clue_to_answers is a defaultdict initialized with a lambda. Looking for solutions to this now.
A temporary fix, replacing the lambda with an importable function, works and has let me do some speed testing. On my machine, loading a database takes... well, see for yourself:
Test: Loading
Loading times: mean=67.9141685009, std=3.17116179008, median=68.6186280251
Test: Unpickling (highest priority)
Loading times: mean=41.9816338062, std=6.7354113375, median=42.4331450462
Run with 10 trials; the results are consistent enough that 10 is plenty. Unpickling is definitely an improvement over loading from text, but not a huge one. I'll look for something better, and if I don't find it, pickling it is and we can think about how to fix the lambda problem.
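For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of a timing harness. It is not the script used for the numbers above: `time_loader` is a hypothetical name, and using the population standard deviation is an assumption.

```python
import statistics
import time


def time_loader(loader, trials=10):
    """Call loader() `trials` times and summarize the wall-clock durations."""
    durations = []
    for _ in range(trials):
        start = time.perf_counter()
        loader()
        durations.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(durations),
        # Population standard deviation; an assumption, not taken from the thread.
        "std": statistics.pstdev(durations),
        "median": statistics.median(durations),
    }
```

Any zero-argument callable that loads the database can be passed as `loader`, which keeps the text, pickle and other loaders comparable under the same harness.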
Edit: I have not checked whether pickling and unpickling a ClueDB gives back the same ClueDB you started with, because __eq__ is not well-defined for ClueDBs. Not worrying about that yet, will later.
We have a new challenger (and a likely winner, once I write the tests that check to make sure the serialization works):
Test: Loading
Loading times: mean=65.3548296293, std=0.95613521012, median=66.0260429382
Test: MessagePack
Loading times: mean=17.1421430111, std=3.55922264153, median=15.5907959938
Test: Unpickle
Loading times: mean=27.8242882888, std=3.72844386529, median=27.577742815
And in terms of the size:
MessagePack is much more limited than Pickle in terms of what it can serialize, so you have to write some custom prep code to get the class ready to serialize (and likewise to deserialize it). But for ClueDB it's pretty simple code, at least for now. A few more tests to write and then it'll be ready to pull.
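To give an idea of what that prep code can look like, here is a rough sketch. It is not the code from the pull request: `TinyClueDB`, `to_plain` and `from_plain` are hypothetical names, and the clue-to-list-of-answers shape is an assumption. The point is only that the object gets reduced to plain dicts, lists and strings before packing, and rebuilt from them after unpacking.

```python
from collections import defaultdict


class TinyClueDB:
    """Hypothetical stand-in for ClueDB; only the clue mapping is modeled."""

    def __init__(self):
        self._clue_to_answers = defaultdict(list)

    def add(self, clue, answer):
        self._clue_to_answers[clue].append(answer)

    def to_plain(self):
        """Reduce the object to dicts, lists and strings only."""
        return {
            "clue_to_answers": {
                clue: list(answers)
                for clue, answers in self._clue_to_answers.items()
            }
        }

    @classmethod
    def from_plain(cls, plain):
        """Rebuild the object, including its defaultdict, from plain data."""
        db = cls()
        for clue, answers in plain["clue_to_answers"].items():
            db._clue_to_answers[clue].extend(answers)
        return db
```

Assuming the `msgpack` package is installed, the plain structure can then be written with `msgpack.packb(db.to_plain(), use_bin_type=True)` and restored with `TinyClueDB.from_plain(msgpack.unpackb(data, raw=False))`. Since no functions are serialized, the lambda problem does not come up on this path.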
Merged #14 -- Much faster, nice job finding MessagePack!
Resolved in #14 - can we close? Look at that
During startup we load several databases into memory to use for solving, such as the clue DB and a dictionary. Loading these from text files is slow (~45s).
Pre-process these resources into a format that can be loaded more quickly (maybe just a Pickle file). It will save time in the long run.
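A minimal sketch of the idea, assuming a pickle cache stored next to the text file: `load_with_cache` and its parameters are hypothetical names, and `parse` stands for whatever function currently builds the resource from text.

```python
import os
import pickle


def load_with_cache(text_path, cache_path, parse):
    """Return parse(text_path), reusing a pickle cache when it is up to date."""
    cache_is_fresh = (
        os.path.exists(cache_path)
        and os.path.getmtime(cache_path) >= os.path.getmtime(text_path)
    )
    if cache_is_fresh:
        with open(cache_path, "rb") as f:
            return pickle.load(f)

    # Slow path: parse the text file once, then store the result for next time.
    data = parse(text_path)
    with open(cache_path, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
    return data
```

Comparing modification times means the cache is rebuilt automatically whenever the text file changes, at the cost of one slow load after each change.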