bboe closed this pull request 10 years ago.
Interesting! What are you pickling objects for?
I want to cache parsed Scratch files to disk so that processing the same files again is faster. This PR allows pickling parsed projects, but it may not allow loading the pickled files (I neglected to test that).
Yeah, it doesn't unpickle properly. Hold off on this PR until I get a fix for that.
PR updated. Here's the reason why I want this support:
(hb)bboe@lappy2:Downloads$ time python -c 'import kelp.octopi; import kurt; kurt.Project.load("tmp.oct")'
real 0m10.232s
user 0m10.084s
sys 0m0.122s
(hb)bboe@lappy2:Downloads$ time python -c "import cPickle; cPickle.load(open('/tmp/hairball_cache.pkl'))"
real 0m1.242s
user 0m1.159s
sys 0m0.078s
Ideally the speed improvements should be within kurt itself, but I haven't the time to work on that.
Just as a heads up, I've done a bit of testing now with this pickling support. The speed-up is tremendous.
https://github.com/ucsb-cs-education/hairball/compare/cache
I'm going through and pre-loading all of my data with Kurt, which will take about three hours for the 1,200 files. With the pickle cache I've added to my library (which depends on this version of Kurt), I can process the already-cached portion of the dataset in only seconds.
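A minimal sketch of that kind of on-disk cache, assuming nothing about how the hairball cache branch actually implements it: each parsed result is pickled under a file name derived from a SHA-1 hash of the source file's bytes, so an edited file simply misses the cache. `cache_path`, `load_cached` and `fake_parse` are hypothetical names; in real use `parse` would be something like `kurt.Project.load`.

```python
import hashlib
import os
import pickle
import tempfile


def cache_path(cache_dir, source_path):
    """Map a source file to a cache file named after a hash of its contents."""
    with open(source_path, "rb") as f:
        digest = hashlib.sha1(f.read()).hexdigest()
    return os.path.join(cache_dir, digest + ".pkl")


def load_cached(source_path, parse, cache_dir):
    """Return parse(source_path), reusing a pickled result when one exists."""
    os.makedirs(cache_dir, exist_ok=True)
    path = cache_path(cache_dir, source_path)
    if os.path.exists(path):
        with open(path, "rb") as f:  # pickle files must be opened in binary mode
            return pickle.load(f)
    result = parse(source_path)  # slow path, e.g. a full Kurt parse (assumption)
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(result, f, protocol=pickle.HIGHEST_PROTOCOL)
    os.replace(tmp, path)  # rename into place so readers never see a partial file
    return result


calls = []


def fake_parse(path):
    """Hypothetical stand-in for the slow parser; records each call."""
    calls.append(path)
    with open(path, "rb") as f:
        return {"size": len(f.read())}


with tempfile.TemporaryDirectory() as tmp_dir:
    src = os.path.join(tmp_dir, "project.sb")
    with open(src, "wb") as f:
        f.write(b"not a real project")
    cache_dir = os.path.join(tmp_dir, "cache")
    first = load_cached(src, fake_parse, cache_dir)
    second = load_cached(src, fake_parse, cache_dir)

print(len(calls))  # prints 1: the second load came from the cache
```

Keying on content rather than on path and modification time costs one full read of each file per lookup, which is cheap next to a ten-second parse.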
Yes! Kurt's parser isn't terribly efficient.
I need to have a proper look at your pickling patch, and make sure it doesn't break stuff.
By the way, do your projects have large images? I think the Scratch 1.4 image-parsing code may be the bottleneck, and it needs rewriting anyway. If you could profile it to check, that'd be great.
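For anyone who does want to check, here is a minimal profiling sketch using only the standard library's `cProfile` and `pstats`. `profile_call` and `decode_pixels` are hypothetical; `decode_pixels` merely stands in for real work. Against Kurt, one would instead pass `kurt.Project.load` and a project path (after `import kelp.octopi` when loading `.oct` files, as in the timing above) and look for image-parsing functions near the top of the cumulative-time column.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, limit=15, **kwargs):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so expensive call trees float to the top.
    stats.sort_stats("cumulative").print_stats(limit)
    return result, stream.getvalue()


def decode_pixels(n):
    """Hypothetical stand-in workload for image decoding."""
    return [i * 3 % 256 for i in range(n)]


# With Kurt installed, something like:
#   profile_call(kurt.Project.load, "tmp.oct")
result, report = profile_call(decode_pixels, 100000)
print(report)
```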
I don't think the images are particularly large. Regardless, I already have the band-aid fix I need to work more efficiently on my project, so I am respectfully going to decline your request for profiling.
No worries! I should've clarified: I'll certainly merge your PR, once I've tested it. :)
Sorry for taking so long to merge this. (I got busy...)
Out of interest, does removing line 143 of `scratch14/__init__.py` solve the problem of requiring pickle.HIGHEST_PROTOCOL? That line is only needed for debugging.
I doubt it. Just out of curiosity, is there any way to access the save-history information other than through that attribute?
No, but feel free to raise an issue!
Added tests to verify that simple projects can be pickled.
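A round-trip check along these lines is roughly what such a test might look like; this is a sketch, not the PR's actual test. The dataclasses are hypothetical stand-ins for Kurt's project, sprite, and script objects, and `round_trip`/`check_round_trip` are illustrative names.

```python
import pickle
from dataclasses import dataclass, field


# Hypothetical stand-ins; a real test would build a kurt.Project instead.
@dataclass
class Script:
    blocks: list


@dataclass
class Sprite:
    name: str
    scripts: list = field(default_factory=list)


@dataclass
class Project:
    name: str
    sprites: list = field(default_factory=list)


def round_trip(obj, protocol=pickle.HIGHEST_PROTOCOL):
    """Pickle obj to bytes and load it back."""
    return pickle.loads(pickle.dumps(obj, protocol=protocol))


def check_round_trip(obj):
    """Fail unless obj survives a pickle round trip unchanged."""
    restored = round_trip(obj)
    assert restored is not obj  # a genuinely new object came back...
    assert restored == obj      # ...with the same contents
    return restored


project = Project("demo", [Sprite("cat", [Script([["say:", "hello"]])])])
restored = check_round_trip(project)
print(restored.sprites[0].name)  # prints "cat"
```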
Note: At this time, only pickle.HIGHEST_PROTOCOL is supported. To support older protocols, every class that defines `__getattr__` or `__slots__` needs to be updated to also define `__getstate__` and `__setstate__`. This requires changes to, or monkeypatches of, construct.
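A small sketch of the `__slots__` half of that note, using hypothetical classes rather than anything from Kurt or construct: without `__getstate__`, a slotted class pickles fine with `pickle.HIGHEST_PROTOCOL` but raises `TypeError` with protocol 0, while defining `__getstate__`/`__setstate__` makes the old protocols work too. The `__getattr__` case is a separate hazard and is not shown here.

```python
import pickle


class Plain:
    """Hypothetical slotted class with no pickling hooks."""
    __slots__ = ("name", "value")

    def __init__(self, name, value):
        self.name = name
        self.value = value


class WithState:
    """Same slots, plus __getstate__/__setstate__ for protocols 0 and 1."""
    __slots__ = ("name", "value")

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __getstate__(self):
        # A plain dict of slot values can be serialized by every protocol.
        return {slot: getattr(self, slot) for slot in self.__slots__}

    def __setstate__(self, state):
        # Called on a bare instance created without running __init__.
        for slot, value in state.items():
            setattr(self, slot, value)


def try_dumps(obj, protocol):
    """Report whether obj can be pickled with the given protocol."""
    try:
        pickle.dumps(obj, protocol=protocol)
        return "ok"
    except TypeError:
        return "TypeError"


for cls in (Plain, WithState):
    obj = cls("sprite", 42)
    print(cls.__name__, try_dumps(obj, 0), try_dumps(obj, pickle.HIGHEST_PROTOCOL))
# Plain TypeError ok
# WithState ok ok
```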