jbornschein / draw

Reimplementation of DRAW
MIT License
347 stars 86 forks source link

Checkpoint fix #18

Closed dribnet closed 9 years ago

dribnet commented 9 years ago

Fixes jbornschein/draw#16

These changes are mainly to update checkpointing and fix jbornschein/draw#16. The issue was traced back to the introduction of a new pickler in blocks - mila-udem/blocks#615. The simplest fix for now is to skip pickling of the main_loop with this new pickler while still pickling the individual parts such as the model and log, and a new Checkpoint subclass is used in train-draw to achieve this.

I actually implemented a more heavyweight fix which does allow pickling of the main class (2393f0d1), but it was very, very slow. In testing on my machine it would take ~30 minutes to read the pickled file from disk and when running train-draw.py, more time was spent in the checkpoint code than in training. The output file was also an order of magnitude larger than the model file - not sure what all is getting stored in there. Since we only need the model itself as input to train-draw via --oldmodel or sample.py via model_file, decided on this much simpler and faster solution.

Note also that the blocks team is still futzing with serialization in mila-udem/blocks#685, so I might file an issue about this Checkpointing failure and suggest an argument to the Checkpointing class to skip pickling the main_class object so that we no longer need this custom subclass.

jbornschein commented 9 years ago

Nice, will merge!

Do you still need that specific blocks version? I thought draw@c351c501c75d4cdef5c6357c8d1fae4811b09976 fixed that problem?

dribnet commented 9 years ago

Yes - c351c50 did fix #17 for me. But I thought it would better future-proof this project to also include a note on what version of blocks this project was tested against since it seems the API on the master branch of blocks changes frequently. I tried to say something to this effect in the README, but feel free to reword or remove if you think this is confusing.

Hopefully there will be another stable versioned of blocks soon and arguably we should transition to that to better insulate ourselves from upstream changes.

BTW: Now that checkpointing out of the way, I think I have a straightforward solution to #15 (testing now)