nicholas-leonard opened 9 years ago
Such a solution (credit mostly goes to @dwf, actually) has also been implemented recently in Blocks by @bartvm, if you want to have a look.
We call the on-the-fly preprocessors data streams, while the datasets themselves are immutable. For a lengthier discussion of how we do checkpointing, you can look here.
@bartvm, Blocks is a very nice package. I am definitely using it as a reference point. Love the docs. Thanks.
Found an intermediate quick-fix solution (for checkpoints): https://github.com/nicholas-leonard/dp/commit/bbeeeab4f7ef15a931cbcc94a3778e839071a6d8

```lua
-- Load the datasource from checkpointPath if it exists; otherwise
-- build it with the factory function and save it there for next time.
datasource = torch.checkpoint(checkpointPath, function()
   return dp.Mnist{input_preprocess=input_preprocess}
end)
```
Quoting a recent discussion concerning pylearn2, Pascal Lamblin (@lamblin) offered some nice solutions to a problem both our libraries are having:
While these solutions were offered for pylearn2, they concern dp as well. The Preprocess objects currently modify the DataSets in place, so preprocessing has to be redone each time you run an experiment. But you could easily do it once and reuse that checkpoint across experiments. All you would need is a script to create the checkpoint and a means of referring to the resulting files from your experiment.
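A minimal sketch of that two-step workflow, using Torch's standard `torch.save`/`torch.load` serialization rather than dp's own checkpoint helper; the file path and the choice of `dp.Standardize` as the preprocessor are illustrative assumptions:

```lua
-- create_checkpoint.lua : run once to preprocess and serialize the dataset.
require 'dp'

-- Illustrative preprocessor; substitute whatever Preprocess your experiment needs.
local input_preprocess = dp.Standardize()
local datasource = dp.Mnist{input_preprocess = input_preprocess}

-- Serialize the already-preprocessed datasource to disk.
torch.save('mnist_standardized.t7', datasource)

-- experiment.lua : every experiment then just reloads it,
-- skipping the preprocessing step entirely.
local datasource = torch.load('mnist_standardized.t7')
```

The experiment script only needs to agree with the creation script on the checkpoint path, which could be passed as a command-line option.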