Closed — larsgeb closed this issue 5 years ago
I'd like to think fairly big ;), i.e., make this code suitable for high-dimensional problems, at least to some extent. So I'd suggest a third strategy: writing binary files sequentially.
This can be achieved, for instance, with the HDF5 format, which I've been using extensively. The h5py module is easy to use and works pretty well, and would allow us to save binary files with a filesystem-like structure inside: each sample is a "dataset" inside a single "big" file, along with all the other information we want to save, e.g., acceptance rate, potential and kinetic energy values, etc. We would end up with a single file containing everything, but where it's easy to access a specific piece of information without loading the entire file.
Usually I thin a little bit on-line, both to save space and for speed (I/O is slow...). Maybe we could have a user-defined "save every N iterations" parameter?
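A minimal sketch of what this could look like, assuming h5py is available. All names (file name, `save_every`, the `misfit` quantity) are placeholders, and instead of one dataset per sample this uses one resizable, chunked dataset per quantity, which h5py supports and which makes appending a sample at a time cheap:

```python
import numpy as np
import h5py

# Hypothetical parameters: dimension, total iterations, and a
# user-defined "save every N iterations" thinning factor.
ndim, n_iterations, save_every = 3, 100, 10

with h5py.File("samples.h5", "w") as f:
    # Resizable, chunked datasets so we can append sequentially.
    samples = f.create_dataset(
        "samples", shape=(0, ndim), maxshape=(None, ndim),
        chunks=(64, ndim), dtype="f8")
    misfit = f.create_dataset(
        "misfit", shape=(0,), maxshape=(None,), chunks=(64,), dtype="f8")

    rng = np.random.default_rng(0)
    x = np.zeros(ndim)
    for it in range(n_iterations):
        x = x + rng.normal(size=ndim)  # stand-in for one sampler step
        if it % save_every == 0:       # on-line thinning
            n = samples.shape[0]
            samples.resize(n + 1, axis=0)
            misfit.resize(n + 1, axis=0)
            samples[n] = x
            misfit[n] = float(x @ x)   # e.g. a per-sample diagnostic

# Reading one quantity back does not load the entire file.
with h5py.File("samples.h5", "r") as f:
    print(f["samples"].shape)  # → (10, 3)
```

Other quantities (acceptance rate, kinetic energy, ...) would just be additional datasets in the same file.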
Both the file format and the thinning option seem like great ideas!
As an example of how I think we could use GitHub, and to keep our email inboxes empty:
Should we write out samples sequentially or all at the same time?
Do we even want to store all samples, or should we thin on-line (keep only every n-th sample)?
I propose writing text files and, at the end of sampling, converting them to .npy.
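For comparison, a minimal sketch of this text-then-convert workflow (file names and shapes are placeholders, and `%.17g` formatting is used so float64 values round-trip through text exactly):

```python
import numpy as np

# Stand-in for samples produced during a run.
samples = np.random.default_rng(1).normal(size=(50, 4))

# During sampling: append each sample as one line of text.
with open("samples.txt", "w") as f:
    for row in samples:
        f.write(" ".join(f"{v:.17g}" for v in row) + "\n")

# After sampling: load the text file and store it as binary .npy.
loaded = np.loadtxt("samples.txt")
np.save("samples.npy", loaded)

print(np.allclose(np.load("samples.npy"), samples))  # → True
```

The downside relative to sequential binary output is that the text file grows roughly 3-4x larger than the binary equivalent and the conversion requires holding all samples in memory at once.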