femtobit closed this 2 years ago
I've just signed up for UnitaryHack and saw this issue. As it's something I wanted to implement myself a while ago, I already have a basic HDF5 logger working. I would need to clean it up a little, make sure it does what the issue description requires, and add some logic so that data gets flushed after a certain number of steps, as is done in `JsonLog`. Is it OK if I do this and push a PR once it's ready?
That would be great! @PhilipVinc will tell you whether you need to register specifically for this bounty and "reserve" it for you.
Hi @maxbortone, yes, please pick up the bounty! Let me know if you need any guidance on it.
If you open a PR earlier rather than later it's easier for us to keep an eye on it, but feel free to work on it as you prefer.
All right, thanks! I'll prepare a PR today
[This issue is part of UnitaryHack and comes with a bounty of $75]
Context
NetKet simulation drivers support output of the current state of an optimization as well as expectation values of observables and custom data via the classes provided in `netket.logging`. Currently, NetKet has two main logging implementations:
- `JsonLog`, which is the standard logger and writes log data to a JSON file (and also saves a regularly overwritten snapshot of the current network parameters as a MessagePack file).
- `StateLog`, which saves intermediate network parameters as a separate file for each step (`1.mpack`, `2.mpack`, `3.mpack`, etc.) to a folder or ZIP file.
While these work, it would be nice for easier data handling and interoperability with other tools to support writing simulation output into a single file in the commonly used HDF5 format via `h5py`.
Implementation notes
To resolve this issue, the following should be implemented:
- A class `netket.logging.HDF5Log` which writes both the information currently contained in `JsonLog` and the network parameters (at each step or every certain number of steps, as can be configured in `StateLog`) to an HDF5 file specified by the user.
- The class should be compatible with the way `JsonLog` and `StateLog` are used in NetKet drivers. (For this PR this means compatibility with the current use of the loggers in `netket.driver.AbstractVariationalDriver`.) Specifically:
  - It should provide a method `__call__(self, step, log_data, state)`, which is called at each optimization step and writes the provided data to the HDF5 log file.
  - Since the number of steps (i.e., the number of calls to `__call__`) is not known before the start of the simulation, the log must support appending data at every step (therefore, the datasets within the HDF5 file need to be resized as necessary).
Note that `log_data` is a dictionary mapping a name to a specific logged quantity. The value can be of several different types. The `HDF5Log` should support scalar numbers, NumPy/JAX arrays, and `netket.stats.Stats` objects.
- Scalars should be stored in a dataset `values` of shape `(n_steps,)` containing the logged values and another dataset `iters` of the same shape containing the value of `step` at which each entry was logged (compare `JsonLog` output).
- Arrays should be stored analogously, with shape `(n_steps, *array_shape)`.
- `netket.stats.Stats` objects are essentially dataclasses with the fields `mean`, `variance`, `error_of_mean`, `tau_corr`, `R_hat`. For a stats object, the log file should contain a group containing each field as a separate dataset (and an `iters` field like above).
NetKet stores network parameters as JAX pytrees with leaves being complex-valued or real-valued arrays. The `HDF5Log` should store a flattened version (as returned by `netket.jax.tree_ravel`) in a single dataset of shape `(n_steps, n_parameters)`.
Here is an example layout, showing what a resulting HDF5 log file should contain after 1001 logging steps for a network with 256 variational parameters:
Note that for a normal `netket.VMC` run, there is a lot of redundancy in the `.../iters` arrays (as they will all be equal and of the form `[0, 1, ..., n_steps - 1]`). We accept this overhead both for compatibility with the existing `JsonLog` and for the added flexibility it provides for custom logging at subsets of steps.
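To make the append-on-every-step requirement concrete, here is a minimal sketch of how resizable `h5py` datasets could be used for it. This is not the NetKet implementation: the helper names `append_row` and `log_value`, the use of a plain dict as a stand-in for a `netket.stats.Stats` object, the top-level `parameters` dataset name, and the in-memory file are all assumptions made for illustration.

```python
import h5py
import numpy as np


def append_row(group, name, row):
    """Append one row to the resizable dataset group[name], creating it on first use.

    Hypothetical helper, not part of NetKet or h5py.
    """
    row = np.asarray(row)
    if name not in group:
        # maxshape=(None, ...) leaves the first axis unlimited, so the dataset
        # can grow by one row per logged step. Resizable datasets are chunked.
        group.create_dataset(
            name,
            shape=(0, *row.shape),
            maxshape=(None, *row.shape),
            dtype=row.dtype,
            chunks=True,
        )
    dset = group[name]
    n = dset.shape[0]
    dset.resize(n + 1, axis=0)
    dset[n] = row


def log_value(h5file, key, step, value):
    """Log one quantity under the group `key`, next to an `iters` dataset.

    A dict stands in for a stats object here (an assumption of this sketch):
    each of its fields becomes a separate dataset. Anything else is stored
    in a single `values` dataset.
    """
    group = h5file.require_group(key)
    append_row(group, "iters", step)
    if isinstance(value, dict):
        for field, x in value.items():
            append_row(group, field, x)
    else:
        append_row(group, "values", value)


# In-memory file (nothing is written to disk), logging only at a subset of steps.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    for step in (0, 2, 4):
        log_value(f, "acceptance", step, 0.5)
        stats = {"mean": -1.0 * step, "variance": 0.1, "error_of_mean": 0.01,
                 "tau_corr": 0.0, "R_hat": 1.0}
        log_value(f, "Energy", step, stats)
        # A plain vector stands in for the flattened parameter pytree.
        append_row(f, "parameters", np.zeros(4))

    shapes = (
        f["acceptance/values"].shape,
        f["Energy/mean"].shape,
        f["parameters"].shape,
    )
    logged_steps = f["Energy/iters"][:].tolist()
```

Because every group carries its own `iters` dataset, quantities logged at different subsets of steps stay self-describing, which is the flexibility the note above trades the redundancy for.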