m-labs / artiq

A leading-edge control system for quantum information experiments
https://m-labs.hk/artiq
GNU Lesser General Public License v3.0

Save to hdf5 before experiment is complete #505

Open dleibrandt opened 8 years ago

dleibrandt commented 8 years ago

As far as I can tell, the HDF5 files are not written until the end of an experiment. Is this correct?

If not, how do I save to HDF5 before the experiment is complete?

If so, I see this as a significant problem. We don't want to lose all the data if our computer crashes in the middle of a run. Also, we frequently run analysis scripts in the middle of experiment runs, and those scripts might be written in MATLAB or something else that doesn't play nicely with ARTIQ.

jordens commented 8 years ago

Correct.

You are free to open and write files to do checkpointing whenever you want. The code that does the HDF5 file writing is here: https://github.com/m-labs/artiq/blob/master/artiq/master/worker_impl.py#L230. You should be able to open that same file and write to it before write_results happens. IIRC the scheduler device knows about the rid.

Be aware that if you want to read from another process while the HDF5 file is being written, single-writer/multiple-reader (SWMR) requires a bit of special care in h5py.

If you need automatic checkpointing in the background, we'd need a full specification.
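For concreteness, a minimal sketch of manual checkpointing from the host side of an experiment. It writes to a separate side file rather than the worker's own results file; the file name, the dataset name, and the acquire_point() helper are illustrative assumptions, not ARTIQ conventions.

```python
import h5py
from artiq.experiment import EnvExperiment


class CheckpointedExperiment(EnvExperiment):
    def build(self):
        self.setattr_device("scheduler")  # the scheduler device exposes the rid

    def run(self):
        self.set_dataset("counts", [], broadcast=True)
        # Side file keyed by rid; the name and location are only an illustration.
        with h5py.File("checkpoint_{:09d}.h5".format(self.scheduler.rid), "w") as f:
            for i in range(100):
                self.append_to_dataset("counts", self.acquire_point(i))
                if (i + 1) % 10 == 0:
                    # Re-dump the dataset into the checkpoint file.
                    if "counts" in f:
                        del f["counts"]
                    f["counts"] = self.get_dataset("counts")
                    f.flush()

    def acquire_point(self, i):
        # Placeholder for the real measurement.
        return float(i)
```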

jordens commented 8 years ago

http://docs.h5py.org/en/latest/swmr.html
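For reference, a minimal h5py SWMR sketch under the usual constraints (libver="latest", and every dataset created before SWMR mode is enabled); the file and dataset names are placeholders.

```python
import h5py
import numpy as np

# Writer side: create every dataset first, then enable SWMR mode.
f = h5py.File("live_results.h5", "w", libver="latest")
dset = f.create_dataset("counts", shape=(0,), maxshape=(None,), dtype="f8")
f.swmr_mode = True
for i in range(100):
    dset.resize((i + 1,))
    dset[i] = np.random.random()
    dset.flush()  # make the new data visible to readers
f.close()

# Reader side (normally a separate process, e.g. an analysis script):
r = h5py.File("live_results.h5", "r", libver="latest", swmr=True)
d = r["counts"]
d.refresh()  # pick up whatever the writer has flushed so far
print(d[...])
r.close()
```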

sbourdeauducq commented 8 years ago

> If you need automatic checkpointing in the background we'd need a full specification.

It may not need to be fully in the background. An API such as Experiment.write_results(), called automatically after Experiment.analyze() and callable explicitly at any time by the user, could be a good choice.
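A sketch of the proposed call sequence, hedged: write_results() does not exist yet, this only illustrates where it would sit relative to the existing stages.

```python
# Proposed control flow (illustration only, not actual worker code):
exp.prepare()
exp.run()              # the user may call exp.write_results() here for checkpoints
exp.analyze()
exp.write_results()    # called automatically at the end, as the final write is today
```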

jordens commented 8 years ago

Yes. Or open the results file early (in prepare()) and expose the handle, so that partial or intermediate results can be written without having to push all datasets.
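A sketch of that alternative, assuming the file is opened in prepare() and the handle kept around; the attribute name results_file, the file name, and acquire_trace() are invented for illustration, since ARTIQ does not currently expose such a handle.

```python
import h5py
from artiq.experiment import EnvExperiment


class EarlyFileExperiment(EnvExperiment):
    def build(self):
        pass

    def prepare(self):
        # In the proposal the worker would open the results file and expose it;
        # here the experiment opens a file itself to show the idea.
        self.results_file = h5py.File("partial_results.h5", "a")

    def run(self):
        grp = self.results_file.require_group("intermediate")
        for i in range(10):
            grp["trace_{}".format(i)] = self.acquire_trace(i)  # hypothetical acquisition
            self.results_file.flush()  # anything flushed here survives a later crash

    def analyze(self):
        self.results_file.close()

    def acquire_trace(self, i):
        # Placeholder for real data.
        return [float(i)] * 16
```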

dleibrandt commented 8 years ago

I think ideally I would like:

  1. All datasets are automatically written to HDF5 near the beginning of the experiment (maybe at the end of prepare() or the start of run()?)
  2. Periodically, datasets that have been modified can be rewritten via some user-called function, Experiment.update_hdf5() (ideally, for speed, only the parts of the datasets that have actually been modified would be written)
  3. Experiment.update_hdf5() would be called automatically at the end of analyze()

If number 2 is too hard to implement, it might be OK if the user has the option to either write all the datasets or specify which datasets to write (a usage sketch of this follows below).
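A hypothetical usage sketch of that request; neither update_hdf5() nor its selective-dataset argument exists in ARTIQ, and scan_points()/count_at() are stand-ins for real acquisition code.

```python
from artiq.experiment import EnvExperiment


class ScanExperiment(EnvExperiment):
    def build(self):
        pass

    def run(self):
        self.set_dataset("frequencies", [], broadcast=True)
        self.set_dataset("counts", [], broadcast=True)
        self.update_hdf5()                        # 1. dump everything near the start
        for f in self.scan_points():              # hypothetical scan generator
            self.append_to_dataset("frequencies", f)
            self.append_to_dataset("counts", self.count_at(f))
            self.update_hdf5(["frequencies", "counts"])  # 2. rewrite only what changed

    def analyze(self):
        self.set_dataset("fit_result", [0.0])
        # 3. update_hdf5() would then run automatically at the end of analyze()
```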

dnadlinger commented 4 years ago

#1464 should improve the situation considerably by always saving to HDF5 once the run stage is reached, i.e. even if it finishes with an exception. If the user crashes or deadlocks the worker process before the exception handler runs, data can of course still be lost, so further checkpointing might still be a good idea.