spc-group / haven

Bluesky tools for beamlines managed by the spectroscopy group
https://haven-spc.readthedocs.io/en/latest/

Image saving issue for area detectors #160

Open Cathyhjj opened 6 months ago

Cathyhjj commented 6 months ago

Current bluesky plans save all of an area detector's images from one scan into a single .h5 file.

If the scan time is long, e.g., a grid scan that takes 2 hours in total, I have to wait until the entire scan is finished to see the data, and the .h5 file is not readable before the scan completes. So if the scan is interrupted, I lose all my data.

I would prefer to have an option to save each detector image every time the detector is triggered, instead of waiting until the scan is finished. This would help evaluate data on the fly and prevent data loss.

canismarko commented 6 months ago

@Cathyhjj Thanks for submitting this. Is this something you know how to do in EPICS? If so, it should be doable in Haven.

In an ideal world, what would the interface look like for using this option in the case of:

  1. Plans run through the Firefly GUI?
  2. Plans run directly on the queue server?
  3. Plans run in an IPython terminal or script?

That said, if we could solve the two problems quoted below, would you still prefer to have individual files?

I had to wait to see the data until the entire scan is finished.

There is a feature in the HDF5 specification called single-writer-multiple-readers (SWMR) that solves this problem. We could see whether this is implemented in the EPICS areaDetector HDF5 writer plugin (NDFileHDF5).
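To illustrate what SWMR buys us at the file-format level, here is a minimal h5py sketch (whether the areaDetector plugin exposes SWMR is the open question above; the file name and shapes below are invented for the demo). A writer keeps the file open and appends frames, while a second read-only handle sees each frame as soon as it is flushed, without waiting for the scan to finish:

```python
import os
import tempfile

import h5py
import numpy as np

# Toy scan file in a temp directory; shapes are arbitrary.
path = os.path.join(tempfile.mkdtemp(), "scan.h5")

# Writer side: libver="latest" is required before enabling SWMR,
# and all datasets must exist before swmr_mode is switched on.
writer = h5py.File(path, "w", libver="latest")
frames = writer.create_dataset(
    "frames", shape=(0, 4, 4), maxshape=(None, 4, 4), dtype="f4"
)
writer.swmr_mode = True  # readers may attach from this point on

# Reader side: opened while the writer still holds the file.
reader = h5py.File(path, "r", libver="latest", swmr=True)

counts = []
for i in range(3):
    frames.resize(i + 1, axis=0)   # writer appends one frame
    frames[i] = np.full((4, 4), i)
    frames.flush()                 # make the new frame visible to readers
    reader["frames"].refresh()     # reader picks up the new extent
    counts.append(reader["frames"].shape[0])

print(counts)  # one more frame visible after each flush: [1, 2, 3]
reader.close()
writer.close()
```

If this pattern works through the EPICS plugin, the "wait until the scan finishes" problem goes away even with a single file per scan.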

So if the scan is interrupted, I will lose all my data.

Have you actually checked this? During writing, the file is open and so cannot be accessed by other applications, but that doesn't necessarily mean it will be corrupted if the scan fails.

Cathyhjj commented 6 months ago

We can save individual files by changing the EPICS detector's file-write mode to capture mode. However, we noticed that during a Firefly grid scan, although the live viewer (ImageJ) shows every image at each scan point and autosave is turned on, only one HDF5 file is saved, at the beginning of the scan.

To solve that, you modified the ophyd Device for the Lambda detectors so that the write mode gets set to "Stream" during staging. This saves all images into one single HDF5 file at the end of the scan.

In an ideal world, what would the interface look like for using this option in the case of:

  1. Plans run through the Firefly GUI?
  2. Plans run directly on the queue server?
  3. Plans run in an IPython terminal or script?

However, if we could solve those two problems, would you still prefer to have individual files?

I think all these options are good. But I would prefer the acquired images to be viewed in technique-specific software, e.g., for dispersive XAS or miniXES, unless those are integrated into Firefly (which would increase the complexity of Firefly, and I'm not sure that's favorable).

I still think it's nicer to save individual images rather than one big image stack. For example, consider a 1000 × 1000 grid scan (the 100 × 100 grid scan HDF5 file I had was already 2.5 GB). Maybe >20% of the points are not useful. With individual files, users can copy only the ~800 × 800 images that contain useful information; with one file, they have to copy the whole big file. A single large file can also significantly increase computing time when loading the data into their own programs: with individual images, processing can be done per image and the data size minimized before loading everything into one program.
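One mitigating detail for the loading-time concern (though not for the copying concern, since transferring the file still moves every byte): h5py reads lazily, so post-processing can slice out only the useful sub-region of a single large file without loading the full stack into memory. A small sketch with invented shapes:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "grid.h5")

# Toy grid scan: 100 points of 4x4 frames in one dataset.
with h5py.File(path, "w") as f:
    f["frames"] = np.arange(100 * 4 * 4, dtype="f4").reshape(100, 4, 4)

with h5py.File(path, "r") as f:
    # Only the sliced frames are read from disk into memory.
    useful = f["frames"][20:84]

print(useful.shape)  # (64, 4, 4)
```

This doesn't settle the one-file-vs-many question, but it means the single-file layout is not inherently all-or-nothing at analysis time.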

So if the scan is interrupted, I will lose all my data.

Have you actually checked this? During writing, the file is open and so cannot be accessed by other applications, but that doesn't necessarily mean it will be corrupted if the scan fails.

I think you are right: the data is still there but cannot be accessed by other applications while the scan runs. In that case, I do think single-writer-multiple-readers (SWMR) is important to implement.

canismarko commented 6 months ago

I think all these options are good.

I meant, how would you select between the two modes (one file per scan vs one file per point) for each of those cases?
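One possible answer, sketched with entirely hypothetical names (nothing below exists in Haven): since the Firefly GUI, the queue server, and IPython all ultimately invoke a plan, a single plan-level argument could cover all three cases. A helper would map a user-facing mode string onto the areaDetector file-write mode:

```python
# Hypothetical mapping from a user-facing plan argument to the
# areaDetector file-write mode; the argument name and values are
# invented for illustration, not Haven's API.
FILE_MODES = {
    "per_scan": "Stream",   # one HDF5 file for the whole scan
    "per_point": "Single",  # one file each time the detector triggers
}

def resolve_file_mode(mode: str) -> str:
    """Translate a plan-level file_mode argument to a write mode."""
    try:
        return FILE_MODES[mode]
    except KeyError:
        raise ValueError(
            f"file_mode must be one of {sorted(FILE_MODES)}, got {mode!r}"
        ) from None

print(resolve_file_mode("per_point"))  # Single
```

The GUI could expose this as a dropdown, the queue server as a plan kwarg in the queue item, and IPython users would pass it directly; all three paths would funnel through the same plan signature.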

canismarko commented 6 months ago

I think you are right, the data is still there but cannot be accessed by other applications.

If I'm right, then once the file gets closed, the data become accessible.