ulrikpedersen / tomoscan

Demonstration of a tomography scan using EPICS BlueSky with a pulsed laser source
https://ulrikpedersen.github.io/tomoscan/
Apache License 2.0

Enable use of DataBroker in demo #11

Closed: ulrikpedersen closed this issue 1 year ago

ulrikpedersen commented 1 year ago

The DataBroker is a genuinely useful feature: this is where scientists really start to benefit from Bluesky. It does not currently work out of the box, but it should feature as part of the demo.

DataBroker appears to be installed, but it is not clear to me whether it is configured properly, whether other services need to be running, or how to use it.

Figure out how to configure and use the databroker, and add example instructions to the user tutorial in the docs.

8ryn commented 1 year ago

One step required to get this working properly is ensuring that the correct path, relative to bluesky, is given in the MyDetector class. This is read_path_template, as seen in https://github.com/8ryn/tomoscan/blob/databroker_dev/src/tomoscan/ophyd_inter_setup.py. If running within a Docker container, it is important to mount the output directory into the container when starting it.
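To illustrate why the path matters: areaDetector-style path templates are date-substitution patterns, and the detector IOC writes files at one path while bluesky reads them back at another, so the container mounts must map one onto the other. A minimal stdlib-only sketch (the template strings here are hypothetical, not the values in ophyd_inter_setup.py):

```python
from datetime import datetime

# Hypothetical templates mirroring areaDetector-style date substitution.
# The IOC writes files under write_path_template; bluesky reads the same
# files back via read_path_template, so the Docker mounts must make these
# two views of the filesystem line up.
write_path_template = "/data/%Y/%m/%d/"
read_path_template = "/mnt/detector-data/%Y/%m/%d/"

def resolve(template: str, when: datetime) -> str:
    # These templates are plain strftime patterns
    return when.strftime(template)

when = datetime(2023, 7, 26)
print(resolve(write_path_template, when))  # /data/2023/07/26/
print(resolve(read_path_template, when))   # /mnt/detector-data/2023/07/26/
```

If the two resolved directories do not point at the same files, bluesky records a path that does not exist on its side of the mount.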

A potentially useful tool when trying to enable the databroker is the ability to add callbacks when running a plan in the RunEngine. Adding print as a callback prints the document stream, for instance: RE(pulse_sync([det], motor1, laser1, -10, 10, 11), print)
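The reason plain print works as a callback is that the RunEngine calls each subscribed callback with a (name, document) pair for every document it emits. A toy stand-in (not the real bluesky RunEngine) showing just that dispatch pattern:

```python
# Toy mimic of the RunEngine's callback dispatch: every emitted document
# is delivered to the callback as (name, document), which is why passing
# plain `print` dumps the whole document stream to the terminal.
def run_with_callback(documents, callback):
    for name, doc in documents:
        callback(name, doc)

# A hypothetical, much-abbreviated document stream
docs = [
    ("start", {"uid": "abc", "plan_name": "pulse_sync"}),
    ("event", {"seq_num": 1, "data": {"motor1": -10.0}}),
    ("stop", {"exit_status": "success"}),
]

seen = []
run_with_callback(docs, lambda name, doc: seen.append(name))
print(seen)  # ['start', 'event', 'stop']
```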

This shows that a path to the saved file is at least included in the document stream, although the image data itself does not appear in the run saved to the databroker catalog.

The most recent run saved to the databroker catalog can be accessed as run = catalog[-1], or the run's uid can be used as the subscript. The xarray Dataset containing the saved data can then be read using run.primary.read()
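The two subscript styles can be sketched with a toy catalog (a stand-in for databroker, which really returns BlueskyRun objects with .primary.read(); the uid and data here are hypothetical):

```python
# Toy stand-in for a databroker catalog: integer subscripts index runs
# chronologically (so [-1] is the latest), and any other key is treated
# as a run uid.
class ToyCatalog:
    def __init__(self):
        self._runs = []  # runs in insertion (chronological) order

    def insert(self, run):
        self._runs.append(run)

    def __getitem__(self, key):
        if isinstance(key, int):  # e.g. catalog[-1] for the latest run
            return self._runs[key]
        for run in self._runs:    # otherwise look the key up as a uid
            if run["uid"] == key:
                return run
        raise KeyError(key)

catalog = ToyCatalog()
catalog.insert({"uid": "461d65a1", "data": {"motor1": [-10.0, -8.0]}})
print(catalog[-1]["uid"])          # 461d65a1
print(catalog["461d65a1"]["uid"])  # 461d65a1
```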

ulrikpedersen commented 1 year ago

It seems several pieces in Bluesky are not quite missing, but not quite complete either, when it comes to easily accessing detector data stored in (HDF5) files. However, Pete Jemian's apstools does try to fill some of these gaps, and in particular this page on areaDetector with a default HDF5 file name feels like a very close solution to this issue. I will have a go at following these instructions.

ulrikpedersen commented 1 year ago

Looks like at least some of apstools has been merged upstream into ophyd, but it is not quite clear whether everything has made it. For instance, apstools provides this: from apstools.devices import SingleTrigger_V34.

Basically, this file from apstools seems to have some fixes for recent versions of areaDetector, although it is not clear whether ophyd has caught up with AD since it was written...

ulrikpedersen commented 1 year ago

Sadly, I have not been able to make this work entirely. However, I feel it is pretty close: the DataBroker does seem to know where the data file is stored; only the file's data does not appear in the dataset read from the catalog. Here is a trace showing the issue (at least for my own recollection):

In [1]: uids = RE(pulse_sync([det], motor1, laser1, -10, 10, 11))

Transient Scan ID: 1     Time: 2023-07-26 15:03:19
Persistent Unique Scan ID: '461d65a1-e0df-4adf-bdda-35f70f355707'
New stream: 'primary'
+-----------+------------+------------+
|   seq_num |       time |     motor1 |
+-----------+------------+------------+
|         1 | 15:03:21.1 |        -10 |
|         2 | 15:03:22.0 |         -8 |
|         3 | 15:03:23.0 |         -6 |
|         4 | 15:03:24.0 |         -4 |
|         5 | 15:03:25.0 |         -2 |
|         6 | 15:03:26.0 |          0 |
|         7 | 15:03:27.0 |          2 |
|         8 | 15:03:28.0 |          4 |
|         9 | 15:03:29.0 |          6 |
|        10 | 15:03:30.0 |          8 |
|        11 | 15:03:31.0 |         10 |
+-----------+------------+------------+
generator pulse_sync ['461d65a1'] (scan num: 1)

In [2]: run = catalog.v2[uids[0]]
In [3]: run
Out[3]: 
BlueskyRun
  uid='461d65a1-e0df-4adf-bdda-35f70f355707'
  exit_status='success'
  2023-07-26 15:03:19.094 -- 2023-07-26 15:03:31.067
  Streams:
    * primary
In [4]: ds = run.primary.read()
In [5]: ds
Out[5]: 
<xarray.Dataset>
Dimensions:               (time: 11)
Coordinates:
  * time                  (time) float64 1.69e+09 1.69e+09 ... 1.69e+09 1.69e+09
Data variables:                                                         <------ Detector image dataset with 3 dims missing from the variables list!
    motor1                (time) float64 -10.0 -8.0 -6.0 -4.0 ... 6.0 8.0 10.0
    motor1_user_setpoint  (time) float64 -10.0 -8.0 -6.0 -4.0 ... 6.0 8.0 10.0
    laser1_power          (time) int64 0 0 0 0 0 0 0 0 0 0 0
    laser1_pulse_id       (time) float64 4.281e+11 4.281e+11 ... 4.281e+11
In [6]: run.primary._resources
Out[6]: 
[Resource({'path_semantics': 'posix',
 'resource_kwargs': {'frame_per_point': 1},
 'resource_path': 'data/2023/07/26/33f61ce8-09ed-4549-afc5_000000.h5',     <---- This is the correct path in the BlueSky container!
 'root': '/',
 'run_start': '461d65a1-e0df-4adf-bdda-35f70f355707',
 'spec': 'AD_HDF5',
 'uid': 'a794c396-617a-4e54-b61b-44301203d438'})]
In [7]: ls /data/2023/07/26/33f61ce8*
/data/2023/07/26/33f61ce8-09ed-4549-afc5_000000.h5     <---- The file is indeed here and we can open it to see the dataset
In [8]: import h5py
In [9]: f = h5py.File('/data/2023/07/26/33f61ce8-09ed-4549-afc5_000000.h5', 'r')
In [10]: f['entry/data/data']
Out[10]: <HDF5 dataset "data": shape (11, 1024, 1024), type "|u1">
In [11]: f['entry/data/data'][0]
Out[11]: 
array([[48, 48, 39, ..., 10, 34, 41],
       [64, 60, 36, ..., 39, 43, 29],
       [58, 51, 53, ..., 27,  3, 30],
       ...,
       [46, 23,  7, ...,  4, 40, 43],
       [29, 16, 22, ...,  5,  0, 43],
       [47, 10, 34, ..., 11,  2, 22]], dtype=uint8)

For a moment I thought I had found the issue and just needed to install the correct handlers (pip install area-detector-handlers), but that did not seem to help.
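For my own notes on that trace: the filler resolves a Resource document to a file location by joining its root with its resource_path, and then a handler registered for the Resource's spec (AD_HDF5 here) is responsible for opening that file, which is why area-detector-handlers looked relevant. The path resolution step alone, using the values from the trace above:

```python
import os

# Resolving the file location from the Resource document in the trace:
# join `root` with `resource_path` (path_semantics is 'posix' here).
# A handler registered for spec 'AD_HDF5' would then open this file.
resource = {
    "root": "/",
    "resource_path": "data/2023/07/26/33f61ce8-09ed-4549-afc5_000000.h5",
    "spec": "AD_HDF5",
}
full_path = os.path.join(resource["root"], resource["resource_path"])
print(full_path)  # /data/2023/07/26/33f61ce8-09ed-4549-afc5_000000.h5
```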

8ryn commented 1 year ago

During work on #20 I found that it was necessary to set the HDF plugin's kind for the images to be included in the bluesky run documents. This resolves this issue. That small change, along with instructions on mounting the output directory into the Docker container, is included in pull request #21
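The mechanism at work can be sketched without ophyd: a component's kind flag decides whether it contributes to the data that read() puts into the run documents, so a plugin left at an omitted kind produces exactly the symptom above (file written, but no image variable in the dataset). A simplified stdlib mimic of that filtering (ophyd's real Kind enum has more members and semantics):

```python
from enum import Flag, auto

# Simplified mimic of ophyd's Kind flag: only components whose kind
# includes NORMAL contribute to read(), and hence to the run documents.
class Kind(Flag):
    OMITTED = 0
    NORMAL = auto()
    CONFIG = auto()

class Component:
    def __init__(self, name, value, kind=Kind.OMITTED):
        self.name, self.value, self.kind = name, value, kind

def read(components):
    # Collect only the components flagged as normal readings
    return {c.name: c.value for c in components if c.kind & Kind.NORMAL}

hdf = Component("image", "frame_ref", kind=Kind.OMITTED)
motor = Component("motor1", -10.0, kind=Kind.NORMAL)
print(read([hdf, motor]))  # {'motor1': -10.0} -- image missing, as in the trace

hdf.kind = Kind.NORMAL     # analogous to setting the HDF plugin's kind
print(read([hdf, motor]))  # now the image reference is included
```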

8ryn commented 1 year ago

With #21 merged, the docs should be updated to include some instructions on accessing the image data before we close this issue.