silx-kit / silx

silx toolkit
http://www.silx.org/doc/silx/latest/
MIT License

Add `Tiled` data sources to browse Bluesky runs #4106

Open padraic-shafer opened 2 months ago

padraic-shafer commented 2 months ago

There has been a recent burst of interest and activity in integrating bluesky/tiled data sources into silx (and by extension into pyMCA). I'm summarizing some of those discussions here to get feedback from silx developers.

Background

Several light sources are looking into using pyMCA as a browser of data collected during Bluesky runs. From discussions with @linupi @vasole @t20100 it was suggested that modifying silx to accept a Tiled data catalog would be an elegant way to do this for pyMCA, silx-view, and any other apps depending on silx.

@t20100 has started a proof-of-concept branch that shows a pathway for adapting a Tiled Container to an HDF5-like interface.


Preliminary scope (to be refined)

Discussion on 2024-03-26 @whs92 @danielballan @abbygi @vshekar @padraic-shafer [...missing handles for more BESSY-II participants]

During a chat between several developers at NSLS-II and BESSY-II, we recognized a common interest in using pyMCA as a "bluesky-supported" visual explorer of Tiled datasets for beamline experimenters. We identified several preliminary goals for a development sprint.

  1. Connect to a tiled server over HTTP -- Accept a URL; handle Auth
  2. Browse contents, with ability to filter and sort
    • Should identify bluesky runs
    • Will likely need a per-endstation configuration of metadata "projections" (flattened subset of important metadata)
  3. View baseline data for selected run(s)
  4. Plot scan data using existing plot tools
    • Use hinted data by default
    • User can assign "any" channel to a plot axis
  5. "Live plot" of data being captured
    • More than one bluesky run may be active at once (nested scans)
    • Initially target a polling loop of ~1 second
    • Leave a path open to tiled-stream / websocket
    • Must be able to resume viewing a scan-in-progress if client restarts
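
A minimal sketch of the polling loop in goal 5, assuming the run can be re-read in full on each poll and reports when it has finished. `poll_rows`, `FakeRun`, and the `read_all` / `is_done` callables are hypothetical names used for illustration only; they are not part of silx, PyMca, or Tiled. In a real client `read_all` would be an HTTP read from the Tiled server.

```python
import time
from typing import Callable, Iterator, List, Sequence


def poll_rows(
    read_all: Callable[[], Sequence],
    is_done: Callable[[], bool],
    start: int = 0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[List]:
    """Yield batches of rows that appeared since the previous poll.

    `start` is the number of rows already plotted, so a restarted
    client can resume a scan in progress without replotting.
    """
    seen = start
    while True:
        done = is_done()   # check completion before reading, so the
        rows = read_all()  # final read cannot miss trailing rows
        if len(rows) > seen:
            yield list(rows[seen:])
            seen = len(rows)
        if done:
            return
        sleep(interval)


class FakeRun:
    """In-memory stand-in for a run that is still being acquired."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.rows: List[int] = []

    def acquire(self, _interval: float = 0.0) -> None:
        # Pretend two more points arrived while the poller was sleeping.
        for _ in range(2):
            if len(self.rows) < self.total:
                self.rows.append(len(self.rows))

    def read_all(self) -> List[int]:
        return self.rows

    def is_done(self) -> bool:
        return len(self.rows) >= self.total


# Follow a run from the start; `sleep` is swapped for `acquire`
# so the example finishes instantly instead of waiting 1 s per poll.
run = FakeRun(total=5)
batches = list(poll_rows(run.read_all, run.is_done, sleep=run.acquire))

# Resume after a client restart: three rows were already plotted.
run2 = FakeRun(total=5)
run2.acquire()
run2.acquire()
resumed = list(poll_rows(run2.read_all, run2.is_done, start=3, sleep=run2.acquire))
```

Keeping the "rows already seen" counter outside the data source is what makes resuming possible: the viewer only needs to remember how many points it has drawn. The same generator interface could later be fed by a websocket instead of a timer.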

Refined goals

Discussion on 2024-04-09 @whs92 @danielballan @abbygi @vshekar @padraic-shafer

  1. Use an isolated "Open" dialog or similar entrypoint that can cope with paginated access to large Catalogs, generating a smaller dataset (a tiled client with filters applied) that can be passed down to the rest of the silx/PyMca stack. This can also hand down authentication state.
  2. Fit Tiled nodes into HDF5 abstraction up to some limit (~1000). Tabular data from Tiled is just a Group of 1-dimensional arrays.
  3. Focus on 'primary' and 'baseline' streams to start, with an eye on "tab per stream" and whether that fits.
  4. Have a switch for polling live data. (This can later be refactored to use websockets, once Tiled supports that.)
  5. Ensure HTTP I/O does not lock up or crash the app.
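
To make goal 2 concrete, here is a rough sketch of a read-only, group-like view with a cap on the number of children it exposes. It assumes a Tiled container behaves like a nested mapping and that a table is a mapping of column name to 1-dimensional data. `GroupView` and `MAX_CHILDREN` are hypothetical names; this is not the code in the proof-of-concept branch.

```python
from collections.abc import Mapping
from itertools import islice

MAX_CHILDREN = 1000  # assumed cap, taken from the "~1000" in goal 2


class GroupView(Mapping):
    """Read-only, group-like view over a nested mapping of nodes."""

    def __init__(self, node, name="/", max_children=MAX_CHILDREN):
        self._node = node
        self.name = name
        self._max = max_children

    def __iter__(self):
        # Expose at most `max_children` keys so that a huge catalog is
        # never expanded into the tree view in one go.
        return islice(iter(self._node), self._max)

    def __len__(self):
        return min(len(self._node), self._max)

    def __getitem__(self, key):
        child = self._node[key]
        if isinstance(child, Mapping):
            path = self.name.rstrip("/") + "/" + key
            return GroupView(child, path, self._max)
        return child  # leaf: a 1-D column or an array


# A run with two streams; each table is a group of 1-D columns.
run = {
    "primary": {"time": [0.0, 1.0, 2.0], "det": [5, 7, 6]},
    "baseline": {"motor": [1.5, 1.5]},
}
root = GroupView(run)
primary = root["primary"]
columns = list(primary)

# A catalog larger than the cap only shows the first MAX_CHILDREN entries.
big = GroupView({str(i): {} for i in range(5000)})
```

Children beyond the cap stay reachable by key; only iteration is truncated. That matches the idea of narrowing a large catalog in the "Open" dialog first and passing down a smaller, filtered client.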

padraic-shafer commented 2 months ago

@vasole @t20100 @linupi Because we weren't able to find a suitable time yet for all of us to meet live--and it sounds like it might be a couple weeks until that's possible--what do you think about this approach? Do you foresee particular difficulties or incompatibilities in fitting this into the architecture of silx?

t20100 commented 2 months ago

Hi,

Thanks for the summary!

For the silx part, it makes sense to me and the proof-of-concept was very simple to implement. However, I still have a shallow understanding of tiled. For now my main concern would be point 5 "Ensure HTTP I/O does not lock up or crash the app." since the hdf5-like API and silx view are built around synchronous access to the data, and I'm not convinced this is easy to change.

t20100 commented 2 months ago

BTW, you might want to have a look at h5web, a web-based HDF5 data viewer that my colleagues @axelboc and @loichuder developed and maintain. It is available as a JupyterLab extension and a VSCode extension, and it powers HDF5 online viewing in the ESRF "data portal" and the https://myhdf5.hdfgroup.org/ online viewer (thanks to h5wasm). It also targets HDF5 files, but access to the data is abstracted through Providers (for now there are three: one for the HDF Group's HSDS server, one for h5wasm, and one for h5grove, our small server tailored for h5web), so there may be a way to adapt it to tiled. Unlike silx view, it is natively asynchronous.

danielballan commented 2 months ago

Thanks @t20100. I agree that the blocking I/O sounds like the hard part. We may have to live with synchronous I/O for now and just make sure that timeouts return control to the user in the event of connection issues.
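
One way to sketch the "timeouts return control to the user" idea with only the standard library is to run the blocking read in a worker thread and wait on it for a bounded time. `read_with_timeout` and `slow_read` are hypothetical names, and this is an illustration under that assumption, not how silx or Tiled handles it.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=4)


def read_with_timeout(blocking_read, timeout, default=None):
    """Run a blocking read, giving up after `timeout` seconds."""
    future = _pool.submit(blocking_read)
    try:
        return future.result(timeout=timeout)
    except FutureTimeout:
        # Only the caller is released here. The worker thread keeps
        # running until the read returns or the HTTP layer gives up.
        return default


def slow_read():
    time.sleep(0.5)  # simulates a stalled connection
    return "data"


fast = read_with_timeout(lambda: "data", timeout=1.0)
stalled = read_with_timeout(slow_read, timeout=0.05, default="timed out")
```

The caller still blocks for up to `timeout` seconds, so this bounds a freeze but does not remove it. It is also no substitute for a timeout on the HTTP connection itself, since an abandoned read otherwise keeps occupying a worker thread.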

Adding an h5web Provider for Tiled is also interesting. This has been on our radar since we opened an Issue in Tiled in September 2021. It might be about time to do it. One perhaps unique capability this could add is the ability to view specfiles, TIFFs, and other formats, which Tiled can serve through a unified HDF5-ish abstraction.

I think PyMca is serving a particular cluster of requirements though, so we would pursue this in addition to PyMca integration.

t20100 commented 2 months ago

> We may have to live with synchronous I/O for now and just make sure that timeouts return control to the user in the event of connection issues.

Sounds good to me.

t20100 commented 2 months ago

I just made some updates to the silx branch with basic tiled support, and opened PR #4121.

Compared to the previous poc version:

Feedback welcome!

vasole commented 2 months ago

Just to comment that removing the prefix would simplify things on the PyMca side too, because I had already planned to handle URLs exclusively via the silx abstraction.

t20100 commented 1 month ago

tiled- prefix removed. Also reworked the TiledDataset to inherit directly from commonh5.Dataset and added a tile Cache.
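
For readers unfamiliar with the idea, a tile cache can be sketched generically as a small least-recently-used store in front of the slow fetch. `TileCache`, `fetch`, and the `(row, col)` keys below are hypothetical; this does not reproduce the implementation in the PR.

```python
from collections import OrderedDict


class TileCache:
    """Keep the most recently used tiles in memory."""

    def __init__(self, fetch, max_tiles=64):
        self._fetch = fetch          # slow call, e.g. an HTTP request
        self._max = max_tiles
        self._tiles = OrderedDict()  # ordered oldest to newest use

    def get(self, key):
        if key in self._tiles:
            self._tiles.move_to_end(key)  # mark as recently used
            return self._tiles[key]
        tile = self._fetch(key)
        self._tiles[key] = tile
        if len(self._tiles) > self._max:
            self._tiles.popitem(last=False)  # evict least recently used
        return tile


calls = []


def fetch(key):
    calls.append(key)  # record every trip to the "server"
    return f"tile{key}"


cache = TileCache(fetch, max_tiles=2)
cache.get((0, 0))
cache.get((0, 1))
cache.get((0, 0))  # served from the cache, no fetch
cache.get((1, 0))  # cache is full: evicts (0, 1)
cache.get((0, 1))  # fetched again after eviction
```

With a cap of two tiles, the third distinct key evicts whichever tile was used least recently, so repeated views of the same region avoid a second round trip.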