mehta-lab / recOrder

3D quantitative label-free imaging with phase and polarization
BSD 3-Clause "New" or "Revised" License
13 stars 4 forks source link

`recOrder` data schema #421

Closed talonchandler closed 1 year ago

talonchandler commented 1 year ago

Fixes #371

This PR describes recOrder's current storage schema for discussion. Let's iterate on the schema here, then after this merges I will modify recOrder to match the schema.

Here's a list of the issues I think should be addressed together with my suggestions:

codecov[bot] commented 1 year ago

Codecov Report

Merging #421 (3e9bb12) into main (7f57e33) will increase coverage by 0.09%. Report is 4 commits behind head on main. The diff coverage is n/a.

:exclamation: Current head 3e9bb12 differs from pull request most recent head 1941118. Consider uploading reports for the commit 1941118 to get more accurate results

@@           Coverage Diff            @@
##            main    #421      +/-   ##
========================================
+ Coverage   8.30%   8.40%   +0.09%     
========================================
  Files         26      26              
  Lines       4527    4475      -52     
========================================
  Hits         376     376              
+ Misses      4151    4099      -52     

see 4 files with indirect coverage changes

deprecated-napari-hub-preview-bot[bot] commented 1 year ago

Preview page for your plugin is ready here: https://preview.napari-hub.org/mehta-lab/recOrder/421 Updated: 2023-08-21T22:54:22.880813

mattersoflight commented 1 year ago

Those are good suggestions @talonchandler . We can simplify the schema even further.

talonchandler commented 1 year ago

Thanks @mattersoflight. I've made edits on this PR for my best-effort attempt to simplify further.

bg_.zarr and _.zarr can contain raw and reconstructed data as channels...eliminate one hierarchy level.

This is a challenging change for a couple reasons:

At the expense of the extra hierarchy level, I think the current design is working reasonably well so I will vote to stick with separate raw_data.zarr and reconstruction.zarr. Any thoughts @ziw-liu?

The calibration metadata, gui_state, and reconstruction_settings can be part of a single YML file, if that does not pose any challenges in workflow.

I think calibration metadata and reconstruction_settings need to stay separate for our workflows (also worthwhile considering a likely upcoming split of recOrder and a calibration plugin).

IMO gui_state has served its purpose as an interim reproducibility aid since we now have reconstruction_settings.yml files. @ziw-liu let me know if you have any objection to me removing it.

ziw-liu commented 1 year ago
  • recOrder's 2D reconstructions have different shapes for raw and reconstructed data, and IMO it would be confusing to mix 2D and 3D data in a single TCZYX array. We could think about adding them as different Arrays in the zarr store, but...

I agree that the main issue of having 2D reconstructions from 3D raw data in the same store is visualization. The napari-ome-zarr plugin doesn't support extra arrays other than resolution levels. And saving the 2D reconstruction as an extra channel with only Z=0 being filled would render as black for Z!=0, which is not entirely obvious or convenient.

ziw-liu commented 1 year ago

IMO gui_state has served its purpose as an interim reproducibility aid since we now have reconstruction_settings.yml files. @ziw-liu let me know if you have any objection to me removing it.

It was introduced as a duct-tape fix so no objection to replacing it with a proper implementation!

mattersoflight commented 1 year ago

At the expense of the extra hierarchy level, I think the current design is working reasonably well so I will vote to stick with separate raw_data.zarr and reconstruction.zarr.

Ah - a good point that reconstruction can be a different shape than raw_data. Makes sense to store raw_data.zarr and reconstruction.zarr as separate stores.

I think calibration metadata and reconstruction_settings need to stay separate for our workflows (also worthwhile considering a likely upcoming split of recOrder and a calibration plugin).

If a CLI needs both config files, the information is best kept in the same file. If a CLI/UI needs just one of the two pieces of information, it can just read that field. The unused field is a 'read-only metadata'. We had too many config files (preprocessing, training, prediction) in microDL that all needed to be carried around. I am trying to prevent a similar outcome.

Is calibration metadata created only for polarization-resolved acquisitions? If yes, it may make sense to keep the files separate. In that case, I would rename the file to polarization_calibration.txt

I'll let you decide. Approving.

talonchandler commented 1 year ago

Is calibration metadata created only for polarization-resolved acquisitions? If yes, it may make sense to keep the files separate. In that case, I would rename the file to polarization_calibration.txt

That's correct. The calibration is only for polarization-resolved acquisitions, so I like this renaming. Changing.

The polarization_calibration.txt file is not used as an input for the reconstruction pipelines, so I think it's appropriate to keep polarization_calibration.txt in a separate file. The only field that is needed by the reconstruction is the swing, so that field appears in both polarization_calibration.txt and reconstruction_settings.yml.

Thanks!