haniffalab / webatlas-pipeline

A data pipeline built in Nextflow to process spatial and single-cell experiment data for visualisation in WebAtlas
MIT License

Image metadata output #124

Open davehorsfall opened 1 year ago

davehorsfall commented 1 year ago

The pipeline outputs image metadata (dimension order, shape, etc.): {"dimOrder": "XYZCT", "channel_names": [], "X": "2079", "Y": "1514", "Z": "1", "C": "3", "T": "1"}

Is this needed, and can we present it to the user in a cleaner, more user-friendly format?
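A friendlier presentation could parse the string and print a one-line summary. The sketch below assumes the metadata keeps the JSON shape shown above. `summarise_image_metadata` is a hypothetical helper and is not part of the pipeline. The dimension sizes arrive as strings, so it converts them to integers.

```python
import json

# Metadata string as printed by the pipeline (copied from the output above).
RAW = '{"dimOrder": "XYZCT", "channel_names": [], "X": "2079", "Y": "1514", "Z": "1", "C": "3", "T": "1"}'


def summarise_image_metadata(raw: str) -> str:
    """Hypothetical helper: turn the raw JSON metadata into a readable line."""
    meta = json.loads(raw)
    # Sizes are emitted as strings, so convert each dimension to an integer.
    sizes = {dim: int(meta[dim]) for dim in meta["dimOrder"]}
    shape = " x ".join(f"{dim}={sizes[dim]}" for dim in meta["dimOrder"])
    # An empty channel_names list falls back to "unnamed".
    channels = ", ".join(meta["channel_names"]) or "unnamed"
    return f"Image shape ({meta['dimOrder']}): {shape}; channels: {channels}"


print(summarise_image_metadata(RAW))
```

With the metadata from this issue, it prints `Image shape (XYZCT): X=2079 x Y=1514 x Z=1 x C=3 x T=1; channels: unnamed`.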

dannda commented 1 year ago

That's done to avoid creating a separate file that contains only that data. The process that builds the config file needs that metadata, but it is only passed the output file paths as string values, so it cannot access the outputs themselves to read the data. (This was originally done to allow writing configs with S3 paths without downloading the files, but we haven't really developed for those use cases since then, so it is no longer a real requirement.) Instead, the ome_zarr_metadata.py script prints the metadata to standard output, and the process captures it rather than writing it to a new file. So, basically, it is a way to share a short string value between processes. I agree it is a confusing output. We could either:

  1. Write it to a file
  2. Pass the XML file path as a file so the build_config process can access it
  3. Pass the Zarr paths as files to the build_config process
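The following Python sketch compares the current stdout-capture approach with option 1. Both run in a single process here. `contextlib.redirect_stdout` stands in for the pipeline process capturing the script's output. `emit_metadata`, `capture_from_stdout`, `write_metadata`, `build_config` and the `image_metadata.json` file name are all hypothetical and are not the pipeline's actual code. The config dict is illustrative only and is not the real WebAtlas config format.

```python
import contextlib
import io
import json
import tempfile
from pathlib import Path

META = {"dimOrder": "XYZCT", "X": "2079", "Y": "1514", "Z": "1", "C": "3", "T": "1"}


def emit_metadata(meta: dict) -> None:
    """Hypothetical stand-in for ome_zarr_metadata.py: print one JSON line."""
    print(json.dumps(meta))


def capture_from_stdout(meta: dict) -> dict:
    """Current approach: capture the script's stdout and parse it as JSON."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        emit_metadata(meta)
    return json.loads(buffer.getvalue())


def write_metadata(meta: dict, out_dir: Path) -> Path:
    """Option 1: write the metadata to a small JSON sidecar file instead."""
    path = out_dir / "image_metadata.json"  # hypothetical file name
    path.write_text(json.dumps(meta, indent=2))
    return path


def build_config(meta: dict) -> dict:
    """Hypothetical config builder; illustrative only, not the WebAtlas schema."""
    return {"dimOrder": meta["dimOrder"], "shape": [int(meta[d]) for d in meta["dimOrder"]]}


config_a = build_config(capture_from_stdout(META))

with tempfile.TemporaryDirectory() as tmp:
    sidecar = write_metadata(META, Path(tmp))
    # The consumer receives a real file, so it can read the metadata itself.
    config_b = build_config(json.loads(sidecar.read_text()))

assert config_a == config_b
print(config_b)
```

Both routes produce the same metadata. The file-based version leaves an inspectable artifact instead of a bare string in the process output, at the cost of one extra small file.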