Closed sherwoodf closed 1 month ago
@joshmoore @Tom-TBT @normanrz @francesw @matthewh-ebi
This is great, @sherwoodf! Thanks. Can you jot down a minimal workflow? (And when you would imagine it being called, e.g., before/after dev2/resave.py or perhaps it doesn't matter)
> Can you jot down a minimal workflow? (And when you would imagine it being called, e.g., before/after dev2/resave.py or perhaps it doesn't matter)
Sorry, didn't get back to this:
In terms of workflow, the toml can be used with Poetry to set up a local Python environment and run the script:
poetry install
poetry run create_fly_ro_crate_metadata.py
But for actual pipelines to update to the new version, I would expect users to write scripts similar to create_fly_ro_crate_metadata.py (I'm assuming the metadata doesn't already exist somewhere in the existing Zarrs).
The standard ro-crate logic does a deep copy of the files to create the crate alongside this JSON. I suspect it would be more efficient to create only the additional ro-crate-metadata files after running dev2/resave.py (similar to what I did in the create_fly_ro_crate_metadata script). Otherwise, the ro-crate creation would have to be blended into the resave.py logic to avoid creating copies of all the new Zarr files.
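The "metadata only" route described above can be sketched with the standard library alone: a single ro-crate-metadata.json is written next to an already-converted Zarr and no image data is copied. This is a minimal sketch, not the code in create_fly_ro_crate_metadata.py. The function name `write_metadata_only` and its parameters are hypothetical, and the JSON-LD layout assumes the general RO-Crate 1.1 structure (a metadata descriptor plus a root dataset) without any REMBI entities.

```python
import json
from pathlib import Path


def write_metadata_only(zarr_path: str, name: str, description: str, license_url: str) -> Path:
    """Write ro-crate-metadata.json into an existing directory; no data is copied.

    Hypothetical helper for illustration only.
    """
    metadata = {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                # Metadata descriptor: points at the root dataset of the crate.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                # Root dataset: here, the directory that holds the Zarr.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "description": description,
                "license": {"@id": license_url},
            },
        ],
    }
    target = Path(zarr_path) / "ro-crate-metadata.json"
    target.write_text(json.dumps(metadata, indent=2))
    return target
```

Calling this after the resave step only touches one small JSON file, which is the efficiency argument made above.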
> I would expect users to write scripts similar to create_fly_ro_crate_metadata.py
Agreed, but I imagine that for many of us, there will be too many scripts needed to easily create them all, and so, reading from a CSV or similar may be preferred.
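As a rough illustration of the CSV idea, the sketch below reads one row per Zarr and turns it into a dictionary of dataset properties that a metadata-writing step could consume. The column names (`zarr_path`, `name`, `description`, `license`) and the function `rows_to_properties` are assumptions made for this example; no CSV layout has been agreed in this thread.

```python
import csv
import io


def rows_to_properties(csv_text: str) -> dict[str, dict[str, str]]:
    """Map each Zarr path to the dataset properties read from one CSV row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    result = {}
    for row in reader:
        # "zarr_path" is a hypothetical column identifying the target Zarr.
        path = row.pop("zarr_path")
        # Drop empty cells so missing values are omitted rather than written as "".
        result[path] = {key: value for key, value in row.items() if value}
    return result


example = """zarr_path,name,description,license
fly.zarr,Light microscopy photo of a fly,Light microscopy photo of a fruit fly.,https://creativecommons.org/licenses/by/4.0/
"""

properties_by_path = rows_to_properties(example)
```

One row per image keeps the per-dataset scripting to a single loop over `properties_by_path`.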
> I suspect it would be more efficient to only create the additional ro-crate-metadata files after the dev2/resave.py
:+1:
Once #10 is in, I'll open a PR with something like this:
(challenge4) ~/opt/challenge/ome2024-ngff-challenge/dev2 $git diff
diff --git a/dev2/resave.py b/dev2/resave.py
index 36b004f..35d81e9 100755
--- a/dev2/resave.py
+++ b/dev2/resave.py
@@ -292,6 +292,36 @@ def convert_image(
ds_shards,
)
+def write_rocrate(output_path: str):
+    from zarr_crate.zarr_extension import ZarrCrate
+    from zarr_crate.rembi_extension import Biosample, Specimen, ImageAcquistion
+
+    crate = ZarrCrate()
+
+    zarr_root = crate.add_dataset(
+        "./",
+        properties={
+            "name": "Light microscopy photo of a fly",
+            "description": "Light microscopy photo of a fruit fly.",
+            "licence": "https://creativecommons.org/licenses/by/4.0/",
+        },
+    )
+    biosample = crate.add(
+        Biosample(crate, properties={"organism_classification": {"@id": "NCBI:txid7227" }})
+    )
+    specimen = crate.add(Specimen(crate, biosample))
+    image_acquisition = crate.add(
+        ImageAcquistion(crate, specimen, properties={"fbbi_id": {"@id": "obo:FBbi_00000243"}})
+    )
+    zarr_root["resultOf"] = image_acquisition
+
+    metadata_dict = crate.metadata.generate()
+
+    filename = os.path.join(output_path, "ro-crate-metadata.json")
+    with open(filename, "w") as f:
+        f.write(json.dumps(metadata_dict, indent=2))
+
+
def main(ns: argparse.Namespace):
CONFIGS = create_configs(ns)
@@ -340,6 +370,7 @@ def main(ns: argparse.Namespace):
else:
write_store = STORES[1]
write_root = zarr.Group.create(write_store)
+ write_rocrate(ns.output_path)
# image...
if read_root.attrs.get("multiscales"):
We can either write dummy values or take those values as CLI/CSV input.
Never mind: I put it behind an optional flag and merged in the main branch, so it's available in #10 now.
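For readers who have not looked at #10, an opt-in step of this kind could be wired up roughly as follows. This is only a sketch of the pattern: the flag name `--rocrate` and the helper names are hypothetical and are not taken from the merged code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with an opt-in switch for the RO-Crate step."""
    parser = argparse.ArgumentParser(description="Sketch of an opt-in RO-Crate step")
    parser.add_argument("input_path")
    parser.add_argument("output_path")
    # Hypothetical flag: metadata is only written when the user asks for it.
    parser.add_argument(
        "--rocrate",
        action="store_true",
        help="also write ro-crate-metadata.json into the output",
    )
    return parser


def should_write_rocrate(argv: list[str]) -> bool:
    """Return True only when the opt-in flag was passed."""
    ns = build_parser().parse_args(argv)
    return ns.rocrate
```

Keeping the step off by default means existing conversion runs behave as before, and the placeholder values in the diff above are only written on request.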
Created example code showing how to generate a ro-crate-metadata.json, using some REMBI and Zarr extensions to produce the context and the expected object links.
I wanted to actually use some of the RO-Crate tooling to see what it would create, in case that helps us decide what the metadata should look like and work out what would be easy for us to override or extend for Zarr users.