janelia-cellmap / fibsem-metadata

Human-readable descriptions of our datasets

Consider adding curated OpenOrganelle data here #31

Open tlambert03 opened 2 years ago

tlambert03 commented 2 years ago

Hey @d-v-b and @avweigel! Finally started digging into this dataset... just amazing. Thank you all so much for making this heroic effort so easily available! Also appreciate the ease of loading into xarray with metadata from fibsem-tools 👍

I found myself digging into openorganelle source code to find a bit of info, and I wonder if you'd consider putting some of that metadata in this repo, for ease of querying (outside of OpenOrganelle).

  1. There's some good stuff in Organelles.tsx. I wonder if you might consider an api/organelles.json file to facilitate programmatic access to the table at https://openorganelle.janelia.org/organelles? Something like:

    ```json
    [
      {
        "full_name": "Centrosome",
        "short_name": "Centrosome",
        "file_name": "cent",
        "description": "Barrel-shaped structure composed of microtubule triplets. Centrioles are often found in pairs and microtubule staining is dark and distinct. A cross section of centriole ends is a round, nine-fold star shape. Skeleton annotations in BigCat were used to trace each microtubule of the barrel structure. These skeletons were then used to inpaint a full microtubule triplet into the volume. Voxel classification was used to annotate distal (D App) and subdistal appendages (SD App).",
        "examples": [
          {
            "name": "",
            "description": "",
            "sources": ["fibsem-uint8"],
            "orientation": null,
            "position": [24834.125, 1468.32226562, 14897.5],
            "scale": 6.834771272276192,
            "dataset": "jrc_hela-2"
          },
          {
            "name": "",
            "description": "",
            "sources": ["fibsem-uint8"],
            "orientation": null,
            "position": [28310.619140625, 962.44360352, 10741.5],
            "scale": 2.671353019658504,
            "dataset": "jrc_hela-3"
          },
          {
            "name": "",
            "description": "",
            "sources": ["fibsem-uint8"],
            "orientation": null,
            "position": [5523.84423828125, 5789.76855469, 15452.810546875],
            "scale": 3.6039231119383004,
            "dataset": "jrc_jurkat-1"
          },
          {
            "name": "",
            "description": "",
            "sources": ["fibsem-uint8"],
            "orientation": null,
            "position": [21723.33984375, 3200.5, 20980.5],
            "scale": 5.430455441247891,
            "dataset": "jrc_macrophage-2"
          }
        ]
      },
      {
        "full_name": "Centrosome Distal Appendage",
        "short_name": "Centrosome SD App",
        "file_name": "cent-sdapp",
        "description": "",
        "examples": []
      },
      {
        "full_name": "Chromatin",
        "short_name": "Chromatin",
        "file_name": "chrom",
        "description": "Protein and DNA complexes within the nucleus.",
        "examples": [
          {
            "name": "",
            "description": "",
            "sources": ["fibsem-uint8"],
            "orientation": null,
            "position": [25878.65625, 3357.13305664, 16238.5],
            "scale": 11.045498897968907,
            "dataset": "jrc_hela-2"
          },
          {
            "name": "",
            "description": "",
            "sources": ["fibsem-uint8"],
            "orientation": null,
            "position": [20657.3203125, 1713.79736328, 19444.5],
            "scale": 7.940871305346032,
            "dataset": "jrc_hela-3"
          },
          {
            "name": "",
            "description": "",
            "sources": ["fibsem-uint8"],
            "orientation": null,
            "position": [18039.82421875, 5265.18310547, 18075.4453125],
            "scale": 7.3670916293080895,
            "dataset": "jrc_jurkat-1"
          },
          {
            "name": "",
            "description": "",
            "sources": ["fibsem-uint8"],
            "orientation": null,
            "position": [13386.9228515625, 2698.18066406, 18624.5],
            "scale": 6.8347712722761855,
            "dataset": "jrc_macrophage-2"
          }
        ]
      }
    ]
    ```
  2. I found myself digging a bit for a general key of the source names, i.e., something like:

    {
    "cent": "Centrosome",
    "cent-sdapp": "Centrosome Distal Appendage",
    "chrom": "Chromatin",
    "echrom": "Euchromatin",
    "hchrom": "Heterochromatin",
    "er": "Endoplasmic Reticulum"
    }

    I know that all of that info is repeated inside of each manifest.sources, but is there a better place that one could go to generally see what organelles and labels are available "somewhere" in the datasets and what each key means?
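If a file like that existed, the key in item 2 could be derived from the table in item 1 rather than maintained by hand. Here is a minimal sketch, assuming the JSON shape proposed above; the embedded `ORGANELLES_JSON` string is a trimmed, hypothetical stand-in for the not-yet-existing `api/organelles.json`, and `source_key` and `datasets_with_examples` are illustrative helper names, not part of fibsem-tools or this repo.

```python
import json

# Hypothetical, trimmed-down contents of the proposed api/organelles.json.
# Same shape as the example in item 1; the file itself does not exist yet.
ORGANELLES_JSON = """
[
  {"full_name": "Centrosome", "short_name": "Centrosome", "file_name": "cent",
   "description": "Barrel-shaped structure composed of microtubule triplets.",
   "examples": [{"name": "", "description": "", "sources": ["fibsem-uint8"],
                 "orientation": null,
                 "position": [24834.125, 1468.32226562, 14897.5],
                 "scale": 6.834771272276192, "dataset": "jrc_hela-2"}]},
  {"full_name": "Chromatin", "short_name": "Chromatin", "file_name": "chrom",
   "description": "Protein and DNA complexes within the nucleus.",
   "examples": []}
]
"""


def source_key(organelles):
    """Map each short source key (file_name) to its full organelle name."""
    return {entry["file_name"]: entry["full_name"] for entry in organelles}


def datasets_with_examples(organelles, file_name):
    """Return the sorted dataset names that list an example view for a key."""
    for entry in organelles:
        if entry["file_name"] == file_name:
            return sorted({ex["dataset"] for ex in entry["examples"]})
    return []  # unknown key: no datasets


organelles = json.loads(ORGANELLES_JSON)
key = source_key(organelles)
print(key)
print(datasets_with_examples(organelles, "cent"))
```

With a real file, the embedded string would be replaced by reading the file from wherever it ends up being hosted.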

thanks again!

d-v-b commented 2 years ago

Hey Talley, thanks for looking into our data!

> There's some good stuff in Organelles.tsx. I wonder if you might consider an api/organelles.json file to facilitate programmatic access to the table at https://openorganelle.janelia.org/organelles?

As you note, putting all that data in Organelles.tsx is... less than ideal, to put it mildly. Totally agree that this info should be persisted externally, e.g. in this repo (which is a pseudo-API) or as an endpoint for a real API.

> I know that all of that info is repeated inside of each manifest.sources, but is there a better place that one could go to generally see what organelles and labels are available "somewhere" in the datasets and what each key means?

At the moment no, but this will be fixed when I complete the work of this branch: https://github.com/janelia-cosem/fibsem-metadata/tree/fastapi

The plan is to put all this metadata in a PostgreSQL database (making it queryable) and expose that with FastAPI (making it documented and discoverable). OpenOrganelle will then use this API to discover and sort datasets.

Your second concern would be addressed even more comprehensively by expanding the metadata for individual volumes, e.g. giving each volume within a dataset its own description field, which would basically give some human-comprehensible explanation for a name like er.
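To make the idea concrete, here is a rough sketch of per-volume descriptions stored in a queryable table. Everything in it is an assumption for illustration: the `volume` table and its columns are hypothetical rather than the schema from the fastapi branch, in-memory SQLite stands in for PostgreSQL so the snippet runs without a server, and the rows are placeholder values, not real metadata.

```python
import sqlite3

# In-memory SQLite is used here only as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per volume, each with its own description.
conn.execute(
    """
    CREATE TABLE volume (
        dataset     TEXT NOT NULL,  -- e.g. 'jrc_hela-2'
        name        TEXT NOT NULL,  -- short key, e.g. 'chrom'
        description TEXT NOT NULL,  -- human-readable explanation of the key
        PRIMARY KEY (dataset, name)
    )
    """
)

# Placeholder rows for illustration only, not real dataset contents.
conn.executemany(
    "INSERT INTO volume (dataset, name, description) VALUES (?, ?, ?)",
    [
        ("jrc_hela-2", "chrom", "Chromatin"),
        ("jrc_hela-3", "chrom", "Chromatin"),
        ("jrc_hela-2", "cent", "Centrosome"),
    ],
)


def datasets_with(conn, name):
    """Return the datasets that contain a volume with the given short key."""
    cursor = conn.execute(
        "SELECT dataset FROM volume WHERE name = ? ORDER BY dataset", (name,)
    )
    return [row[0] for row in cursor]


def available_keys(conn):
    """Return every short key found anywhere, mapped to its description."""
    cursor = conn.execute(
        "SELECT DISTINCT name, description FROM volume ORDER BY name"
    )
    return dict(cursor.fetchall())


print(datasets_with(conn, "chrom"))
print(available_keys(conn))
```

The second query is the "what is available somewhere, and what does each key mean" lookup from the original question; an API endpoint would wrap queries like these.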

Thanks again for the feedback! This is super helpful, and coming at the perfect time :)

tlambert03 commented 2 years ago

oooh ❤️ fantastic. happy to wait for that stuff. looks like an ideal long term solution.

(side note, did you happen to see https://sqlmodel.tiangolo.com/ ? ... I've been wanting to redo some older database projects after seeing how lovely that is 😄 )

d-v-b commented 2 years ago

ohhh wow I had not seen that... looks like my googling "pydantic + sql" didn't uncover it. I will definitely look into it. This is my first time doing anything with a database so there's a lot of learning involved, but it's long overdue :)