single-cell-data / SOMA

A flexible and extensible API for annotated 2D matrix data stored in multiple underlying formats.
MIT License

Proposed update to the spatial additions #196

Closed jp-dark closed 1 month ago

jp-dark commented 2 months ago

This PR provides a proposed restructuring to the spatial components.

jp-dark commented 2 months ago

Summary of proposed changes:

Top level changes:

Scene changes:

prathapsridharan commented 2 months ago
  • The spatial collection now only contains the scene. All catalog metadata gets bumped to the Experiment level.

@jp-dark -

  1. I might have missed it, but where is the catalog metadata stored at the Experiment level? I only see the obs and obs_scene dataframes in Experiment.
  2. By "catalog metadata", are you referring to metadata about scenes or something else?
  • Decision: Should scene_id be an integer soma_joinid or the names of the Scenes stored in spatial?

@jp-dark @pablo-gar - Correct me if I am wrong, but my understanding of a Collection is that it is a string-keyed map. If spatial is such a collection and it is to be keyed by a scene_id, then wouldn't the scene_id necessarily have to be a string?

jp-dark commented 2 months ago
  • The spatial collection now only contains the scene. All catalog metadata gets bumped to the Experiment level.
1. I might have missed it, but where is the catalog metadata stored at the Experiment level? I only see the `obs` and `obs_scene` dataframes in `Experiment`.

2. By _"catalog metadata"_, are you referring to metadata about _scenes_ or something else?

"Catalog metadata" was a poor descriptor on my part. I'm talking about any dataframes/arrays that describe the scenes and/or the relationships between scenes and other pieces of the Experiment (right now just obs_scene, but we might want to add more).
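To illustrate the kind of relationship table described above, here is a minimal sketch in plain Python. It is not the SOMA API and not the schema in this PR: the column names `obs_id` and `scene_id`, the scene names, and both helper functions are hypothetical and only show how a table like obs_scene could link observations to scenes.

```python
# Hypothetical rows of an obs_scene-style table: one row per (observation, scene) pair.
# Column names and values are illustrative assumptions, not the schema in this PR.
obs_scene = [
    {"obs_id": 0, "scene_id": "scene_a"},
    {"obs_id": 1, "scene_id": "scene_a"},
    {"obs_id": 1, "scene_id": "scene_b"},
    {"obs_id": 2, "scene_id": "scene_b"},
]


def scenes_for_obs(table, obs_id):
    """Return the sorted scene IDs in which the given observation appears."""
    return sorted({row["scene_id"] for row in table if row["obs_id"] == obs_id})


def obs_in_scene(table, scene_id):
    """Return the sorted observation IDs that appear in the given scene."""
    return sorted({row["obs_id"] for row in table if row["scene_id"] == scene_id})
```

Because the table lives at the Experiment level in this sketch, both lookups work without opening any individual scene.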

  • Decision: Should scene_id be an integer soma_joinid or the names of the Scenes stored in spatial?

@jp-dark @pablo-gar - Correct me if I am wrong, but my understanding of a Collection is that it is a string-keyed map. If spatial is such a collection and it is to be keyed by a scene_id, then wouldn't the scene_id necessarily have to be a string?

As is, yes**. If we key on a soma_joinid, we would need to add a mapping from the scene name to the join ID.

** Groups in TileDB are actually primarily keyed by index with an optional string name, but I believe the SOMA implementation treats collections as string-keyed maps.
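A rough sketch of the extra mapping that keying on an integer join ID would require, assuming the spatial collection stays string-keyed. This is plain Python, not the SOMA or TileDB API; the scene names, `build_scene_index`, and `scene_key_for` are hypothetical and exist only for illustration.

```python
# Hypothetical scene names, standing in for the string keys of the `spatial` collection.
spatial_keys = ["scene_a", "scene_b", "scene_c"]


def build_scene_index(names):
    """Assign each scene name an integer ID (its position in the input order)
    and return the lookup tables for both directions."""
    name_to_id = {name: i for i, name in enumerate(names)}
    id_to_name = {i: name for name, i in name_to_id.items()}
    return name_to_id, id_to_name


name_to_id, id_to_name = build_scene_index(spatial_keys)


def scene_key_for(joinid):
    """Translate an integer scene ID back into the string key of the collection."""
    return id_to_name[joinid]
```

With string scene names used directly as scene_id, neither lookup table is needed; with integer IDs, every access to the collection goes through something like `scene_key_for` first.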

prathapsridharan commented 1 month ago

@jp-dark (cc: @pablo-gar) - For obsl, varl, obssm, and varssm, some more detailed comments about their meaning would be helpful. If possible, an illustrative example would help too. For instance: obsl contains columns obs_soma_joinid: int, geometry: SomaGeometry, etc.

Especially since we will inevitably be changing things, these comments will be helpful for the future.

@jp-dark - After you add the comments, let me know and I will approve this PR.