scverse / spatialdata

An open and interoperable data framework for spatial omics data
https://spatialdata.scverse.org/
BSD 3-Clause "New" or "Revised" License

Proposal: simplify data model for Points #503

Open LucaMarconato opened 1 month ago

LucaMarconato commented 1 month ago

While writing tests for the incremental IO of metadata, I also added tests for the metadata that we originally planned for points in the design doc but have only rarely used.

Specifically, points have two optional metadata values:

element.attrs[PointsModel.ATTRS_KEY][PointsModel.FEATURE_KEY] = 'genes'
element.attrs[PointsModel.ATTRS_KEY][PointsModel.INSTANCE_KEY] = 'instance_id'

which specify the names of two columns with particular semantics:

These two columns are very rarely used: I think that we never use instance_key, and that feature_key is used in aggregate() to infer a default for value_key when the user does not specify one.
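To make the fallback described above concrete, here is a minimal sketch of how a default value_key could be inferred from the stored feature_key. It is not the code in aggregate(): resolve_value_key is a hypothetical helper, and the string values given to the three constants are assumptions standing in for PointsModel.ATTRS_KEY, PointsModel.FEATURE_KEY and PointsModel.INSTANCE_KEY.

```python
# Stand-ins for the PointsModel constants; the string values are assumptions.
ATTRS_KEY = "spatialdata_attrs"
FEATURE_KEY = "feature_key"
INSTANCE_KEY = "instance_key"


def resolve_value_key(attrs, value_key=None):
    """Hypothetical helper: explicit value_key wins, else use the stored feature_key."""
    if value_key is not None:
        return value_key
    # Both the metadata sub-dict and the feature_key entry are optional,
    # so a missing entry simply yields None.
    return attrs.get(ATTRS_KEY, {}).get(FEATURE_KEY)


# A plain dict playing the role of element.attrs for a points element.
points_attrs = {ATTRS_KEY: {FEATURE_KEY: "genes", INSTANCE_KEY: "instance_id"}}

default_key = resolve_value_key(points_attrs)                          # falls back to feature_key
explicit_key = resolve_value_key(points_attrs, value_key="cell_type")  # user choice wins
missing_key = resolve_value_key({})                                    # no metadata at all
```

If feature_key were dropped, the only change in a sketch like this would be that the caller always has to pass value_key explicitly.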

Given how rarely they are used in practice, I would consider simplifying the points model and dropping them; I think the effort of supporting and documenting a more complex data model does not pay off in these cases. And while these are just small pieces of metadata, in general I think the leaner the model, the better.

To be more precise, I would prefer to drop both, but I have no strong opinion about feature_key. I do believe, however, that we should remove instance_key, for the following reasons:

Finally, an argument for dropping feature_key is that if we unify points and circles, either both should have the extra column or neither should; since it is not needed for circles and not essential for points, I would just drop it.

CC @giovp @kevinyamauchi @melonora

LucaMarconato commented 1 month ago

As an example of the maintenance complications: I found out that the parser discards these two pieces of metadata if they are already present in the object, unlike transformations, which are kept.
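The difference in behavior can be illustrated with a toy sketch. Neither function below is the spatialdata parser: parse_discarding and parse_preserving are hypothetical stand-ins operating on a plain dict, and the "spatialdata_attrs" key is an assumed value for PointsModel.ATTRS_KEY. The first rebuilds the optional metadata from its arguments only, so values already stored on the object are lost; the second starts from what is already there.

```python
ATTRS_KEY = "spatialdata_attrs"  # assumed value of PointsModel.ATTRS_KEY


def parse_discarding(attrs, feature_key=None, instance_key=None):
    """Toy parser: optional metadata comes from the arguments only."""
    new_attrs = {k: v for k, v in attrs.items() if k != ATTRS_KEY}
    given = {"feature_key": feature_key, "instance_key": instance_key}
    # Anything previously stored under ATTRS_KEY is dropped here.
    new_attrs[ATTRS_KEY] = {k: v for k, v in given.items() if v is not None}
    return new_attrs


def parse_preserving(attrs, feature_key=None, instance_key=None):
    """Toy parser: keep existing metadata, override only what is passed."""
    new_attrs = dict(attrs)
    merged = dict(attrs.get(ATTRS_KEY, {}))
    given = {"feature_key": feature_key, "instance_key": instance_key}
    merged.update({k: v for k, v in given.items() if v is not None})
    new_attrs[ATTRS_KEY] = merged
    return new_attrs


existing = {ATTRS_KEY: {"feature_key": "genes", "instance_key": "instance_id"}}

discarded = parse_discarding(existing)   # re-parsing without arguments loses both keys
preserved = parse_preserving(existing)   # re-parsing without arguments keeps both keys
```

Each optional key adds one more case like this that has to be tested and documented for re-parsing, which is the kind of maintenance cost the proposal is about.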