While writing tests for the incremental IO of metadata, I also added tests for the metadata that we originally planned for points in the design doc but have only rarely used.
Specifically, points have two optional values which specify the names of two columns with particular semantics:
- the `feature_key` column is an optional column that contains the "main feature" of the points, ideally the `gene_id`;
- the `instance_key` column is an optional column that can be used to tell which point falls into which cell.
The two columns above are very rarely used: I think that we never use `instance_key`, and that `feature_key` is only used in `aggregate()` to infer a default value for `value_key` when the user does not specify one.
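To make the `feature_key` fallback concrete, here is a minimal sketch of the behavior described above. It is not the actual `aggregate()` code: the `points_attrs` dict layout and the `resolve_value_key` helper are hypothetical and only for illustration.

```python
from typing import Optional

# Hypothetical stand-in for the metadata stored with a points element;
# the dict layout is an assumption made for this example.
points_attrs = {"feature_key": "gene_id", "instance_key": "cell_id"}


def resolve_value_key(attrs: dict, value_key: Optional[str] = None) -> Optional[str]:
    """Return the user-provided value_key, else fall back to feature_key if present."""
    if value_key is not None:
        # An explicit choice by the user always wins.
        return value_key
    # No explicit choice: use feature_key as the default (None if it is not set).
    return attrs.get("feature_key")
```

If `feature_key` were dropped, the only change for users would be that `value_key` has to be passed explicitly in the cases where it is currently inferred.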
Given how rarely they are used in practice, I would consider simplifying the points model and dropping them; I think the effort to support and document a more complex data model does not pay off in these cases. And while these are just small pieces of metadata, in general I think the leaner the model, the better.
To be more precise, I have a preference for dropping both, but I have no strong opinion on `feature_key`. I do believe, however, that we should remove `instance_key`, for the following reasons:
- it was added before multiple tables were supported: now `instance_key` alone does not indicate which table contains the cells that the points refer to;
- it was added before a table could annotate points: now, if a table annotates points, the `instance_key` of the table would refer to the index of the points, while the `instance_key` of the points would refer to the `instance_key` column of a different table.
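The two reasons above can be sketched with plain dicts. All names here (`tables`, `candidate_tables`, the regions and keys) are hypothetical and simplified; this is not the real table metadata layout, just an illustration of the ambiguity.

```python
# Hypothetical points metadata: instance_key names the column that says
# which cell each point falls into.
points_attrs = {"instance_key": "cell_id"}

# Hypothetical, simplified table metadata (illustrative names only).
tables = {
    "table_a": {"region": "cells_a", "instance_key": "cell_id"},
    "table_b": {"region": "cells_b", "instance_key": "cell_id"},
    # A table annotating the points themselves: here instance_key refers to
    # the index of the points, a different meaning from the one above.
    "points_table": {"region": "transcripts", "instance_key": "point_id"},
}


def candidate_tables(tables: dict, key: str) -> list:
    """Return the names of all tables whose instance_key matches the given key."""
    return sorted(name for name, attrs in tables.items() if attrs["instance_key"] == key)


# More than one table matches, so the points' instance_key alone cannot
# tell us which table holds the cells the points refer to.
matches = candidate_tables(tables, points_attrs["instance_key"])
```

In this toy setup `matches` contains both `table_a` and `table_b`, and `points_table` shows the second issue: the same metadata name is used with two different meanings.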
Finally, an argument for dropping `feature_key` is that if we unify points and circles, either both should have the extra column or neither should; since it is not needed for circles and not essential for points, I would just drop it.
As an example of the maintenance complications: I found out that the parser discards these two pieces of metadata if they are already present in the object, unlike what it does for transformations.
CC @giovp @kevinyamauchi @melonora