bkmartinjr opened this issue 11 months ago
@bkmartinjr thanks for filing this.
The bounding-box semantics we hammered out in several spec meetings and a design doc were definitely meant to be intended shape. There were three:
So if we want to keep the bounding box as intended shape, we need to provide a way to get numbers like 100 and 200 wired all the way from the user's data to the on-disk bounding-box metadata.
And if we want to jettison the bounding box entirely, then we just need to use the non-empty domain to address the performance issue raised in #1933.
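To make the two options concrete, here is a rough sketch against plain tiledb-py; the metadata key, helper names, and plumbing are hypothetical, not the actual tiledbsoma code:

```python
import json
import tiledb
from scipy import sparse

# Option 1: keep the bounding box as intended shape -- the writer records the
# caller's shape (e.g. (100, 200)) explicitly in array metadata, rather than
# inferring it from the max coordinate written.
# Assumes a 2-D sparse TileDB array already exists at `uri`.
def write_with_intended_shape(uri: str, m: sparse.coo_matrix) -> None:
    with tiledb.open(uri, mode="w") as arr:
        arr[m.row, m.col] = m.data                        # write the COO triples
        arr.meta["intended_shape"] = json.dumps(m.shape)  # hypothetical key

# Option 2: drop the bounding box -- readers ask TileDB for the coordinates
# actually written, at the cost of seeing "used" rather than intended extent.
def used_extent(uri: str):
    with tiledb.open(uri) as arr:
        return arr.nonempty_domain()  # ((min0, max0), ...) or None if empty
```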
Personally I can go either way. But the latter wins in a tie-breaker, for two reasons:
And I do mean "personally" -- I am not a compbio person and will happily defer to whatever a consensus of the compbio people wants.
Pending #2407
The current Python "SOMA bounding box" for a sparse ND array has inconsistent semantics in the write path:
- for Tensor objects, it uses the shape of the tensor to compute the new bounding box. From the code:
- for Table objects, it scans for the max coordinate and writes the 'used' bounding box. From the code:
These two approaches give different results when the tensor has no explicit values at its maximum coordinates. Code that otherwise works fine ends up with a broken bounding box -- for example, these two writes produce entirely different results:
It is not entirely clear which is correct; it depends on downstream uses. For example, if the expectation is that a sparse matrix with its last few rows set to zero can be saved and reconstituted with the same shape, this difference will be an issue.
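To make the difference concrete, here is a minimal standalone illustration with scipy (not the tiledbsoma write path itself):

```python
import numpy as np
from scipy import sparse

# A 100 x 200 matrix whose last rows and columns contain no explicit values.
m = sparse.coo_matrix(([1.0, 2.0], ([0, 10], [0, 20])), shape=(100, 200))

print(m.shape)                           # (100, 200)  <- intended shape
print(m.row.max() + 1, m.col.max() + 1)  # 11 21       <- "used" extent
# A bounding box computed from the tensor shape records (100, 200); one
# computed by scanning coordinates records (11, 21), and the trailing empty
# rows and columns cannot be recovered from the latter.
```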
If the semantics are "used coordinates", as implemented in the Table code branch, we would be better off using the underlying `nonempty_domain` implementation (resolving bug #1969). If the semantics are intended shape, then the Table code path needs some means of determining the caller's intended shape.

CC: @mojaveazure @johnkerl
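If the intended-shape semantics win, one hypothetical way the Table code path could learn the caller's shape is an explicit parameter, falling back to a coordinate scan only when it is absent. Function and parameter names below are illustrative, not the actual tiledbsoma API:

```python
from typing import Optional, Tuple
import pyarrow as pa
import pyarrow.compute as pc

def bounding_box_for_table(
    tbl: pa.Table,
    dim_names: Tuple[str, ...],
    intended_shape: Optional[Tuple[int, ...]] = None,
) -> Tuple[Tuple[int, int], ...]:
    if intended_shape is not None:
        # "intended shape" semantics: each dimension spans [0, shape_i - 1]
        return tuple((0, n - 1) for n in intended_shape)
    # "used coordinates" semantics: scan for the min/max coordinates actually
    # present, which under-reports the shape when trailing rows/columns are empty
    boxes = []
    for d in dim_names:
        mm = pc.min_max(tbl[d]).as_py()
        boxes.append((mm["min"], mm["max"]))
    return tuple(boxes)
```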