Closed bkmartinjr closed 1 year ago
[sc-34502]
@bkmartinjr I verified in a Docker image just now that this repros with 1.4.3, i.e. not new to 1.5.0rc0. Also, repros with 1.3.0.
Analysis, for the record.
For context, here's what core does (driven by tiledb-py just for ease of operation):
>>> A = tiledb.open('s/v/pbmc3k/ms/RNA/X/data')
>>> A.df[[3],[4]]
soma_dim_0 soma_dim_1 soma_data
0 3 4 1.572679
>>> A.df[[3],[]]
soma_dim_0 soma_dim_1 soma_data
0 3 0 -0.285241
1 3 1 -0.281735
2 3 2 -0.052227
3 3 3 -0.484929
4 3 4 1.572679
... ... ... ...
1833 3 1833 -0.097402
1834 3 1834 1.631685
1835 3 1835 -0.119462
1836 3 1836 -0.179120
1837 3 1837 -0.505670
[1838 rows x 3 columns]
>>> A.df[[],[4]]
soma_dim_0 soma_dim_1 soma_data
0 0 4 -0.544024
1 1 4 0.633951
2 2 4 1.332647
3 3 4 1.572679
4 4 4 -0.333409
... ... ... ...
2633 2633 4 -0.666646
2634 2634 4 1.201865
2635 2635 4 2.193953
2636 2636 4 -0.350005
2637 2637 4 -0.457937
[2638 rows x 3 columns]
>>> A.df[[],[]]
soma_dim_0 soma_dim_1 soma_data
0 0 0 -0.171470
1 0 1 -0.280812
2 0 2 -0.046677
3 0 3 -0.475169
4 0 4 -0.544024
... ... ... ...
4848639 2637 1833 -0.103831
4848640 2637 1834 -0.436880
4848641 2637 1835 -0.078424
4848642 2637 1836 -0.130327
4848643 2637 1837 -0.471338
[4848644 rows x 3 columns]
Note in particular: in core, empty means all.
This seemed familiar to me, & as I looked through the tiledbsoma-cpp code, I realized why: we've been here before. Back in October of last year:
and then
When the data are one-dimensional, then this suffices:
This is because https://github.com/single-cell-data/TileDB-SOMA/blob/1.4.0/libtiledbsoma/src/soma/managed_query.h#L133
Namely:
subarray_range_empty_
starts out true[]
then it stays true[...]
then it becomes falseHowever, with two-dimensional data, this thing we did in October 2022 isn't sufficient:
subarray_range_empty_
starts out true[]
so it stays true[...]
so it becomes falseSo we need to track subarray_range_empty_
on a dim-by-dim basis in order to effectively work around core's empty-means-all semantics -- if any dim has empty range then override core & declare the result-set is empty. The learning now is that core's "empty means all" needs to be applied dim by dim. That's what this #1706 does.
The SparseNDArray.read method returns incorrect query results when the
coords
provided are a unique combination of:Concrete test case below. All three cases should identical results (namely, an empty pa.Table). The third case, which should be identical, instead returns all data in the SparseNDArray. Bug was found in the 1.5 pre-release (2.17), but it may exist in previous versions as well.
Bug was found via incorrect ExperimentAxisQuery.to_anndata results, but I narrowed it down to the SparseNDArray.read behavior;
Running it shows that the third case is incorrect (the first two return zero-length results, but the third case does not):