single-cell-data / TileDB-SOMA

Python and R SOMA APIs using TileDB’s cloud-native format. Ideal for single-cell data at any scale.
https://tiledbsoma.readthedocs.io
MIT License

[r] Use `libtiledbsoma` for array handles #3061

Open johnkerl opened 1 month ago

johnkerl commented 1 month ago

Currently we're retaining a tiledb-r handle at open: https://github.com/single-cell-data/TileDB-SOMA/blob/11d53967ebbdcab3070ef94e0a0a7ac590dfafac/apis/r/R/TileDBArray.R#L31

And we use a temporary open-use-close at every single call to libtiledbsoma -- here is just one of many examples: https://github.com/single-cell-data/TileDB-SOMA/blob/11d53967ebbdcab3070ef94e0a0a7ac590dfafac/apis/r/src/metadata.cpp#L170-L193
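The difference between the two access patterns can be sketched roughly as follows. This is a minimal illustration only, not the libtiledbsoma API: `FakeArray`, `open_count`, `get_metadata_open_use_close`, and `CachedHandle` are hypothetical names invented for this sketch, and the metadata value is a placeholder. The counter exists only to make the number of opens visible.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical global counter: tracks how many times an array was opened.
inline int& open_count() {
    static int n = 0;
    return n;
}

// Hypothetical stand-in for an array handle; NOT the libtiledbsoma API.
class FakeArray {
public:
    explicit FakeArray(std::string uri) : uri_(std::move(uri)) {
        ++open_count();  // opening is the expensive step being modeled
        metadata_["soma_object_type"] = "SOMADataFrame";  // placeholder value
    }

    const std::string& uri() const { return uri_; }

    // Returns the value for `key`, or an empty string if the key is absent.
    std::string get_metadata(const std::string& key) const {
        auto it = metadata_.find(key);
        return it == metadata_.end() ? std::string() : it->second;
    }

private:
    std::string uri_;
    std::map<std::string, std::string> metadata_;
};

// Pattern A: open-use-close on every call (one open per call).
std::string get_metadata_open_use_close(const std::string& uri,
                                        const std::string& key) {
    FakeArray arr(uri);            // open
    return arr.get_metadata(key);  // use; the handle is released at scope exit
}

// Pattern B: open once, keep the handle, reuse it for later calls.
class CachedHandle {
public:
    explicit CachedHandle(const std::string& uri)
        : arr_(std::make_shared<FakeArray>(uri)) {}

    std::string get_metadata(const std::string& key) const {
        assert(arr_ != nullptr);  // the handle must still be held
        return arr_->get_metadata(key);
    }

private:
    std::shared_ptr<FakeArray> arr_;  // retained for the object's lifetime
};
```

In this sketch, three calls to `get_metadata_open_use_close` open the array three times, whereas three calls on a single `CachedHandle` open it once.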

On the one hand this might seem lower-priority: the redundant opens are a perf hit, but they work.

But as discussed on #3060 we must do this in order to remove the tiledb-r dependency.

See also #3053 which @nguyenv is working on -- this is a case where we do currently require a second open for array reads.

From #3059:

Already done for groups: #2406; [sc-55685].

Steps:

To check:

johnkerl commented 2 weeks ago

Blocked by #3051, since `self$object` is used here: https://github.com/single-cell-data/TileDB-SOMA/blob/1.15.0rc2/apis/r/R/SOMADataFrame.R#L182. Removing `private$.tiledb_array` / `self$object` from the base class in TileDBArray.R causes this query-condition logic to return empty results, which makes unit-test cases fail in non-obvious ways.

We need to move the query-condition logic over from TileDB-R before we can continue here.

viviannguyen commented 2 weeks ago

I don't believe I'm the correct Vivian to be tagged here. You're probably looking for @nguyenv.

johnkerl commented 2 weeks ago

Oh no @viviannguyen sorry about that!! Have a nice day though!!