single-cell-data / TileDB-SOMA

Python and R SOMA APIs using TileDB’s cloud-native format. Ideal for single-cell data at any scale.
MIT License

[python] for an Experiment, `soma.open` is 3X slower than `soma.Experiment.open` #2726

Open bkmartinjr opened 1 month ago

bkmartinjr commented 1 month ago

On a TileDB Cloud URI that points to a SOMA Experiment, soma.open is substantially slower than soma.Experiment.open: about 3X in wall time (3.4 s vs. 1.12 s in the example below).

Ideally these would have similar performance, so that the generic opener could be used for convenience without a latency penalty.

Opening the same URI directly with TileDB-Py (e.g., tiledb.Group()) is faster still (871 ms wall time), and ideally that would be the benchmark to match.

Example:

In [40]: soma.__version__
Out[40]: '1.9.5'

In [41]: tiledb.__version__
Out[41]: '0.27.1'

In [52]: %time print(list(soma.open("tiledb://TileDB-Inc/8917a1ab-dd51-44b4-999c-cce5321adcdf", context=ctx)))
['ms', 'obs']
CPU times: user 89 ms, sys: 23.9 ms, total: 113 ms
Wall time: 3.4 s

In [53]: %time print(list(soma.Experiment.open("tiledb://TileDB-Inc/8917a1ab-dd51-44b4-999c-cce5321adcdf", context=ctx)))
['ms', 'obs']
CPU times: user 41.5 ms, sys: 9.12 ms, total: 50.6 ms
Wall time: 1.12 s

In [54]: %time print(list(tiledb.Group("tiledb://TileDB-Inc/8917a1ab-dd51-44b4-999c-cce5321adcdf", ctx=tiledb.cloud.Ctx())))
[Obj<GROUP "tiledb://TileDB-Inc/9b79dd7e-5f10-46fe-a229-00a40be387f1" - "ms">, Obj<ARRAY "tiledb://TileDB-Inc/ccc2103f-2950-4dbf-9fc8-568fb59e01dc" - "obs">]
CPU times: user 39.8 ms, sys: 7.98 ms, total: 47.8 ms
Wall time: 871 ms

In [59]: soma.show_package_versions()
tiledbsoma.__version__        1.9.5
TileDB-Py tiledb.version()    (0, 27, 1)
TileDB core version           2.21.1
libtiledbsoma version()       libtiledb=2.21.1
python version                3.10.12.final.0
OS version                    Linux 6.8.0-76060800daily20240311-generic
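
For anyone who wants to repeat the comparison outside of IPython, here is a minimal timing sketch. The helper names `time_openers` and `slowdown` are hypothetical (they are not part of tiledbsoma or TileDB-Py), and the cloud calls are left as comments because they need a TileDB Cloud login and the same `ctx` object used in the session above. The runnable part only recomputes the ratios from the wall times already reported in this issue.

```python
import time
from typing import Callable, Dict


def time_openers(openers: Dict[str, Callable[[], object]], repeats: int = 3) -> Dict[str, float]:
    """Return the best wall-clock time in seconds for each named callable.

    Hypothetical helper for illustration; each callable is run `repeats`
    times and the minimum elapsed time is kept to reduce network jitter.
    """
    results: Dict[str, float] = {}
    for name, fn in openers.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn()
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results


def slowdown(results: Dict[str, float], baseline: str) -> Dict[str, float]:
    """Express every timing as a multiple of the baseline timing."""
    base = results[baseline]
    if base <= 0:
        raise ValueError("baseline timing must be positive")
    return {name: t / base for name, t in results.items()}


if __name__ == "__main__":
    # Wall times copied from the session above (seconds).
    reported = {
        "soma.open": 3.4,
        "soma.Experiment.open": 1.12,
        "tiledb.Group": 0.871,
    }
    for name, ratio in slowdown(reported, "soma.Experiment.open").items():
        print(f"{name}: {ratio:.2f}x")

    # Assumed usage against a live URI (requires TileDB Cloud access and
    # the `ctx` object from the session above):
    #
    # import tiledb, tiledb.cloud
    # import tiledbsoma as soma
    # uri = "tiledb://TileDB-Inc/8917a1ab-dd51-44b4-999c-cce5321adcdf"
    # live = time_openers({
    #     "soma.open": lambda: list(soma.open(uri, context=ctx)),
    #     "soma.Experiment.open": lambda: list(soma.Experiment.open(uri, context=ctx)),
    #     "tiledb.Group": lambda: list(tiledb.Group(uri, ctx=tiledb.cloud.Ctx())),
    # })
    # print(slowdown(live, "tiledb.Group"))
```

Taking the minimum over a few repeats is a choice made for this sketch; a single cold run, as in the `%time` calls above, may be the more relevant number if first-open latency is what matters.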

johnkerl commented 1 month ago

On investigation:

bkmartinjr commented 4 weeks ago

FYI: @ypatia - latency-related (this case is not proxied, but demonstrates another access pattern that is latency-sensitive)

johnkerl commented 4 weeks ago

@ypatia @bkmartinjr I have an analysis that is not written up yet -- it is fully worked out in my head & in scratch notes -- and I will follow up post-retreat ...