single-cell-data / SOMA

A flexible and extensible API for annotated 2D matrix data stored in multiple underlying formats.
MIT License

Rework and enhance type hierarchy and generics #115

Closed thetorpedodog closed 1 year ago

thetorpedodog commented 1 year ago

Context: https://github.com/single-cell-data/TileDB-SOMA/issues/638 and https://github.com/single-cell-data/TileDB-SOMA/issues/540

While these changes add a number of generic slots to the base collection, experiment, and measurement types, the experience for a SOMA library user will be roughly the same. That is to say, the definitions here look a little scary, but the end user will still just see theimpl.Collection[ElementType]. Type inference when using composed objects is better as well:

some_exp = theimpl.Experiment.open(...)

obs = some_exp.obs
reveal_type(obs)
# BEFORE: somacore.DataFrame
#         (i.e., the type system doesn't know what implementation
#         of the abstract DataFrame this is; it only knows about
#         the bare minimum DataFrame properties)
# AFTER:  theimpl.DataFrame

ms = some_exp.ms
reveal_type(ms)
# BEFORE: somacore.Collection[somacore.Measurement]
# AFTER:  theimpl.Collection[theimpl.Measurement]

some_meas = ms["whatever"]
reveal_type(some_meas)
# BEFORE: somacore.Measurement
# AFTER:  theimpl.Measurement

x_layers = some_meas.X
reveal_type(x_layers)
# BEFORE: somacore.Collection[somacore.NDArray]
# AFTER:  theimpl.Collection[theimpl.NDArray]

There is no change at runtime; the actual types of the objects remain the same, but autocompletion, type checking, and other tooling have a much better idea of what is going on.
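To make the mechanism concrete, here is a minimal sketch of how generic slots on the base types can carry implementation types through composed objects. All class names below (DataFrame, Measurement, Collection, Experiment, and the Impl* subclasses) are hypothetical stand-ins written for illustration; they are not the actual somacore or tiledbsoma definitions, which may be structured differently.

```python
from typing import Dict, Generic, TypeVar


# Hypothetical stand-ins for the abstract base types (not the real somacore classes).
class DataFrame:
    pass


class Measurement:
    pass


ElementT = TypeVar("ElementT")
ObsT = TypeVar("ObsT", bound=DataFrame)
MeasT = TypeVar("MeasT", bound=Measurement)


class Collection(Generic[ElementT]):
    """A string-keyed collection whose element type is a generic slot."""

    def __init__(self) -> None:
        self._items: Dict[str, ElementT] = {}

    def __getitem__(self, key: str) -> ElementT:
        return self._items[key]

    def __setitem__(self, key: str, value: ElementT) -> None:
        self._items[key] = value


class Experiment(Generic[ObsT, MeasT]):
    """Base experiment with generic slots for its composed members."""

    def __init__(self, obs: ObsT, ms: Collection[MeasT]) -> None:
        self.obs = obs
        self.ms = ms


# A hypothetical implementation fills in the slots with its own types.
class ImplDataFrame(DataFrame):
    def impl_only(self) -> str:
        return "impl-specific"


class ImplMeasurement(Measurement):
    pass


class ImplExperiment(Experiment[ImplDataFrame, ImplMeasurement]):
    pass


ms: Collection[ImplMeasurement] = Collection()
ms["whatever"] = ImplMeasurement()
exp = ImplExperiment(ImplDataFrame(), ms)

obs = exp.obs              # a type checker infers ImplDataFrame, not DataFrame
meas = exp.ms["whatever"]  # a type checker infers ImplMeasurement
```

Because the implementation subclass binds the type parameters once, a type checker can accept a call such as obs.impl_only() without any cast, while the objects created at runtime are exactly the same as they would be without the annotations.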


To show what this looks like on the tiledbsoma side: the diff is pretty small, but the key part is in io.py, where the cast(tiledbsoma.Measurement, ms[whatever]) is no longer needed, since the type system already knows the value is a tiledbsoma.Measurement. While that is the only change there specifically, there will be corresponding improvements in user code.
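For context on why dropping the cast is safe, the sketch below shows that typing.cast only informs the type checker and returns its argument unchanged at runtime. The Measurement and ImplMeasurement classes and the get_abstract helper are hypothetical and exist only for this illustration; this is not the code from io.py.

```python
from typing import Dict, cast


# Hypothetical stand-ins, not the real somacore/tiledbsoma classes.
class Measurement:
    pass


class ImplMeasurement(Measurement):
    def impl_only(self) -> str:
        return "impl-specific"


store: Dict[str, Measurement] = {"whatever": ImplMeasurement()}


def get_abstract(key: str) -> Measurement:
    # Annotated with the abstract type, so callers lose the concrete type.
    return store[key]


abstract = get_abstract("whatever")

# cast() performs no runtime check or conversion; it only tells the
# type checker to treat the value as ImplMeasurement.
narrowed = cast(ImplMeasurement, abstract)
```

Once the lookup itself is annotated to return the implementation type, the cast line becomes redundant for the type checker and can be deleted with no runtime effect.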

And just to reiterate: runtime behavior is identical, and any code which works now will continue to work, but static type inference is significantly improved.