single-cell-data / SOMA

A flexible and extensible API for annotated 2D matrix data stored in multiple underlying formats.
MIT License

Rework object-creation and collection-building protocols #96

Closed: thetorpedodog closed 1 year ago

thetorpedodog commented 1 year ago

This is (the specification side of) the plan for reworking how we create and build complex SOMA objects. Currently, the primary model in TileDB-SOMA is bottom-up: create the "leaf" objects first, then work outward to the "root" objects. This change is intended to better support a top-down creation model, where a user can create (for example) an Experiment, then create its members inside it:

new_exp = soma_impl.Experiment.create("uri/of/new/experiment", ...)
# new_exp is opened in write mode

new_exp.add_dataframe("obs", ...)
# an `obs` dataframe is created inside the experiment

# The user can continue to build down into the experiment,
# or any other collection they have opened in write mode.

It also changes the flow for opening SOMA objects, so that a SOMA object always refers to a known object rather than potentially being a free-floating URI:

my_coll = soma_impl.Collection.open("/path/to/whatever", ...)
# my_coll is an active, opened collection in read mode.
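
One way to read "always refers to a known object" is that `open` validates the URI before any handle is returned. The sketch below illustrates that reading only; `ToyCollection`, `_STORE`, and the `mem://` URIs are hypothetical names and do not come from the SOMA specification.

```python
# Hypothetical in-memory stand-in for storage: URI -> stored object type.
_STORE = {
    "mem://path/to/whatever": "Collection",
    "mem://path/to/frame": "DataFrame",
}


class ToyCollection:
    """Toy handle; illustrative only, not the SOMA API."""

    soma_type = "Collection"

    def __init__(self, uri: str, mode: str) -> None:
        # Intended to be called only from open(), so every instance
        # corresponds to an object that was verified to exist.
        self.uri = uri
        self.mode = mode

    @classmethod
    def open(cls, uri: str, mode: str = "r") -> "ToyCollection":
        if mode not in ("r", "w"):
            raise ValueError(f"unsupported mode: {mode!r}")
        stored_type = _STORE.get(uri)
        if stored_type is None:
            # No free-floating handles: a missing object fails immediately.
            raise FileNotFoundError(uri)
        if stored_type != cls.soma_type:
            raise TypeError(f"{uri} holds a {stored_type}, not a {cls.soma_type}")
        return cls(uri, mode)


my_coll = ToyCollection.open("mem://path/to/whatever")
```

With this shape, errors such as a missing or mistyped object surface at `open` time rather than on first use of the handle.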

This is also intended to enable a base "smart open" function in the SOMA implementation:

some_obj = soma_impl.open("/path/to/any/soma_object", ...)
# the type of some_obj is identified and an instance of the appropriate
# concrete class is returned.
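
A "smart open" of this kind can be sketched as a lookup of the stored type followed by dispatch through a registry of concrete classes. This is only an assumed design: `smart_open`, `_REGISTRY`, `_STORE`, and the `Toy*` classes are hypothetical, and how a real implementation records and reads an object's type is not specified in this issue.

```python
# Hypothetical in-memory stand-in for storage: URI -> stored object type.
_STORE = {
    "mem://exp": "Experiment",
    "mem://exp/obs": "DataFrame",
}


class ToyObject:
    """Common base for the toy classes; illustrative only."""

    soma_type = ""

    def __init__(self, uri: str, mode: str) -> None:
        self.uri = uri
        self.mode = mode


class ToyExperiment(ToyObject):
    soma_type = "Experiment"


class ToyDataFrame(ToyObject):
    soma_type = "DataFrame"


# Registry from stored type name to the concrete class that handles it.
_REGISTRY = {cls.soma_type: cls for cls in (ToyExperiment, ToyDataFrame)}


def smart_open(uri: str, mode: str = "r") -> ToyObject:
    """Identify the stored type at `uri` and return the matching class."""
    stored_type = _STORE.get(uri)
    if stored_type is None:
        raise FileNotFoundError(uri)
    cls = _REGISTRY.get(stored_type)
    if cls is None:
        raise TypeError(f"no class registered for type {stored_type!r}")
    return cls(uri, mode)


some_obj = smart_open("mem://exp/obs")
```

The caller does not need to know the type in advance; it can inspect the returned instance (for example with `isinstance`) when the type matters.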

Tracking issue: https://github.com/single-cell-data/TileDB-SOMA/issues/638

thetorpedodog commented 1 year ago

@mlin Once we start implementation work on this, it will interact with the work you’re doing right now. I plan to wait until your current work is complete and then adapt from there, and I am planning on doing most of the refactoring to support this myself. I don't think your work will need major changes.