SOMA – for “Stack Of Matrices, Annotated” – is a flexible, extensible, and open-source API enabling access to data in a variety of formats. The driving use case of SOMA is for single-cell data in the form of annotated matrices where observations are frequently cells and features are genes, proteins, or genomic regions.
The TileDB-SOMA package is a C++ library with APIs in Python and R, using TileDB Embedded to implement the SOMA specification.
Get started on using TileDB-SOMA:
Intended to be used for single-cell data, TileDB-SOMA provides Python and R APIs to allow for storage and data access patterns at scale and for larger-than-memory operations:
TileDB-SOMA provides interoperability with existing single-cell toolkits:
TileDB-SOMA provides interoperability with existing Python or R data structures:
#genomics
#cellxgene-census-users
The TileDB-SOMA doc-site (Python | R) contains the reference documentation and tutorials.
Reference documentation can also be accessed directly from Python with help(tiledbsoma) or from R with help(package = "tiledbsoma").
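The built-in help text can also be captured as a string, for example to search it or save it to a file. The following is a minimal sketch using only the Python standard library; the helper name render_reference is hypothetical and not part of TileDB-SOMA, and the default module name assumes the tiledbsoma package is installed.

```python
import importlib
import pydoc


def render_reference(module_name: str = "tiledbsoma") -> str:
    """Return the plain-text help for a module, as help() would display it.

    render_reference is an illustrative helper, not a TileDB-SOMA function.
    The default module name assumes tiledbsoma is installed; any importable
    module name works.
    """
    module = importlib.import_module(module_name)
    # pydoc.plaintext renders without terminal bold/overstrike characters.
    return pydoc.render_doc(module, renderer=pydoc.plaintext)
```

Calling render_reference() returns the same text that help(tiledbsoma) pages through interactively.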
The capabilities of TileDB-SOMA lie in the different read, access, and query patterns provided by each of the main implementations of SOMA objects:
DenseNDArray is a dense, N-dimensional array with offset (zero-based) integer indexing on each dimension.
SparseNDArray is the same as DenseNDArray but sparse, and supports point indexing (disjoint index access).
DataFrame is a multi-column table with user-defined column names and value types, with support for point indexing.
Collection is a persistent container of named SOMA objects.
Experiment is a class that represents a single-cell experiment. It always contains two objects:
  obs: a DataFrame with primary annotations on the observation axis.
  ms: a Collection of measurements, each composed of X matrices and axis annotation matrices or data frames (e.g. var, varm, obsm, etc.).
If you are interested in listing any projects here, please contact us at soma@chanzuckerberg.com.
This branch, main, implements the updated specification. Please also see the main-old branch, which implements the original specification.
All participants in TileDB spaces are expected to adhere to high standards of professionalism in all interactions. This repository is governed by the specific standards and reporting procedures detailed in depth in the TileDB core repository Code Of Conduct.