SOMA – for “Stack Of Matrices, Annotated” – is a flexible, extensible, and open-source API enabling access to data in a variety of formats. SOMA is designed to be general-purpose for data that can be modeled as one or more sets of 2D annotated matrices with measurements of features across observations. The driving use case of SOMA is for single-cell data in the form of annotated matrices where observations are frequently cells and features are genes, proteins, or genomic regions.
Datasets generated by profiling single cells are rapidly increasing in size and complexity. This has resulted in a need for scalable solutions to accommodate data sizes that no longer fit in memory and flexibility to accommodate the diversity of data being produced.
To address these emerging needs in the single cell ecosystem, the Chan Zuckerberg Initiative in partnership with TileDB is:
The SOMA
specification and its TileDB-SOMA
implementation provide the following capabilities for single-cell data:
.h5ad
/.mtx
/.tgz
/.RData
/.h5Seurat
/ etc.This project adheres to CZI's Contributor Covenant code of conduct. By participating, you are expected to uphold this code. Please report unacceptable behavior to opensource@chanzuckerberg.com.