single-cell-data / SOMA

A flexible and extensible API for annotated 2D matrix data stored in multiple underlying formats.
MIT License

Create R API examples #5

Open ambrosejcarr opened 2 years ago

ambrosejcarr commented 2 years ago

Draft R API examples and request review.

vjcitn commented 2 years ago

Hi @ambrosejcarr ... Aaron Lun pointed this project out to me, and I wanted to let you know that we at Bioconductor are interested in, and willing to devote effort to, achieving interoperability with SingleCellExperiment. Since we've invested quite a bit of effort in HDF5 as a back end (both for on-disk storage and for the HSDS API), we are interested in comparative data on the effectiveness of TileDB as a back end in this domain. If you have example data for benchmarking the new API, let us know.

ambrosejcarr commented 2 years ago

@vjcitn that's great news! We're thrilled that you're interested in working on this. Our next steps are to add R-based API examples to the draft, solicit feedback, and then create an initial implementation for benchmarking and further iteration.

We'd love to have your critique of the API draft and your help with benchmarking. Would you be interested in getting on a call to hear more about the project goals? If so, please contact me at acarr@chanzuckerberg.com and we can set up a time.

vjcitn commented 2 years ago

Hi @ambrosejcarr, just checking in here. If we can help at the R end in any way, let us know.

ambrosejcarr commented 2 years ago

Thanks for checking in, and sorry for being a bit quiet. We've been focused on organizing work toward a proof of concept for this API specification.

Our initial focus is on establishing an effective feedback loop as quickly as possible. We think the way to do that is to enable an in-memory round trip of data from Seurat, Bioconductor, and Scanpy objects, through the TileDB implementation of the matrix API, and back to Seurat, Bioconductor, and Scanpy objects. We're planning to use the 10x Multiome data as our first test case.

Our approach is to use TileDB array primitives, composed through a C++ API with R and Python shims. We aim to have this done by mid-February.

After we have this work completed, we see at least three ways we want to collaborate:

  1. Would you be interested in implementing the connector(s) between Bioconductor and the R API?
  2. Can you give us feedback on the implementation? Are there use cases that the implementation doesn't enable?
  3. We expect to find incompatibilities between the matrix API model and the three toolchains, or among the three toolchains themselves. We plan to post these as issues on the repository as we discover them, and we would greatly appreciate your participation in the discussions to resolve (and ideally align on) these questions.

We also expect to post a draft development roadmap next week and welcome your feedback on the sequencing of work. We appreciated the feedback from our earlier call about not leaving multiomics for later, and we have incorporated it into the plan.

If this sounds good, we'll get back in touch with the roadmap. From there, we will probably want to schedule an initial demo and some follow-up synchronization calls.

vjcitn commented 2 years ago

Sounds good to me. We are definitely able to collaborate on items 1-3. I am keeping the Bioconductor Technical Advisory Board abreast of this issue stream, so we should have input from multiple highly engaged voices.

pablo-gar commented 1 year ago

The R API of tiledbsoma is still under development. We expect it to be complete in about a month. Once it is finished, we can resume this conversation.