scverse / scirpy

A scanpy extension to analyse single-cell TCR and BCR data.
https://scirpy.scverse.org/en/latest/
BSD 3-Clause "New" or "Revised" License

Split IO into separate package #385

Open grst opened 1 year ago

grst commented 1 year ago

The scverse core team reached the consensus that IO should not be part of the analysis packages (e.g. scanpy, scirpy, muon), but should instead live in an independent package with minimal dependencies that the analysis packages depend on. The hope is that this leads to wider adoption of scverse data structures, since the "dependency cost" of depending on a lightweight IO package is lower than that of depending on an entire framework. This issue tracks the goal of creating such a package for scirpy.

Name (?)

A couple of ideas

Scope

Maybe

The latter two go beyond just storing AIRR data as an awkward array and implement the scirpy receptor model. They are likely useful for some other packages, but then again, if a method needs this, it could just depend on the full scirpy.

In case of doubt, err on the side of including less in the package, as it could be added later if required.

grst commented 1 year ago

As discussed with @zktuong, it would be nice to refer to the dandelion preprocessing workflow (which addresses some issues with the cellranger output) from this package and/or scirpy. In the end, this shouldn't be hard, as the dandelion pipeline reads cellranger output and writes AIRR, which can be consumed directly by the read_airr function.
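
To illustrate why AIRR works as the hand-off format here, below is a minimal sketch using only the Python standard library. It is not the scirpy or dandelion implementation: the rows are made-up example data, and `read_airr_rows` is a hypothetical helper that only groups AIRR Rearrangement rows by `cell_id`. In practice the file written by the preprocessing step would be passed to the read_airr function instead.

```python
import csv
import io
from collections import defaultdict

# Hypothetical, hand-written AIRR Rearrangement rows (tab-separated).
# Column names follow the AIRR Rearrangement schema; the values are made up.
AIRR_TSV = (
    "sequence_id\tcell_id\tlocus\tv_call\tj_call\tjunction_aa\tproductive\n"
    "contig_1\tcell_A\tTRA\tTRAV1-2\tTRAJ33\tCAVMDSNYQLIW\tT\n"
    "contig_2\tcell_A\tTRB\tTRBV6-1\tTRBJ2-1\tCASSEGTGNEQFF\tT\n"
    "contig_3\tcell_B\tTRB\tTRBV20-1\tTRBJ1-1\tCSARDRGTEAFF\tF\n"
)


def read_airr_rows(handle):
    """Group AIRR rearrangement rows by cell_id.

    Illustrative helper only; this is an assumption for the sketch,
    not part of the scirpy API.
    """
    chains_by_cell = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        chains_by_cell[row["cell_id"]].append(row)
    return dict(chains_by_cell)


cells = read_airr_rows(io.StringIO(AIRR_TSV))
for cell_id, chains in cells.items():
    # One cell can carry several chains (e.g. TRA and TRB).
    print(cell_id, [chain["locus"] for chain in chains])
```

Because any tool that writes these columns can feed any reader that understands them, the preprocessing pipeline and the IO package would not need to know about each other beyond the shared format.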

zktuong commented 1 year ago

tagging @DennisCambridge