scverse / anndata

Annotated data.
http://anndata.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Why is the 10x h5 reader implemented in scanpy and not in anndata? #195

Open LustigePerson opened 5 years ago

LustigePerson commented 5 years ago

I was just wondering if there is a specific reason why the 10x h5 reader function is not implemented in anndata. It would be great if this format could be loaded without the need to load the whole scanpy package first. Most other readers in scanpy are just loaded from anndata.

ivirshup commented 5 years ago

From an API design standpoint, we try to keep AnnData non-specific to single cell. From a process standpoint, the 10x reader was implemented there and never moved. If the function were to move here, we'd probably want to rewrite it first so we wouldn't be adding the tables dependency to AnnData.

LustigePerson commented 5 years ago

Thank you for your response. I was just wondering because all the other readers are located in AnnData and just loaded to scanpy. But I understand that this is a design decision.

flying-sheep commented 5 years ago

I mean we can discuss this – is there a reason we don’t want the reader in here?

LustigePerson commented 5 years ago

For me it would make sense, as I might want to read data into the anndata format without the need to load the whole scanpy package. But as I understood from @ivirshup, this was a design decision.

flying-sheep commented 5 years ago

I doubt that it was. An argument can be made that 10x is more single-cell-transcriptomics specific than anndata itself, but I’m not aware of e.g. loom being used in a different way, so …

falexwolf commented 5 years ago

Hey! Yes, it was a design decision: the idea was that anndata, just like loom, is not limited to biological omics data. scanpy, by contrast, is.

These days, I'm not opposed to making it available from anndata, though. Even if we have 20 or 30 readers, I wouldn't say we have a cluttered API.

flying-sheep commented 5 years ago

I’d say that the only reason for a read function to be scanpy-specific is if it would create scanpy-specific conventions in the AnnData object (such as obsm['X_pca'] or so), but they don’t.

ivirshup commented 5 years ago

I think it would be reasonable to be doing more with 10x files (where CITE-seq gets placed). I'd also want to see if we're going to be doing stuff with the visium data, and what those files look like.

One other issue is that the current 10x readers use tables not h5py and I'd prefer not to add tables as a dependency here. We could rewrite them, but I don't think this is a super high priority – especially for the legacy readers.
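
For illustration only, an h5py-based rewrite could look roughly like the sketch below. It is not the scanpy implementation. It assumes the 10x v3 HDF5 layout (a `matrix` group holding `data`, `indices`, `indptr`, `shape`, `barcodes`, and `features/id`), and the function names `read_10x_v3_arrays` and `to_anndata` are hypothetical.

```python
import h5py
import numpy as np
from scipy.sparse import csr_matrix


def _as_str(dataset):
    """Return an HDF5 string dataset as a NumPy array of Python str."""
    return np.array(
        [v.decode("utf-8") if isinstance(v, bytes) else str(v) for v in dataset[()]]
    )


def read_10x_v3_arrays(path):
    """Hypothetical helper: read counts, barcodes, and feature IDs with h5py only.

    Assumes the 10x v3 layout, where the matrix is stored as
    features x barcodes in CSC form under the "matrix" group.
    """
    with h5py.File(path, "r") as f:
        grp = f["matrix"]
        n_features, n_barcodes = (int(n) for n in grp["shape"][()])
        # The CSC arrays of a features x barcodes matrix are exactly the
        # CSR arrays of its transpose, so no explicit transpose is needed
        # to get the barcodes x features orientation AnnData expects.
        X = csr_matrix(
            (grp["data"][()], grp["indices"][()], grp["indptr"][()]),
            shape=(n_barcodes, n_features),
        )
        barcodes = _as_str(grp["barcodes"])
        feature_ids = _as_str(grp["features/id"])
    return X, barcodes, feature_ids


def to_anndata(X, barcodes, feature_ids):
    """Hypothetical helper: wrap the arrays above in an AnnData object."""
    # Imported lazily so the parsing step above depends only on h5py,
    # numpy, and scipy.
    import anndata as ad
    import pandas as pd

    return ad.AnnData(
        X=X,
        obs=pd.DataFrame(index=barcodes),
        var=pd.DataFrame(index=feature_ids),
    )
```

A call such as `to_anndata(*read_10x_v3_arrays("filtered_feature_bc_matrix.h5"))` would then produce an AnnData object without importing scanpy or tables. The legacy (v2) layout, which nests the arrays under a genome-named group, is not handled here.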

adamgayoso commented 4 years ago

I just opened a similar issue at Scanpy. It would be really great to have all the readers in one place -- even if it's in a standalone scio package, which would have functions other methods developers could export into their own packages.

grst commented 1 year ago

See also https://scverse.zulipchat.com/#narrow/stream/315789-data-structures/topic/scverse.20io.20package

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. Please add a comment if you want to keep the issue open. Thank you for your contributions!

flying-sheep commented 1 year ago

Let’s track this in https://github.com/scverse/scverse-io/issues/5

ivirshup commented 1 year ago

I'd rather keep this on track as there's an open PR which fixes this (@gtca, please take a look), and the referenced issue doesn't really track a decision on where this function goes.