ivirshup / sc-interchange

Better interchange for single cell tools

What are the goals here? #3

Open ivirshup opened 5 years ago

ivirshup commented 5 years ago

I think it would be good to scope out the requirements of an interchange file format. This could probably start with some ideas of what the use cases are (basic user stories).

Some questions I have about what is reasonably achievable:

A little expansion on "conventions v. generality":

AnnData objects don't have nested data frames, so I imagine any nested data frames could be stored as elements of obsm. This is probably also where we'd put reducedDims. But how do we keep track of which element is which? Either we rely on knowing which names denote reduced dimensions, or we'd have to "tag" the arrays.
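To make the "tag the arrays" option concrete, here is a minimal sketch. It is only an illustration: plain dicts stand in for obsm, and the `obsm_roles` side table, the role names, and `split_by_role` are all hypothetical, not part of AnnData or SingleCellExperiment.

```python
# Plain dicts stand in for AnnData's obsm; nothing here is a real AnnData API.
obsm = {
    "X_pca": [[0.1, 0.2], [0.3, 0.4]],     # a reduced dimension
    "qc_metrics": [[10, 0.5], [12, 0.4]],  # a nested data frame moved out of colData
}

# Assumed convention: a side table recording where each entry belongs in SCE.
obsm_roles = {"X_pca": "reducedDim", "qc_metrics": "nested_colData"}


def split_by_role(obsm, roles, default="untagged"):
    """Group obsm entries by their recorded role, using `default` if untagged."""
    grouped = {}
    for name, array in obsm.items():
        grouped.setdefault(roles.get(name, default), {})[name] = array
    return grouped


grouped = split_by_role(obsm, obsm_roles)
print(sorted(grouped))
```

With explicit tags, a converter would not need to guess from names like `X_pca`; the cost is that every tool writing to obsm has to maintain the side table.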

@flying-sheep, from your working with in-memory exchange, do you have any thoughts on this?

flying-sheep commented 5 years ago

SingleCellExperiment is more specific (e.g. reducedDims exists while we have the more generic obsm), so concepts that are conventions in AnnData aren’t in SCE.

Is there any point here you’d like to hear my opinion on specifically? :smiley:

ivirshup commented 5 years ago

I was wondering if you had thoughts on dealing with round-trip conversions when there aren't clear one-to-one mappings. For example, going R->python->R with a SingleCellExperiment with nested data frames. It's not obvious to me (from here) how you could deal with that. If you flatten, how do you know what to unflatten? If you move them to obsm, how do you know what to move back to colData? Another example would be the SingleCellExperiment LinearEmbeddingMatrix, where the variable loadings never get subset, so it doesn't quite map to varm.
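One way to think about the flattening question is to record a manifest at flatten time, so the reverse step never has to guess from column names. The sketch below is an assumption-laden illustration: nested dicts stand in for a colData with nested data frames, and `flatten`, `unflatten`, the manifest, and the `.` separator are all hypothetical.

```python
def flatten(columns, sep="."):
    """Flatten one level of nesting; return the flat columns and a manifest."""
    flat, manifest = {}, {}
    for name, value in columns.items():
        if isinstance(value, dict):  # a nested "data frame"
            for sub, col in value.items():
                flat_name = f"{name}{sep}{sub}"
                if flat_name in flat or flat_name in columns:
                    raise ValueError(f"name collision on {flat_name!r}")
                flat[flat_name] = col
                manifest[flat_name] = (name, sub)  # remember where it came from
        else:
            if name in flat:
                raise ValueError(f"name collision on {name!r}")
            flat[name] = value
    return flat, manifest


def unflatten(flat, manifest):
    """Rebuild the nesting using only the manifest, never the column names."""
    nested = {}
    for name, col in flat.items():
        if name in manifest:
            parent, sub = manifest[name]
            nested.setdefault(parent, {})[sub] = col
        else:
            nested[name] = col
    return nested


col_data = {
    "cell_type": ["B", "T"],
    "qc": {"n_genes": [100, 200], "pct_mito": [0.1, 0.2]},
    "a.b": [1, 2],  # a top-level column that merely contains the separator
}
flat, manifest = flatten(col_data)
restored = unflatten(flat, manifest)
```

The `a.b` column shows why names alone are not enough: without the manifest, an unflatten step could not tell it apart from a column `b` nested under `a`. The manifest would have to travel with the data, which is the same "tagging" problem again.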

flying-sheep commented 5 years ago

I don’t handle anything tricky yet :sweat_smile: Almost everything I do is round-trippable (except for the name conversion, which changes capitalization and would canonicalize the obsm/reducedDims name of diffusion maps: ad.obsm['X_dm'] → reducedDim(sce, 'DM') → ad.obsm['X_diffmap'])
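The diffusion map case can be sketched with two lookup tables. These tables are assumptions modelled only on the example above, not the actual conversion code; the `X_pca`/`PCA` pair is added purely for contrast.

```python
# Hypothetical alias tables; only the diffusion map entries come from the example above.
TO_SCE = {"X_dm": "DM", "X_diffmap": "DM", "X_pca": "PCA"}
TO_ANNDATA = {"DM": "X_diffmap", "PCA": "X_pca"}


def round_trip(name):
    """Convert an obsm name to its reducedDims name and back again."""
    return TO_ANNDATA[TO_SCE[name]]


# Names that do not survive AnnData -> SCE -> AnnData unchanged.
not_round_trippable = [name for name in TO_SCE if round_trip(name) != name]
print(not_round_trippable)
```

Because two obsm names map to the same reducedDims name, the reverse table can only pick one of them, so the original spelling is lost unless it is stored somewhere alongside the data.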

What do you mean by flattening? Are there nested data.frames in SCE? What for?