scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Auto-generation of Study files for Single Cell Portal submission #950

Open sjfleming opened 4 years ago

sjfleming commented 4 years ago

I have recently been using scanpy to analyze my data and get it into a format that I want to share with others. I've been doing that using the Broad's Single Cell Portal, which I suspect others use as well. It took me a couple of hours to figure out the details of file formatting, etc., but I realized the process could easily be automated: given an adata object, a list of files can be generated that are formatted correctly for upload. I have written this function and I use it myself.
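(For readers of this thread: the function itself is not posted here, so the following is only a rough sketch of what such an export could look like, not the author's code. It assumes the portal's tab-separated layout of a `NAME` header row followed by a `TYPE` row marking each column as `group` or `numeric`; please check the current Single Cell Portal documentation before relying on it. The helper names `scp_metadata_tsv` and `scp_cluster_tsv` are hypothetical. The inputs are plain Python lists so the sketch runs without extra dependencies; with an AnnData object they would come from `adata.obs_names`, `adata.obs`, and something like `adata.obsm["X_umap"]`.)

```python
import csv
import io


def _scp_type(values):
    """Return 'numeric' if every value is an int/float (bools excluded), else 'group'."""
    is_num = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "numeric" if is_num else "group"


def scp_metadata_tsv(cell_names, annotations):
    """Build metadata text: NAME header row, TYPE row, then one row per cell.

    cell_names:  list of barcodes, e.g. list(adata.obs_names)
    annotations: dict of column name -> per-cell values,
                 e.g. {c: adata.obs[c].tolist() for c in adata.obs.columns}
    """
    columns = list(annotations)
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["NAME"] + columns)
    # Assumed layout: second row declares each annotation as group or numeric.
    writer.writerow(["TYPE"] + [_scp_type(annotations[c]) for c in columns])
    for i, name in enumerate(cell_names):
        writer.writerow([name] + [annotations[c][i] for c in columns])
    return buf.getvalue()


def scp_cluster_tsv(cell_names, coords):
    """Build cluster text from 2D coordinates.

    coords: list of (x, y) pairs, e.g. adata.obsm["X_umap"][:, :2].tolist()
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["NAME", "X", "Y"])
    writer.writerow(["TYPE", "numeric", "numeric"])
    for name, (x, y) in zip(cell_names, coords):
        writer.writerow([name, x, y])
    return buf.getvalue()


# Toy input standing in for values pulled from an AnnData object.
cells = ["AAAC", "AAAG"]
meta = scp_metadata_tsv(cells, {"louvain": ["0", "1"], "n_genes": [1200, 950]})
clus = scp_cluster_tsv(cells, [(0.5, -1.2), (3.1, 2.0)])
```

The strings returned here can be written to `.txt` or `.tsv` files for upload. The expression matrix is left out of the sketch on purpose, since how it should be exported depends on the matrix format the portal accepts.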

Is scanpy interested in a function that does this? Where in the code would such a thing go?

I'd be happy to create a pull request, but didn't want to go ahead unless there's interest.

Or maybe anndata is a better place for something like this?

ivirshup commented 4 years ago

I think this could be useful, but I'm not sure how stable their format is, or if we can promise to support it. Maybe this could be a standalone package which we link to from the docs?

Thoughts @flying-sheep?

fidelram commented 4 years ago

This sounds great. If I recall correctly, it is possible to upload data to the Broad Single Cell Portal and share the link with other people, right? This would certainly be useful.

Fidel Ramirez

gokceneraslan commented 4 years ago

(Orthogonal to this suggestion: maybe we can talk to the SCP people and ask how difficult it would be to use h5ad as a native format.)

gokceneraslan commented 2 years ago

@sjfleming Is there a gist or repo URL for this code? It might take time to integrate into scanpy/anndata, but people can benefit from the code if it already lives somewhere...

ivirshup commented 2 years ago

@gokceneraslan, also kinda orthogonal, but there is some effort going into standardizing metadata and schema of AnnData-like objects for data repositories over at the single-cell-data org. A lot of this is being pushed by the cellxgene team.

I'd assume SCP can handle h5ads by this point, but it would be nice to be able to enforce schemas that repositories require.