Add template+standards for spatial transcriptomics data

allaway commented 2 years ago

We have at least one dataset coming in that uses ST. We should assess what metadata to collect and what file formats to require. There are both imaging and transcriptomics components, so this might require multiple templates.

HTAN likely has a draft spec, but not sure.

anngvu commented 2 years ago

Example of shared public Visium data:

Items with * show files output directly from the default Space Ranger pipeline, described here. The most convenient output to share seems to be what the Mendeley repo has: files/folders as-is from the pipeline. This isn't the "rawest" data, but these look like appropriate assets for reuse. The last example contains only count matrices.

If we also want the "rawest" data, this would be what 10x calls input files on their datasets page (TIFFs and FASTQs).

Should we prioritize the raw input files (TIFFs and FASTQs) or the post-Space Ranger output files?

allaway commented 2 years ago

IMO we should prioritize the raw data, because this makes it easiest for people to use a different genome build if they want to compare it to other ST data, or to ensure that all of the data in their particular analysis are pre-processed with the same tools and the same version of spaceranger.

However, we should note that the spaceranger output might be required by the journal they publish in, so perhaps we can recommend it as an optional add-on?
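
To make the "raw required, spaceranger output optional" idea concrete, here is a rough sketch of how a submission check could look. This is only an illustration: the category names, the suffix lists, and the `classify` / `missing_required` helpers are hypothetical and not part of any agreed template or existing tooling, and it assumes the raw inputs are exactly the TIFFs and FASTQs mentioned above.

```python
# Hypothetical suffix lists; adjust to whatever the template ends up requiring.
FASTQ_SUFFIXES = (".fastq", ".fastq.gz", ".fq", ".fq.gz")
IMAGE_SUFFIXES = (".tif", ".tiff")

# Hypothetical category names for the two required raw components.
REQUIRED = {"raw_sequencing", "raw_image"}


def classify(filename: str) -> str:
    """Assign a file to a category based on its suffix (case-insensitive)."""
    name = filename.lower()
    if name.endswith(FASTQ_SUFFIXES):
        return "raw_sequencing"
    if name.endswith(IMAGE_SUFFIXES):
        return "raw_image"
    # Anything else (e.g. Space Ranger output) is treated as an optional add-on.
    return "optional_processed"


def missing_required(filenames) -> list:
    """Return the required raw categories that no submitted file covers."""
    found = {classify(f) for f in filenames}
    return sorted(REQUIRED - found)


if __name__ == "__main__":
    submission = ["sample_S1_L001_R1_001.fastq.gz", "filtered_feature_bc_matrix.h5"]
    print(missing_required(submission))
```

With this setup a submission that has only FASTQs plus a count matrix would be flagged as missing the image, while the count matrix itself is accepted as an optional extra.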

nf-osi / nf-metadata-dictionary

Add template+standards for spatial transcriptomics data #160