Although I was eventually able to concatenate the data the way I wanted, the user experience wasn't as smooth as I had hoped, so I wanted to share some feedback. As I'm not that familiar with spatialdata yet, there may already be better solutions; please let me know if there are.
Starting situation
I have ~20 Visium CytAssist samples from a clinical trial, processed with nf-core/spatialtranscriptomics (using the https://github.com/nf-core/spatialtranscriptomics/pull/67 branch that already uses spatialdata). The pipeline generates a single .zarr folder for each sample.
Desired outcome
I would like to have all samples in a single SpatialData object. The AnnData table should contain the gene expression from all samples.
Pain points
sd.concatenate enforces that the input is a list. Is there a reason this can't accept any Sequence type (e.g. dict_values)?
Usually, I pass a dictionary sample_id -> AnnData to anndata.concat, which nicely makes the obs_names unique in combination with concat(..., index_unique="_"). This doesn't work with spatialdata.concatenate, which leaves me with either manipulating the obs_names of each object before concatenation, or ugly obs names with numeric suffixes (e.g. AACTCAACCTTGACCA-1_0_0_0_0_0_0_0_0_0_0_0_0_0_0_0). IMO it would be great to support a dict as input to spatialdata.concatenate, too.
The per-sample SpatialData objects all have the same names for images, shapes and coordinate systems. I currently rename them like this:
which seems a bit cumbersome. I'm wondering if there's a better solution or what's the intended way of handling such cases. It could also be worth adding a process to the nf-core/spatialtranscriptomics pipeline that already does the concatenation step.
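To make the naming scheme concrete, here is a minimal sketch in plain Python. It is not the spatialdata API and not the renaming code referred to above. It only illustrates how names keyed by sample ID can be made unique by appending the ID, which is the convention anndata.concat follows when given a dict together with index_unique="_". The helper make_unique and the sample IDs P01 and P02 are hypothetical.

```python
from collections.abc import Iterable, Mapping


def make_unique(
    names_per_sample: Mapping[str, Iterable[str]], sep: str = "_"
) -> dict[str, list[str]]:
    """Append each sample ID to that sample's names so they are unique across samples."""
    return {
        sample_id: [f"{name}{sep}{sample_id}" for name in names]
        for sample_id, names in names_per_sample.items()
    }


# Hypothetical sample IDs; every sample uses the same element names.
elements = {"P01": ["image", "spots"], "P02": ["image", "spots"]}
renamed = make_unique(elements)

# The same scheme applied to barcodes that collide across samples.
barcodes = {"P01": ["AACTCAACCTTGACCA-1"], "P02": ["AACTCAACCTTGACCA-1"]}
obs_names = make_unique(barcodes)
```

With a dict as input, the sample ID is available at concatenation time, so a single readable suffix per name is enough and no per-object renaming is needed beforehand.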