scverse / squidpy

Spatial Single Cell Analysis in Python
https://squidpy.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Read 10X multiomic data in Squidpy #565

Closed jnmark closed 1 year ago

jnmark commented 2 years ago

Hello, I am trying to read in multiomic data from the newly released 10X open source datasets: https://www.10xgenomics.com/resources/datasets/multiomic-integration-neuroscience-application-note-visium-for-ffpe-plus-immunofluorescence-alzheimers-disease-mouse-model-brain-coronal-sections-from-one-hemisphere-over-a-time-course-1-standard

This dataset is not included in the sample_ids within sq.datasets.visium. It has Visium IF with matched scRNA-seq and ATAC-seq data. Could you please tell me the best way to read this in? Should I read in the Visium IF data first and then add the scRNA-seq and ATAC-seq data as layers? Could you please give me some pointers on how to organize the different omic layers? I want to be able to use all of them in a combined analysis.

gtca commented 2 years ago

Hey @jnmark, thanks for raising it!

While not an immediate solution to all of your questions, I'd just mention that we'll be working on synergies between squidpy and muon for spatial multi-omics.

For organising omics layers, there is mudata. You can read count data from multimodal assays just fine:

import muon as mu
mdata = mu.read_10x_h5("Multiome_RNA_ATAC_Mouse_Brain_Alzheimers_AppNote_filtered_feature_bc_matrix.h5")
# anndata/_core/anndata.py:1830: UserWarning: Variable names are not unique. To make them unique, call `.var_names_make_unique`.
# Added `interval` annotation for features from Multiome_RNA_ATAC_Mouse_Brain_Alzheimers_AppNote_filtered_feature_bc_matrix.h5
# anndata/_core/anndata.py:1830: UserWarning: Variable names are not unique. To make them unique, call `.var_names_make_unique`.
# mudata/_core/mudata.py:437: UserWarning: var_names are not unique. To make them unique, call `.var_names_make_unique`.

# MuData object with n_obs × n_vars = 33459 × 99200
#   var:  'gene_ids', 'feature_types', 'genome', 'interval'
#   2 modalities
#     rna:        33459 x 32286
#       var:      'gene_ids', 'feature_types', 'genome', 'interval'
#     atac:       33459 x 66914
#       var:      'gene_ids', 'feature_types', 'genome', 'interval'
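
The warnings above come from repeated feature names. In practice, calling `.var_names_make_unique()` on the object, as the warning suggests, should be enough. For intuition only, here is a minimal pure-Python sketch of that kind of de-duplication. It is not anndata's implementation: the helper name `make_names_unique` and the example gene names are hypothetical, and the "-1", "-2" suffix scheme is an assumption.

```python
def make_names_unique(names, join="-"):
    """Append a numeric suffix to repeated names; first occurrences stay unchanged."""
    taken = set(names)   # every name already in use, so new suffixes never collide
    seen = set()         # names we have already emitted once
    next_suffix = {}     # next suffix number to try for each repeated name
    unique = []
    for name in names:
        if name not in seen:
            # First occurrence keeps its original name.
            seen.add(name)
            unique.append(name)
            continue
        n = next_suffix.get(name, 1)
        candidate = f"{name}{join}{n}"
        # Skip suffixes that would clash with a name that already exists.
        while candidate in taken:
            n += 1
            candidate = f"{name}{join}{n}"
        taken.add(candidate)
        next_suffix[name] = n + 1
        unique.append(candidate)
    return unique


# Hypothetical feature names with one gene symbol repeated three times.
print(make_names_unique(["Gm1", "Xist", "Gm1", "Gm1"]))
```

The first occurrence is left untouched so that lookups by the original symbol keep working; only the later duplicates are renamed.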

For the dataset in question, VisiumFFPE_Mouse_Brain_Alzheimers_AppNote_filtered_feature_bc_matrix.h5 only contains gene expression counts. From the dataset description, it seems there should also be RNA+ATAC counts from other (!) hemispheres. So I guess it becomes a data integration question: how to link the AnnData holding the spatial transcriptomics with the MuData holding RNA+ATAC. I imagine the solution might involve models like cell2location.
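
As a possible starting point for that integration, here is a rough sketch that loads both objects and restricts the spatial data to genes also measured in the RNA modality. It makes several assumptions: the Visium files sit in a Space Ranger style directory (with a `spatial/` subfolder) that `sq.read.visium` can read, the function names `shared_genes` and `load_both` are hypothetical helpers, and the paths are supplied by the caller. Subsetting to shared genes is only a common first step, not the integration itself.

```python
def shared_genes(spatial_genes, multiome_genes):
    """Return genes present in both inputs, keeping the spatial order, without repeats."""
    multiome = set(multiome_genes)
    seen = set()
    shared = []
    for gene in spatial_genes:
        if gene in multiome and gene not in seen:
            seen.add(gene)
            shared.append(gene)
    return shared


def load_both(visium_dir, visium_counts_file, multiome_h5):
    """Load the Visium AnnData and the RNA+ATAC MuData (hypothetical helper)."""
    # Imported lazily so the pure-Python helper above works without these packages.
    import muon as mu
    import squidpy as sq

    # Assumes visium_dir has the layout expected by sq.read.visium.
    adata = sq.read.visium(visium_dir, counts_file=visium_counts_file)
    mdata = mu.read_10x_h5(multiome_h5)

    # Resolve the duplicate-name warnings before matching genes.
    adata.var_names_make_unique()
    mdata.mod["rna"].var_names_make_unique()

    genes = shared_genes(adata.var_names, mdata.mod["rna"].var_names)
    # Spatial data restricted to genes that the RNA modality also measures.
    return adata[:, genes].copy(), mdata, genes


# Hypothetical gene lists, just to show the matching step.
print(shared_genes(["App", "Gfap", "App", "Trem2"], ["Trem2", "App", "Snap25"]))
```

The two objects stay separate here (AnnData for spatial, MuData for RNA+ATAC); a mapping model would then work on the shared gene set.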

jnmark commented 2 years ago

Hi @gtca, thanks for the pointers! I think the dataset you referred to in your solution is the "RNA+ATAC counts from other (!) hemispheres"; it looks like they just put the files in different places :) For now I will analyze them separately: RNA+ATAC with MuData and ST with AnnData.

giovp commented 1 year ago

Will close this due to inactivity, but it seems resolved.