Closed Isabelle-b closed 5 months ago
Just to verify, the proposed component will subset a mudata to contain only the intersecting set of cells across modalities, right?
A concern here is that if a lot of high quality data might be removed from one modality if something went wrong with the other modality. Depending on how much of the data is removed, I think the component should produce an error. For example, if more than 80% of the cells in the RNA modality are removed, the component should error. What would you suggest?
Also, would it be possible to copy paste the code instead of screenshots? This makes it a lot easier for us to work with the code.
Hey Robrecht,
Thanks! Yes indeed it would be nice to keep the info on what amount of cells are removed from each layer and which cells. And potentially add a warning message in case. It could even be all on same object as we commonly look at both the intersection and non-intersection ones.
As a side note, the filtering on FB data is currently super lenient, so we will likely remove lots of cells for which some antibody info is still there but no good quality RNA is detected.
Here are the chunks I used:
import scanpy as sc
import muon as mu
mdata = mu.read_h5mu(path)
mdata = mdata[(mdata.obs.index.isin(mdata["rna"].obs.index))&(mdata.obs.index.isin(mdata["prot"].obs.index))]
mdata.update()
# remove calculated layers and re-run with new subset
del mdata['rna'].var['highly_variable']
del mdata['rna'].uns['pca_variance']
del mdata['rna'].obsm['X_pca']
del mdata['rna'].obsm['X_umap']
del mdata['rna'].varm['pca_loadings']
del mdata['rna'].obsp['distances']
del mdata['rna'].obsp['connectivities']
del mdata['rna'].uns['neighbors']
# remove calculated layers and re-run with new subset
del mdata['prot'].uns['pca_variance']
del mdata['prot'].obsm['X_pca']
del mdata['prot'].obsm['X_umap']
del mdata['prot'].varm['pca_loadings']
del mdata['prot'].obsp['distances']
del mdata['prot'].obsp['connectivities']
del mdata['prot'].uns['neighbors']
mdata['rna'].uns['pca_variance'] = {}
del mdata['rna'].uns['pca_variance']
mdata['prot'].uns['pca_variance'] = {}
del mdata['prot'].uns['pca_variance']
mdata.update()
mdata['rna'].uns['log1p']['base'] = None
tmp_rna = mdata['rna'].X.copy()
mdata['rna'].X = mdata['rna'].layers['log_normalized']
hvg_object = sc.pp.highly_variable_genes(mdata['rna'], flavor = "seurat", batch_key = "sample_id", subset=False, inplace=False) # careful, 'layer' option not working!
mdata['rna'].var['highly_variable'] = hvg_object['highly_variable']
sc.tl.pca(mdata['rna'], n_comps = 25, copy = False, use_highly_variable = True)
mdata['rna'].X = tmp_rna
tmp_prot = mdata['prot'].X.copy()
mdata['prot'].X = mdata['prot'].layers['clr']
sc.tl.pca(mdata['prot'], n_comps= 25, copy=False, use_highly_variable = False)
mdata['prot'].X = tmp_prot
mdata.write(path_v2)
Main library versions: scanpy==1.9.2 anndata==0.9.2 umap==0.5.3 numpy==1.22.4 scipy==1.6.3 pandas==1.5.3 scikit-learn==0.24.2 statsmodels==0.12.2 python-igraph==0.10.6 louvain==0.8.1 pynndescent==0.5.10
I have everything in my project folder. I can point it to you too.
Hey all,
As discussed here is the ticket to create a module allowing intersection between GEX and FB. Here is the code we were using:
Thanks!
Isabelle