kxxxjo opened this issue 2 years ago.
Hi KJ,
Thanks for reaching out!
I'm not sure what issue you're having with saving the Seurat object to h5ad; however, all you need are the XY coordinates of your cells in the spatial image. If you have this information saved somewhere (e.g. a CSV file), you can load it in Python and add it to your AnnData object (adata.obsm).
import pandas
# This should generate a DataFrame with index = cell IDs
XY_location = pandas.read_csv('yourfile.csv', header=0, index_col=0, sep=',')
# Add it to your AnnData object loaded from your h5ad file
adata.obsm['xy_loc'] = XY_location
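A minimal sketch of the same idea, assuming a hypothetical CSV with the cell IDs in the first column and two coordinate columns named x and y. It shows that an obsm entry needs one row per cell in the same order as adata.obs_names, so the coordinate table is subset and reordered by the cell IDs before it is stored. The helper name `coords_for_cells` and the column names are illustrative only.

```python
import io

import numpy as np
import pandas as pd


def coords_for_cells(csv_source, obs_names, x_col="x", y_col="y"):
    """Return an (n_cells, 2) float array of coordinates ordered like obs_names.

    Hypothetical helper: assumes a header row, cell IDs in the first column
    and the coordinates in the columns named x_col and y_col.
    """
    table = pd.read_csv(csv_source, header=0, index_col=0, sep=",")
    missing = pd.Index(obs_names).difference(table.index)
    if len(missing) > 0:
        raise KeyError(f"{len(missing)} cells have no coordinates, e.g. {list(missing[:3])}")
    # Subset and reorder the rows so they line up with the AnnData cells
    aligned = table.loc[list(obs_names), [x_col, y_col]]
    return aligned.to_numpy(dtype=float)


# Toy example with made-up cell IDs
csv_text = "cell_id,x,y\ncellA,1,2\ncellB,3,4\ncellC,5,6\n"
xy = coords_for_cells(io.StringIO(csv_text), ["cellC", "cellA"])
print(xy)
```

With a real file this would be used as `adata.obsm['xy_loc'] = coords_for_cells('yourfile.csv', adata.obs_names)`; raising a KeyError for unmatched cells makes naming problems visible instead of silently storing misaligned rows.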
As for your second question: if you only have spatial data, you can still apply the spatial RNA velocity part of SIRV (the second part, skipping the integration step), provided your spatial protocol is sequencing-based (like 10x Visium), so that you can obtain the unspliced/spliced expression from the sequencing files. We did something similar in this preprint (https://www.biorxiv.org/content/10.1101/2022.03.17.484699v1).
I hope this answers your questions; please let me know if you have any others.
Best, Tamim
Thanks for your kind answer.
I have results from spaceranger, as shown in the image above.
If I understand correctly, "the XY coordinates of your cells in the spatial image" are stored in my tissue_positions_list.csv, right?
So I tried to add them to my adata using your code, but the following error occurred:
ValueError: Lengths must match to compare
My XY location table has 4991 rows and 5 columns, while my adata has 337 (n_obs) x 31908 (n_vars).
I calculated the unspliced and spliced ratio with alevin-fry only for the clusters of interest (e.g. basal and tumor).
I think the error occurred because I did this calculation only for specific clusters. What do you think?
Thanks!
Best, KJ
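One way to check whether a mismatch like this comes only from subsetting (rather than from the barcodes themselves) is to count how many AnnData cell names appear in the positions table, with and without the trailing '-1' suffix. This is a hypothetical diagnostic, assuming the positions table is indexed by barcode; the barcodes below are made up.

```python
import pandas as pd


def barcode_overlap(obs_names, position_index):
    """Count how many cells in obs_names are found in position_index.

    Hypothetical diagnostic: compares the raw names, and the names with
    everything from the first '-' onward (e.g. '-1') removed on both sides.
    """
    def strip(names):
        return pd.Index([str(n).split("-")[0] for n in names])

    obs = pd.Index(obs_names)
    pos = pd.Index(position_index)
    return {
        "n_cells": len(obs),
        "n_positions": len(pos),
        "exact_matches": int(obs.isin(pos).sum()),
        "matches_without_suffix": int(strip(obs).isin(strip(pos)).sum()),
    }


# Made-up example: 3 spots in the table, 2 cells kept in the AnnData object
positions = ["AAACGAAT-1", "AAACTCGT-1", "AAAGACCC-1"]
cells = ["AAACGAAT", "AAAGACCC"]
print(barcode_overlap(cells, positions))
```

If `matches_without_suffix` equals `n_cells`, every cell has a position, and a larger positions table (e.g. 4991 spots vs. 337 cells) would only need to be subset to those cells rather than compared row by row.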
@tabdelaal In your example dataset, "xy_loc" holds coordinates such as [106.5, 121.5], [107.5, 121.5], ...
But my coordinate.csv contains information like the image below.
I want to match your coordinate format, but I don't know how.
If your datasets were also generated by spaceranger (10x), could you let me know how to make a coordinate.csv in your format?
Thanks again,
Best, KJ
I did this for 10x Visium data in the following way:
import numpy
import pandas
Spatial_data = pandas.read_csv('tissue_positions_list.csv', header=None, index_col=0)
# Based on the image you sent of the file, your file has a header, so maybe change to header=0
# Check whether the cell names in your AnnData object end with '-1'
adata.obs_names
# If they lack this '-1', adjust the index of your spatial data to match
Spatial_data.index = numpy.array([a.split('-')[0] for a in Spatial_data.index])
# Subset the XY coordinates to only the (337) cells you have in the AnnData object
XY_location = Spatial_data.loc[adata.obs_names, :]
# Select columns 5 and 4, which map to the X and Y coordinates of the spots
XY_location = XY_location.iloc[:, [5, 4]]
# Add it to your AnnData object
adata.obsm['xy_loc'] = XY_location
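A self-contained sketch of the same steps that names the columns explicitly instead of indexing them by number. It assumes the headerless six-column layout of Space Ranger 1.x's tissue_positions_list.csv (barcode, in_tissue, array_row, array_col, pxl_row_in_fullres, pxl_col_in_fullres) and takes X from pxl_col_in_fullres and Y from pxl_row_in_fullres. The helper `visium_xy` and the rows in the example are hypothetical.

```python
import io

import numpy as np
import pandas as pd

# Assumed column order of the headerless tissue_positions_list.csv (Space Ranger 1.x)
POSITION_COLUMNS = [
    "barcode", "in_tissue", "array_row", "array_col",
    "pxl_row_in_fullres", "pxl_col_in_fullres",
]


def visium_xy(csv_source, obs_names, strip_suffix=True):
    """Return an (n_cells, 2) float array of (x, y) pixel coordinates.

    x is pxl_col_in_fullres and y is pxl_row_in_fullres, ordered like obs_names.
    """
    spots = pd.read_csv(csv_source, header=None, names=POSITION_COLUMNS,
                        index_col="barcode")
    if strip_suffix:
        # Drop the '-1' suffix so the barcodes match cell names without it
        spots.index = [b.split("-")[0] for b in spots.index]
    xy = spots.loc[list(obs_names), ["pxl_col_in_fullres", "pxl_row_in_fullres"]]
    return xy.to_numpy(dtype=float)


# Made-up rows in the assumed layout
csv_text = (
    "AAACAAGT-1,1,50,102,8000,7000\n"
    "AAACAATC-1,1,59,19,9000,2500\n"
    "AAACACCA-1,0,14,94,3000,6500\n"
)
xy = visium_xy(io.StringIO(csv_text), ["AAACAATC", "AAACAAGT"])
print(xy)
```

Usage would look like `adata.obsm['xy_loc'] = visium_xy('spatial/tissue_positions_list.csv', adata.obs_names)`. Selecting `array_col`/`array_row` instead would give grid (spot index) coordinates rather than full-resolution pixel coordinates; which one suits the velocity plot is a judgment call.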
I hope this works.
Best, Tamim
Thanks for helping me troubleshoot, @tabdelaal.
I ran the commands below:
import numpy as np
import pandas as pd
import scvelo as scv
adata_d3_subset = scv.utils.merge(adata, ldata)
Spatial_data = pd.read_csv('spatial/tissue_positions_list.csv', index_col=0, header=None)
# If they lack this '-1', adjust the index of your spatial data to be similar
Spatial_data.index = np.array([a.split('-')[0] for a in Spatial_data.index])
# Subset the selection of the XY coordinates for only the (337) cells you have in the anndata
XY_location = Spatial_data.loc[adata.obs_names,:]
# Select column 5 and 4, which maps to the X and Y coordinates of the spots
XY_location = XY_location.iloc[:,[5,4]]
But the following error occurred:
IndexError: positional indexers are out-of-bounds
So, to select columns 5 and 4, I modified the command:
# Select the columns at positions 4 and 3 (labels 5 and 4), which map to the X and Y coordinates of the spots
XY_location = XY_location.iloc[:, [4, 3]]
adata_d3_subset.obsm['xy_loc'] = XY_location
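The IndexError above comes from mixing column labels and column positions: with header=None and index_col=0, pandas keeps the integer labels 1 to 5 for the remaining columns, but .iloc counts positions 0 to 4. A small demonstration with made-up rows in the headerless six-column layout:

```python
import io

import pandas as pd

# Two made-up rows in the headerless 6-column layout
csv_text = "AAACAAGT-1,1,50,102,8000,7000\nAAACAATC-1,1,59,19,9000,2500\n"
spots = pd.read_csv(io.StringIO(csv_text), header=None, index_col=0)

# Column 0 became the index, so the remaining *labels* are 1..5 ...
print(spots.columns.tolist())  # [1, 2, 3, 4, 5]

# ... but the *positions* are 0..4, so position 5 does not exist
try:
    spots.iloc[:, [5, 4]]
except IndexError as err:
    print("iloc failed:", err)

# Label-based and position-based selection of the same two columns
by_label = spots.loc[:, [5, 4]]
by_position = spots.iloc[:, [4, 3]]
print(by_label.equals(by_position))  # True
```

Both selections pick the last two columns (pxl_col_in_fullres and pxl_row_in_fullres, assuming the Space Ranger 1.x layout); the positional form `.iloc[:, [4, 3]]` also keeps working if the file is read with header=0.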
Then I ran the next steps (normalization, scaling, PCA, ...):
import scanpy as sc
import scvelo as scv
# Normalize the imputed un/spliced expressions; this will also re-normalize the
# full spatial mRNA 'X', which needs to be undone
scv.pp.normalize_per_cell(adata_d3_subset, enforce=True)
# Undo the double normalization of the full mRNA 'X'
adata_d3_subset.X = adata_d3_subset.to_df()[adata_d3_subset.var_names]
# Zero-mean and unit-variance scaling, PCA, building the neighbourhood graph,
# running UMAP and clustering the spatial data using Leiden clustering
sc.pp.scale(adata_d3_subset)
sc.tl.pca(adata_d3_subset)
sc.pl.pca_variance_ratio(adata_d3_subset, n_pcs=50, log=True)
sc.pp.neighbors(adata_d3_subset, n_neighbors=30, n_pcs=30)
sc.tl.umap(adata_d3_subset)
sc.tl.leiden(adata_d3_subset)
# Supplementary Fig. S4A
sc.pl.umap(adata_d3_subset, color='leiden')
When I generate the plot using sc.pl.scatter(), the following error occurs:
# Supplementary Fig. S4B
sc.pl.scatter(adata_d3_subset, basis='xy_loc',color='leiden')
KeyError: 'compute coordinates using visualization tool xy_loc first'
What command should I use to compute coordinates using the visualization tool xy_loc?
Thanks!
Best, KJ
Hi @tabdelaal
Could you give me some feedback on how to troubleshoot this?
I'm sorry to rush you :(
Thanks!
Best, KJ
Can you just print your adata_d3_subset variable and show me what you get?
Thanks for the reply, @tabdelaal.
Sure, here is my adata_d3_subset:
AnnData object with n_obs × n_vars = 377 × 31908
obs: 'orig.ident', 'nCount_Spatial', 'nFeature_Spatial', 'nCount_SCT', 'nFeature_SCT', 'Barcode', 'Pathologic.Annotation', 'barcode', 'UMAP_1', 'UMAP_2', 'initial_size_spliced', 'initial_size_unspliced', 'initial_size', 'n_counts'
var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand', 'gene_count_corr'
obsm: 'X_pca', 'X_umap', 'xy_loc'
layers: 'matrix', 'ambiguous', 'spliced', 'unspliced'
Thanks!
Best, KJ
I suspect one of two technical issues:
1) When all the cell locations are integers, these visualization functions break in some cases. It may be wise to add an offset (e.g. 0.5) when saving the cell locations in the AnnData object:
adata_d3_subset.obsm['xy_loc'] = XY_location + 0.5
2) Scanpy adds an 'X_' prefix to the names of obsm entries, like 'X_pca' and 'X_umap'. When you pass the basis to the scatter plot function, you leave out the 'X_', so you write basis='umap'. Maybe you can try this:
adata_d3_subset.obsm['X_xy_loc'] = XY_location + 0.5
sc.pl.scatter(adata_d3_subset, basis='xy_loc',color='leiden')
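Both suggestions can be combined into one conversion step that always stores a plain float NumPy array, the same kind of object as X_pca, under the 'X_'-prefixed key. A minimal sketch, assuming the coordinates arrive as a two-column DataFrame or array; the helper `as_embedding` is hypothetical, and whether this alone clears the KeyError may depend on the scanpy version, which is not verified here.

```python
import numpy as np
import pandas as pd


def as_embedding(xy, offset=0.5):
    """Convert (n_cells, 2) coordinates into a float ndarray for adata.obsm.

    Accepts a DataFrame or array-like; adds an optional offset (the 0.5
    suggested above) so the coordinates are no longer whole numbers.
    """
    arr = np.asarray(xy, dtype=float)
    if arr.ndim != 2 or arr.shape[1] != 2:
        raise ValueError(f"expected shape (n_cells, 2), got {arr.shape}")
    return np.ascontiguousarray(arr + offset)


# Made-up integer coordinates stored as a DataFrame, as in the thread
xy_df = pd.DataFrame({5: [7000, 2500], 4: [8000, 9000]},
                     index=["AAACAAGT", "AAACAATC"])
emb = as_embedding(xy_df)
print(type(emb).__name__, emb.dtype, emb.shape)  # ndarray float64 (2, 2)
```

In the thread's setting this would be `adata_d3_subset.obsm['X_xy_loc'] = as_embedding(XY_location)` followed by `sc.pl.scatter(adata_d3_subset, basis='xy_loc', color='leiden')`. Converting to an ndarray also drops the DataFrame index, so the rows must already be in `obs_names` order.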
I tried your suggestions, but the error still occurred.
My code is below:
adata_d3_subset.obsm['xy_loc'] = XY_location.astype(float) + 0.5
sc.pl.scatter(adata_d3_subset, basis='xy_loc',color='leiden')
But the same error still occurred:
KeyError: 'compute coordinates using visualization tool xy_loc first'
My xy_loc is not an ndarray like X_pca or X_umap, so I tried to convert it to an ndarray using the to_numpy() function.
But the format is still different from X_pca or X_umap.
Please let me know how to solve it.
I really appreciate your help.
Best, KJ
Hi, @tabdelaal
I'm still struggling with this problem and haven't made any progress.
If you have an idea to solve it, please let me know.
Thanks!
Best, KJ
Hi,
Have you tried adding the capital 'X_' prefix before xy_loc?
adata_d3_subset.obsm['X_xy_loc'] = XY_location.astype(float) + 0.5
Sure, I have tried everything you suggested, but the error still occurs.
As I mentioned above, I think this is because my xy_loc is not an ndarray like X_pca or X_umap.
What do you think?
Thanks.
Best, KJ
What is the type of your xy_loc, then? Can you print out adata_d3_subset.obsm['X_xy_loc'] as you did with PCA?
adata_d3_subset.obsm['X_xy_loc'] and ['xy_loc'] are the same. I attached an image showing ['pca'] and ['xy_loc'].
For xy_loc, I changed the type using the following command:
adata_d3_subset.obsm['xy_loc'] = adata_d3_subset.obsm['xy_loc'].to_numpy()
Thanks!
Best, KJ
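To compare the converted xy_loc with X_pca without relying on screenshots, a small check can list how an obsm entry differs in type, shape and dtype. This is a hypothetical helper; the float requirement is an assumption based on how X_pca and X_umap are usually stored, not a documented scanpy rule.

```python
import numpy as np
import pandas as pd


def obsm_entry_problems(value, n_obs):
    """List ways an obsm entry differs from a PCA/UMAP-style embedding.

    Hypothetical helper: the checks (ndarray, 2-D, n_obs rows, floating dtype)
    mirror how X_pca / X_umap are typically stored, not a documented rule.
    """
    problems = []
    if not isinstance(value, np.ndarray):
        problems.append(f"type is {type(value).__name__}, not numpy.ndarray")
    arr = np.asarray(value)
    if arr.ndim != 2:
        problems.append(f"expected 2 dimensions, got {arr.ndim}")
    elif arr.shape[0] != n_obs:
        problems.append(f"expected {n_obs} rows, got {arr.shape[0]}")
    if not np.issubdtype(arr.dtype, np.floating):
        problems.append(f"dtype is {arr.dtype}, not floating point")
    return problems


# Made-up stand-ins: a PCA-like array and integer coordinates in a DataFrame
pca_like = np.zeros((3, 50), dtype=np.float32)
xy_df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
print(obsm_entry_problems(pca_like, n_obs=3))  # []
print(obsm_entry_problems(xy_df, n_obs=3))
print(obsm_entry_problems(xy_df.to_numpy(dtype=float) + 0.5, n_obs=3))  # []
```

On real data, the call would be `obsm_entry_problems(adata_d3_subset.obsm['X_xy_loc'], adata_d3_subset.n_obs)`; an empty list means the entry at least matches X_pca in these respects.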
Hi all,
First of all, thanks for developing such a useful tool for bioinformaticians!
I want to embed the velocity into my spatial image using your tool, SIRV.
But there is a problem when saving my Seurat object to h5ad: I think the spatial image in the Seurat object is lost during the conversion to h5ad.
What should I do to save an h5ad file that includes the spatial image?
I have one more question: could I also use SIRV when I only have spatial data, without single-cell data?
Thanks!
Best, KJ