[Image data loss] Conversion of Seuratobject to h5ad

kxxxjo commented 2 years ago

Hi all,

First of all, thanks for developing such a useful tool for bioinformaticians!

I want to embed the velocity into my spatial image using your tool, SIRV.

But there is a problem when saving the Seurat object as h5ad. I think the spatial image in the Seurat object is lost during conversion to h5ad.

What should I do to save an h5ad file that includes the spatial image?

I also have one more question. Could I use SIRV when I have only spatial data and no single-cell data?

Thanks!

Best, KJ

tabdelaal commented 2 years ago

Hi KJ,

Thanks for reaching out!

I'm not sure what issue you are having when saving the Seurat object as h5ad. However, all you need is the XY coordinates of your cells in the spatial image. If you have this information saved in some form (e.g. a CSV file), you can load it in Python and add it to your anndata object (adata.obsm).

# This should generate a dataframe with index = cell_IDs
XY_location = pandas.read_csv('yourfile.csv', header=0, index_col=0, sep=',')

# Add it to your anndata object loaded from your h5ad file
adata.obsm['xy_loc'] = XY_location
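
The two steps above only work if the coordinate table has exactly one row per cell in the anndata object, in the same order. Below is a minimal sketch of that alignment using pandas only. The CSV text, the cell IDs, and the `obs_names` variable are hypothetical stand-ins for 'yourfile.csv' and `adata.obs_names`; they are not taken from SIRV.

```python
import io
import pandas as pd

# Hypothetical stand-in for 'yourfile.csv': the first column holds the cell IDs.
csv_text = """cell_id,x,y
cell_a,106.5,121.5
cell_b,107.5,121.5
cell_c,108.5,122.5
"""

# Hypothetical stand-in for adata.obs_names (note the different order).
obs_names = pd.Index(["cell_c", "cell_a", "cell_b"])

XY_location = pd.read_csv(io.StringIO(csv_text), header=0, index_col=0, sep=",")

# Every cell in the anndata object needs a coordinate row.
missing = obs_names.difference(XY_location.index)
if len(missing) > 0:
    raise ValueError(f"{len(missing)} cells have no coordinates, e.g. {list(missing[:3])}")

# Reorder the rows so that row i describes cell i of the anndata object.
XY_location = XY_location.loc[obs_names]
print(XY_location)

# With a real object, the assignment would then be:
# adata.obsm['xy_loc'] = XY_location
```

Checking for missing IDs before the assignment gives a clearer message than a length-mismatch error raised later.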

For your second question, if you only have spatial data, you can apply the spatial RNA velocity (second part of SIRV, excluding the integration step) if your spatial protocol is sequencing based (like 10x Visium) from which you can get the un/spliced expression from the sequencing files. We did something similar in this preprint (https://www.biorxiv.org/content/10.1101/2022.03.17.484699v1).

I hope this answers your questions and please let me know if you have further questions.

Bests, Tamim

kxxxjo commented 2 years ago

Thanks for your kind answer.

I have results from spaceranger, as in the image above.

As I understand it, "the XY coordinates of your cells in the spatial image" are saved in my tissue_positions_list.csv, right?

So I tried to add it to my adata using your code, but the following error occurred.

ValueError: Lengths must match to compare

My XY location has 4991 rows and 5 columns, and there were 337 (n_obs) x 31908 (n_vars) in my adata.

I calculated the unspliced and spliced ratio using alevin-fry only for the clusters of interest (e.g. basal and tumor).

I think the error occurred because I ran the calculation only for specific clusters. What do you think?

Thanks!

Best, KJ

kxxxjo commented 2 years ago

@tabdelaal In your example dataset, the "xy_loc" has coordinates. ([106.5, 121.5], [107.5, 121.5] ...)

But my coordinate.csv has the information shown in the image below.

I want to match the format of your coordinates, but I don't know how.

If your datasets were also generated by spaceranger (10X), could you let me know how to make a coordinate.csv in your format?

Thanks again,

Best, KJ

tabdelaal commented 2 years ago

I did this for 10x Visium data in the following way:

Spatial_data = pandas.read_csv('tissue_positions_list.csv', header=None, index_col=0)

Based on the image you sent of the file, your file has a header, so maybe change to header=0.

# Check whether the cell names in the anndata object end with '-1' or not
adata.obs_names

# If they lack this '-1', adjust the index of your spatial data to match
Spatial_data.index = numpy.array([a.split('-')[0] for a in Spatial_data.index])

# Subset the selection of the XY coordinates for only the (337) cells you have in the anndata
XY_location = Spatial_data.loc[adata.obs_names,:]

# Select column 5 and 4, which maps to the X and Y coordinates of the spots
XY_location = XY_location.iloc[:,[5,4]]

# Add it to your anndata
adata.obsm['xy_loc'] = XY_location
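
The suffix check and the subsetting step above can be sketched with pandas alone. This is only an illustration under stated assumptions: the four barcode rows are a made-up excerpt in the headerless tissue_positions_list.csv layout, and `obs_names` is a hypothetical stand-in for `adata.obs_names`.

```python
import io
import pandas as pd

# Hypothetical excerpt in the headerless layout:
# barcode, in_tissue, array_row, array_col, pxl_row, pxl_col
csv_text = """AAACAAGTATCTCCCA-1,1,50,102,7474,8500
AAACACCAATAACTGC-1,1,59,19,8552,2788
AAACAGAGCGACTCCT-1,0,14,94,3163,7950
AAACAGCTTTCAGAAG-1,1,43,9,6636,2100
"""
Spatial_data = pd.read_csv(io.StringIO(csv_text), header=None, index_col=0)

# Hypothetical stand-in for adata.obs_names: a subset of spots, without '-1'.
obs_names = pd.Index(["AAACAGCTTTCAGAAG", "AAACAAGTATCTCCCA"])

# Strip the trailing '-<number>' suffix only if the anndata names lack it.
suffix = r"-\d+$"
if not obs_names.str.contains(suffix).all():
    Spatial_data.index = Spatial_data.index.str.replace(suffix, "", regex=True)

# Fail early, with a readable message, if any cell has no matching spot.
missing = obs_names.difference(Spatial_data.index)
if len(missing) > 0:
    raise KeyError(f"{len(missing)} cells not found in the positions file")

# Keep only the spots present in the anndata object, in the same order.
XY_subset = Spatial_data.loc[obs_names, :]
print(XY_subset.shape)
```

Subsetting by the anndata names is what reduces the full positions table (4991 rows in the case above) to the cells actually present in the object.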

I hope this works

Bests, Tamim

kxxxjo commented 2 years ago

Thanks for trying to solve my problem, @tabdelaal

I ran the commands below:

adata_d3_subset = scv.utils.merge(adata, ldata) 
Spatial_data = pd.read_csv('spatial/tissue_positions_list.csv', index_col=0, header=None)

# If they lack this '-1', adjust the index of your spatial data to be similar
Spatial_data.index = np.array([a.split('-')[0] for a in Spatial_data.index])

# Subset the selection of the XY coordinates for only the (337) cells you have in the anndata
XY_location = Spatial_data.loc[adata.obs_names,:]

# Select column 5 and 4, which maps to the X and Y coordinates of the spots
XY_location = XY_location.iloc[:,[5,4]]

But the following error occurred:

IndexError: positional indexers are out-of-bounds

To select columns 5 and 4, I modified the command:

# Select column 5 and 4, which maps to the X and Y coordinates of the spots
XY_location = XY_location.iloc[:,[4,3]]
adata_d3_subset.obsm['xy_loc'] = XY_location
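
The IndexError above is consistent with how pandas numbers the columns of a headerless file. With header=None and index_col=0, the five remaining columns keep the labels 1 to 5 but sit at positions 0 to 4, so "column 5 and 4" is valid as a label selection (.loc) and out of bounds as a position selection (.iloc). The sketch below illustrates this on a made-up two-row excerpt. It assumes the usual headerless layout (barcode, in_tissue, array_row, array_col, pixel row, pixel column); the "x" and "y" column names are chosen here for illustration only.

```python
import io
import pandas as pd

# Hypothetical excerpt: barcode, in_tissue, array_row, array_col, pxl_row, pxl_col
csv_text = """AAACAAGTATCTCCCA-1,1,50,102,7474,8500
AAACACCAATAACTGC-1,1,59,19,8552,2788
"""
Spatial_data = pd.read_csv(io.StringIO(csv_text), header=None, index_col=0)

# Column labels start at 1 because column 0 became the index.
column_labels = Spatial_data.columns.tolist()
print(column_labels, Spatial_data.shape[1])

# Position 5 does not exist in a five-column frame.
iloc_failed = False
try:
    Spatial_data.iloc[:, [5, 4]]
except IndexError as err:
    iloc_failed = True
    print("iloc[:, [5, 4]] fails:", err)

# Selecting by label and selecting by position give the same two columns.
xy_by_label = Spatial_data.loc[:, [5, 4]]
xy_by_position = Spatial_data.iloc[:, [4, 3]]
assert xy_by_label.equals(xy_by_position)

# Pixel column as x, pixel row as y, stored as floats.
XY_location = xy_by_label.astype(float)
XY_location.columns = ["x", "y"]
print(XY_location)
```

Selecting by label keeps the code aligned with the "column 5 and 4" wording, while selecting by position depends on which column was used as the index.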

Then I ran the next steps: normalization, scaling, PCA, and so on.

# Normalize the imputed un/spliced expressions, this will also re-normalize the
# full spatial mRNA 'X', this needs to be undone 
scv.pp.normalize_per_cell(adata_d3_subset, enforce=True)

# Undo the double normalization of the full mRNA 'X'
adata_d3_subset.X = adata_d3_subset.to_df()[adata_d3_subset.var_names]

# Zero mean and unit variance scaling, PCA, building neighbourhood graph, running
# umap and cluster the HybISS spatial data using Leiden clustering
sc.pp.scale(adata_d3_subset)
sc.tl.pca(adata_d3_subset)
sc.pl.pca_variance_ratio(adata_d3_subset, n_pcs=50, log=True)
sc.pp.neighbors(adata_d3_subset, n_neighbors=30, n_pcs=30)
sc.tl.umap(adata_d3_subset)
sc.tl.leiden(adata_d3_subset)
# Supplementary Fig. S4A
sc.pl.umap(adata_d3_subset, color='leiden')

When I generated the plot using sc.pl.scatter(), the following error occurred:

# Supplementary Fig. S4B
sc.pl.scatter(adata_d3_subset, basis='xy_loc',color='leiden')

KeyError: 'compute coordinates using visualization tool xy_loc first'

What command should I use to compute coordinates using the visualization tool xy_loc?

Thanks!

Best, KJ

kxxxjo commented 2 years ago

Hi @tabdelaal

Could you give me some feedback for troubleshooting?

I'm sorry to rush you :(

Thanks!

Best, KJ

tabdelaal commented 2 years ago

Can you just print your adata_d3_subset variable and show me what you get?

kxxxjo commented 2 years ago

Thanks for the reply, @tabdelaal

Sure, here is my adata_d3_subset.

AnnData object with n_obs × n_vars = 377 × 31908
    obs: 'orig.ident', 'nCount_Spatial', 'nFeature_Spatial', 'nCount_SCT', 'nFeature_SCT', 'Barcode', 'Pathologic.Annotation', 'barcode', 'UMAP_1', 'UMAP_2', 'initial_size_spliced', 'initial_size_unspliced', 'initial_size', 'n_counts'
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand', 'gene_count_corr'
    obsm: 'X_pca', 'X_umap', 'xy_loc'
    layers: 'matrix', 'ambiguous', 'spliced', 'unspliced'

Thanks!

Best, KJ

tabdelaal commented 2 years ago

I suspect one of two technical issues:

1) When all the cell locations are integers, these visualization functions break in some cases. It may be wise to add an offset (e.g. 0.5) when saving the cell locations in the anndata object:
adata_d3_subset.obsm['xy_loc'] = XY_location + 0.5

2) scanpy adds an 'X_' prefix to the names of different obsm variables, like 'X_pca' and 'X_umap'. When you pass the basis variable to the scatter plot function you don't add the 'X_', so you can say (basis = 'umap'). Maybe you can try this:
adata_d3_subset.obsm['X_xy_loc'] = XY_location + 0.5
sc.pl.scatter(adata_d3_subset, basis='xy_loc',color='leiden')
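
Both suggestions can be combined in one preparation step. The sketch below is an illustration, not a confirmed fix for the error in this thread. It assumes, as described in point 2, that the plotting function resolves basis='xy_loc' to obsm['X_xy_loc'], and it stores the coordinates as a plain float array, the same kind of object as 'X_pca' and 'X_umap'. The `to_embedding` helper and the three spots are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical coordinate table with integer pixel positions.
XY_location = pd.DataFrame(
    {"x": [8500, 2788, 2100], "y": [7474, 8552, 6636]},
    index=["spot_a", "spot_b", "spot_c"],
)

def to_embedding(xy, offset=0.5):
    """Return an (n_obs, 2) float array, shifted by `offset` as in point 1."""
    arr = np.asarray(xy, dtype=float)
    if arr.ndim != 2 or arr.shape[1] != 2:
        raise ValueError(f"expected two coordinate columns, got shape {arr.shape}")
    return arr + offset

basis = "xy_loc"
key = "X_" + basis  # 'X_xy_loc', following the 'X_pca' / 'X_umap' naming
coords = to_embedding(XY_location)
print(key, coords.shape, coords.dtype)

# Assumed usage with the real objects (not run here):
# adata_d3_subset.obsm[key] = coords
# sc.pl.scatter(adata_d3_subset, basis=basis, color='leiden')
```

The shape check is there because a table that still carries all five columns of the positions file would otherwise be stored without complaint.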

kxxxjo commented 2 years ago

I tried your suggestions, but the error still occurred.

My code is below.

adata_d3_subset.obsm['xy_loc'] = XY_location.astype(float) + 0.5
sc.pl.scatter(adata_d3_subset, basis='xy_loc',color='leiden')

But the same error still occurred.

KeyError: 'compute coordinates using visualization tool xy_loc first'

My xy_loc is not an ndarray like X_pca or X_umap, so I tried to convert xy_loc into an ndarray using the to_numpy() function.

But the format is different from X_pca or X_umap.

Please let me know how to solve it.

I really appreciate your help.

Best, KJ

kxxxjo commented 2 years ago

Hi, @tabdelaal

I'm still struggling to solve the problem, and I have made no progress.

If you have an idea to solve it, please let me know.

Thanks!

Best, KJ

tabdelaal commented 2 years ago

Hi,

Have you tried adding the capital 'X_' before xy_loc?

adata_d3_subset.obsm['X_xy_loc'] = XY_location.astype(float) + 0.5

kxxxjo commented 2 years ago

Sure, I have tried everything you suggested, but the error still occurred.

As I mentioned above, I think this is because my xy_loc is not an ndarray like X_pca or X_umap.

What do you think?

Thanks.

Best, KJ

tabdelaal commented 2 years ago

What is the type of your xy_loc then? Can you print out adata_d3_subset.obsm['X_xy_loc'] as you did with PCA?

kxxxjo commented 2 years ago

adata_d3_subset.obsm['X_xy_loc'] and ['xy_loc'] are the same. I attached an image file showing ['pca'] and ['xy_loc'].

In the case of xy_loc, I changed the type using the following command:

adata_d3_subset.obsm['xy_loc'] = adata_d3_subset.obsm['xy_loc'].to_numpy()

Thanks!

Best, KJ
