scverse / squidpy

Spatial Single Cell Analysis in Python
https://squidpy.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

NaN value when importing Visium dataset #797

Open Lem-P opened 5 months ago

Lem-P commented 5 months ago

Hi, I am trying to import a 10x Visium H&E dataset.

I am importing the dataset with:

adata = sq.read.visium('path')
adata.var_names_make_unique()

Then pre-processing:

sc.pp.filter_cells(adata, min_counts = 1000)
sc.pp.filter_genes(adata, min_cells=5)
sc.pp.normalize_total(adata, inplace = True)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, flavor='seurat', n_top_genes=4000, inplace=True)
sc.pp.pca(adata, n_comps=50, use_highly_variable=True, svd_solver='arpack')
sc.pp.neighbors(adata)
sc.tl.umap(adata)
sc.tl.louvain(adata, key_added='clusters')

Then, to calculate the image features, I create my ImageContainer (it is not clear to me whether I can do this before filtering).

library_id = "Mouse_2"
img = sq.im.ImageContainer(
    adata.uns["spatial"][library_id]["images"]["hires"],
    scale=adata.uns["spatial"][library_id]["scalefactors"]["tissue_hires_scalef"],
)

No problems up to that point (except a lot of warnings about deprecated parameters in pandas).

I then do:

for scale in [1.0, 2.0]:
    feature_name = f"features_summary_scale{scale}"
    sq.im.calculate_image_features(
        adata,
        img.compute(),
        features="summary",
        key_added=feature_name,
        n_jobs=4,
        scale=scale,
    )

but get this error:

Traceback

```pytb
in ImageContainer.generate_spot_crops(self, adata, spatial_key, library_id, spot_diameter_key, spot_scale, obs_names, as_array, squeeze, return_obs, **kwargs)
    820 radius = int(round(diameter // 2 * spot_scale))
    822 # get coords in image pixel space from original space
--> 823 y = int(spatial[i][1] * scale)
    824 x = int(spatial[i][0] * scale)
    826 # if CropCoords exist, need to offset y and x

ValueError: cannot convert float NaN to integer
```

If I instead try to build the spatial graph for neighborhood enrichment with sq.gr.spatial_neighbors(adata), I get:

ValueError: Input X contains NaN.

Version

squidpy==1.4.1

giovp commented 5 months ago

can you check whether you have nan in adata.obsm["spatial"]

Lem-P commented 5 months ago

Maybe a noob question, but how? I have tried with

df = pd.DataFrame(adata.obsm["spatial"])
nan_count = df.isna().sum()

print(nan_count)

and got:

0    1
1    1
dtype: int64

But I am not sure this is the right method.

giovp commented 5 months ago

If np.isnan(adata.obsm["spatial"]).sum() returns a value greater than 0, then you have NaNs; that would be something in your data and possibly not related to squidpy.
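To see which spots are affected, and not only how many values are missing, a per-row mask can help. The following is a minimal sketch that uses a small made-up array in place of adata.obsm["spatial"]; the array values and the name `coords` are illustrative assumptions, not data from this issue.

```python
import numpy as np

# Stand-in for adata.obsm["spatial"]: one (x, y) pair per spot.
coords = np.array([
    [7335.0, 12140.0],
    [np.nan, np.nan],
    [8120.0, 11990.0],
])

n_nan_values = int(np.isnan(coords).sum())  # total number of NaN entries
bad_rows = np.isnan(coords).any(axis=1)     # True for spots with any NaN
bad_positions = np.flatnonzero(bad_rows)    # integer row positions of those spots

print(n_nan_values, bad_positions)
# With a real AnnData object, adata.obs_names[bad_rows] would give the barcodes.
```

A count of 2 with a single flagged row would mean one spot is missing both of its coordinates.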

Lem-P commented 5 months ago

np.isnan(adata.obsm["spatial"]).sum() gives me 2 as output. How can I find out where it is coming from? (The data comes from spaceranger-2.0.1.) How can I correct the dataset?

giovp commented 5 months ago

Unfortunately I don't know. One option is to just filter out the cells that have NaN coordinates, and also to check the original raw data to see where the issue might arise.
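The filtering option could look roughly like the sketch below. The helper name `spots_with_coords` is hypothetical, and a plain NumPy array stands in for adata.obsm["spatial"] so the example runs on its own; the commented lines show how the same mask would typically be applied to an AnnData object.

```python
import numpy as np

def spots_with_coords(coords: np.ndarray) -> np.ndarray:
    """Return a boolean mask that is True where all coordinates are finite."""
    return np.isfinite(coords).all(axis=1)

# Made-up coordinates; the middle spot has no position.
coords = np.array([[7335.0, 12140.0], [np.nan, np.nan], [8120.0, 11990.0]])
keep = spots_with_coords(coords)
clean = coords[keep]

# On a real object the same mask would subset the observations, e.g.:
#   keep = spots_with_coords(adata.obsm["spatial"])
#   adata = adata[keep].copy()
```

Dropping spots hides the symptom only; if the coordinates exist in the raw Space Ranger output, the reader is still losing them.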

Lem-P commented 5 months ago

After some testing, it is the sq.read.visium() function that creates the issue. If I create my AnnData with the sc.read_visium() function, there are no NaN values in adata.obsm["spatial"] and I can go on with the rest of the analysis. So there is indeed a bug in Squidpy; the workaround is to use Scanpy to import the Visium data.

michalk8 commented 5 months ago

This could be caused by this line: https://github.com/scverse/squidpy/blob/main/src/squidpy/read/_read.py#L94 @Lem-P, do both adata objects (from sq.read.visium() and sc.read_visium()) have the same number of cells? The Squidpy function keeps all the cells in the adata and puts NaNs for the coords if they are missing.
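The "keep all cells, fill missing coords with NaN" behaviour described here is what a left join produces. The sketch below illustrates that general mechanism with pandas; it is not Squidpy's reader code, and the barcodes, column names and values are made up.

```python
import pandas as pd

# Barcodes present in the count matrix (hypothetical values).
obs = pd.DataFrame(index=["AAAC-1", "AAAG-1", "AAAT-1"])

# Coordinate table that is missing one of those barcodes.
positions = pd.DataFrame(
    {"pxl_col": [7335, 8120], "pxl_row": [12140, 11990]},
    index=["AAAG-1", "AAAT-1"],
)

# A left join keeps every row of `obs`; barcodes absent from `positions` get NaN.
joined = obs.join(positions, how="left")
missing = joined.index[joined.isna().any(axis=1)].tolist()

print(joined)
print(missing)
```

The number of observations stays the same after such a join, so comparing cell counts between two objects would not reveal a spot that lost its coordinates.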

Lem-P commented 5 months ago

Both objects have the same number of observations/cells and variables/genes, but I found the problematic row.

In the object made with Scanpy: array([ 7335, 12140])
In the object made with Squidpy: array([nan, nan])

Could the space before the first value cause the issue? Where does it come from? Why would spaceranger suddenly add a space before a value?

giovp commented 4 months ago

OK, it seems the issue is then due to the Visium reader, as also reported in #746; I'm afraid it has to do with Space Ranger versions. I won't have time to look at it soon, but @Lem-P, I would take a look at @scverse/spatialdata-io for a Visium reader that should support all Space Ranger versions.

wangjiawen2013 commented 1 month ago

I hit this error too, even with the latest squidpy (1.5.0), and finally found the reason: header=1 should be changed to header=0. Perhaps it's a typo in squidpy. https://github.com/scverse/squidpy/issues/746
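The effect of header=1 versus header=0 can be reproduced with pandas alone. This is a sketch under the assumption that the positions file begins with a single header row; the CSV text below is a made-up excerpt, and this is not the code from Squidpy's reader.

```python
import io
import pandas as pd

# Made-up excerpt of a positions file that starts with a header row.
csv_text = (
    "barcode,in_tissue,array_row,array_col,pxl_row_in_fullres,pxl_col_in_fullres\n"
    "AAAC-1,1,0,0,12140,7335\n"
    "AAAG-1,1,0,2,11990,8120\n"
    "AAAT-1,1,1,1,11800,8300\n"
)

# header=0: the first line supplies the column names, so all three spots are read.
with_header0 = pd.read_csv(io.StringIO(csv_text), header=0, index_col=0)

# header=1: the second line is taken as the column names, so the first spot is
# consumed as a header and never appears as a data row.
with_header1 = pd.read_csv(io.StringIO(csv_text), header=1, index_col=0)

print(list(with_header0.index))
print(list(with_header1.index))
```

Losing exactly one barcode this way would be consistent with a single spot ending up with two NaN coordinates once the table is joined back onto the observations.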

giovp commented 2 days ago

Yes, I think this is due to the different specifications. Unfortunately we only maintain readers in spatialdata-io, so I would suggest taking a look at that: https://spatialdata.scverse.org/projects/io/en/latest/index.html