yi-zhang / STHD

STHD: probabilistic cell typing of Single spots in whole Transcriptome spatial data with High Definition
GNU General Public License v2.0
3 stars 1 forks source link

ValueError: Input X contains NaN. #3

Closed HenryJinZH closed 2 days ago

HenryJinZH commented 3 weeks ago

Problem Description

I encountered an issue when running the following code:

full_data.adata = qcmask.background_detector(full_data.adata, threshold=50, n_neighs=4, n_rings=2)

An error occurred while processing the test 10x CRC data, but everything worked fine when I tested HDST on other 10x datasets.

The error message is as follows:

ValueError                                Traceback (most recent call last)
Cell In[7], line 7
      5 from STHD import qcmask
      6 from STHD import train
----> 7 full_data.adata = qcmask.background_detector(full_data.adata, threshold = 50, n_neighs = 4, n_rings = 2)
      8 sthdata_filtered = qcmask.filter_background(full_data, threshold = 50)
     10 n_iter = 23

File [~/STHD/STHD/qcmask.py:20](http://localhost:8887/lab/tree/STHD/STHD/qcmask.py#line=19), in background_detector(adata, layer, threshold, n_neighs, n_rings, coord_type)
     18 # print(D_raw[:10])
     19 p_background = D_raw.copy()
---> 20 sq.gr.spatial_neighbors(
     21     adata,
     22     spatial_key="spatial",
     23     coord_type=coord_type,
     24     n_neighs=n_neighs,
     25     n_rings=n_rings,
     26 )
     27 A_csr = adata.obsp["spatial_distances"]  # [X, X]
     28 Acsr_row = A_csr.indptr

File [~/miniconda3/envs/scanpy/lib/python3.12/site-packages/squidpy/gr/_build.py:244](http://localhost:8887/lab/tree/miniconda3/envs/scanpy/lib/python3.12/site-packages/squidpy/gr/_build.py#line=243), in spatial_neighbors(adata, spatial_key, elements_to_coordinate_systems, table_key, library_key, coord_type, n_neighs, radius, delaunay, n_rings, percentile, transform, set_diag, key_added, copy)
    242     Dst = block_diag([m[1] for m in mats], format="csr")[ixs, :][:, ixs]
    243 else:
--> 244     Adj, Dst = _build_fun(adata)
    246 neighs_key = Key.uns.spatial_neighs(key_added)
    247 conns_key = Key.obsp.spatial_conn(key_added)

File [~/miniconda3/envs/scanpy/lib/python3.12/site-packages/squidpy/gr/_build.py:285](http://localhost:8887/lab/tree/miniconda3/envs/scanpy/lib/python3.12/site-packages/squidpy/gr/_build.py#line=284), in _spatial_neighbor(adata, spatial_key, coord_type, n_neighs, radius, delaunay, n_rings, transform, set_diag, percentile)
    283 warnings.simplefilter("ignore", SparseEfficiencyWarning)
    284 if coord_type == CoordType.GRID:
--> 285     Adj, Dst = _build_grid(
    286         coords,
    287         n_neighs=n_neighs,
    288         n_rings=n_rings,
    289         delaunay=delaunay,
    290         set_diag=set_diag,
    291     )
    292 elif coord_type == CoordType.GENERIC:
    293     Adj, Dst = _build_connectivity(
    294         coords,
    295         n_neighs=n_neighs,
   (...)
    299         set_diag=set_diag,
    300     )

File [~/miniconda3/envs/scanpy/lib/python3.12/site-packages/squidpy/gr/_build.py:342](http://localhost:8887/lab/tree/miniconda3/envs/scanpy/lib/python3.12/site-packages/squidpy/gr/_build.py#line=341), in _build_grid(coords, n_neighs, n_rings, delaunay, set_diag)
    334 def _build_grid(
    335     coords: NDArrayA,
    336     n_neighs: int,
   (...)
    339     set_diag: bool = False,
    340 ) -> tuple[csr_matrix, csr_matrix]:
    341     if n_rings > 1:
--> 342         Adj: csr_matrix = _build_connectivity(
    343             coords,
    344             n_neighs=n_neighs,
    345             neigh_correct=True,
    346             set_diag=True,
    347             delaunay=delaunay,
    348             return_distance=False,
    349         )
    350         Res, Walk = Adj, Adj
    351         for i in range(n_rings - 1):

File [~/miniconda3/envs/scanpy/lib/python3.12/site-packages/squidpy/gr/_build.py:405](http://localhost:8887/lab/tree/miniconda3/envs/scanpy/lib/python3.12/site-packages/squidpy/gr/_build.py#line=404), in _build_connectivity(coords, n_neighs, radius, delaunay, neigh_correct, set_diag, return_distance)
    403 r = 1 if radius is None else radius if isinstance(radius, (int, float)) else max(radius)
    404 tree = NearestNeighbors(n_neighbors=n_neighs, radius=r, metric="euclidean")
--> 405 tree.fit(coords)
    407 if radius is None:
    408     dists, col_indices = tree.kneighbors()

File ~/.local/lib/python3.12/site-packages/sklearn/base.py:1474, in _fit_context.<locals>.decorator.<locals>.wrapper(estimator, *args, **kwargs)
   1467     estimator._validate_params()
   1469 with config_context(
   1470     skip_parameter_validation=(
   1471         prefer_skip_nested_validation or global_skip_validation
   1472     )
   1473 ):
-> 1474     return fit_method(estimator, *args, **kwargs)

File ~/.local/lib/python3.12/site-packages/sklearn/neighbors/_unsupervised.py:175, in NearestNeighbors.fit(self, X, y)
    154 @_fit_context(
    155     # NearestNeighbors.metric is not validated yet
    156     prefer_skip_nested_validation=False
    157 )
    158 def fit(self, X, y=None):
    159     """Fit the nearest neighbors estimator from the training dataset.
    160 
    161     Parameters
   (...)
    173         The fitted nearest neighbors estimator.
    174     """
--> 175     return self._fit(X)

File ~/.local/lib/python3.12/site-packages/sklearn/neighbors/_base.py:518, in NeighborsBase._fit(self, X, y)
    516 else:
    517     if not isinstance(X, (KDTree, BallTree, NeighborsBase)):
--> 518         X = self._validate_data(X, accept_sparse="csr", order="C")
    520 self._check_algorithm_metric()
    521 if self.metric_params is None:

File ~/.local/lib/python3.12/site-packages/sklearn/base.py:633, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, cast_to_ndarray, **check_params)
    631         out = X, y
    632 elif not no_val_X and no_val_y:
--> 633     out = check_array(X, input_name="X", **check_params)
    634 elif no_val_X and not no_val_y:
    635     out = _check_y(y, **check_params)

File ~/.local/lib/python3.12/site-packages/sklearn/utils/validation.py:1049, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
   1043     raise ValueError(
   1044         "Found array with dim %d. %s expected <= 2."
   1045         % (array.ndim, estimator_name)
   1046     )
   1048 if force_all_finite:
-> 1049     _assert_all_finite(
   1050         array,
   1051         input_name=input_name,
   1052         estimator_name=estimator_name,
   1053         allow_nan=force_all_finite == "allow-nan",
   1054     )
   1056 if copy:
   1057     if _is_numpy_namespace(xp):
   1058         # only make a copy if array and array_orig may share memory

File ~/.local/lib/python3.12/site-packages/sklearn/utils/validation.py:126, in _assert_all_finite(X, allow_nan, msg_dtype, estimator_name, input_name)
    123 if first_pass_isfinite:
    124     return
--> 126 _assert_all_finite_element_wise(
    127     X,
    128     xp=xp,
    129     allow_nan=allow_nan,
    130     msg_dtype=msg_dtype,
    131     estimator_name=estimator_name,
    132     input_name=input_name,
    133 )

File ~/.local/lib/python3.12/site-packages/sklearn/utils/validation.py:175, in _assert_all_finite_element_wise(X, xp, allow_nan, msg_dtype, estimator_name, input_name)
    158 if estimator_name and input_name == "X" and has_nan_error:
    159     # Improve the error message on how to handle missing values in
    160     # scikit-learn.
    161     msg_err += (
    162         f"\n{estimator_name} does not accept missing values"
    163         " encoded as NaN natively. For supervised learning, you might want"
   (...)
    173         "#estimators-that-handle-nan-values"
    174     )
--> 175 raise ValueError(msg_err)

ValueError: Input X contains NaN.

Has anyone encountered a similar issue before? Any help or suggestions would be greatly appreciated!

Silaschuwen commented 3 weeks ago

Hi Henry, it works fine for me either on crop10 or the wholemap 10X CRC data. can you share your code when loading the full_data? According to the error, it seems to be the inability to construct spatial neighbors.

HenryJinZH commented 3 weeks ago

Hi Silaschuwen. I also tried download the test data from 10x website and it showed the same error. But in other 10x HD datasets I tested, HDST worked pretty well. Here's how I read in the full_data.


path = '/data2/crc_hd/Human_Colon_Cancer_P2/'
hd_data_square_002um_path = path+'square_002um/' 
hd_fullres_img_btf_path = path+'Visium_HD_Human_Colon_Cancer_P2_tissue_image.btf'
full_data = sthdio.STHD(
    spatial_path = hd_data_square_002um_path, counts_data = 'filtered_feature_bc_matrix.h5', full_res_image_path = hd_fullres_img_btf_path, 
    load_type = 'original')
HenryJinZH commented 3 weeks ago

It is caused by NaN in spatial coordinates, and removing them will do.


invalid_indices = np.where(np.isnan(full_data.adata.obsm['spatial']))[0]
full_data.adata= full_data.adata[~np.isin(range(full_data.adata.shape[0]), invalid_indices)]
yi-zhang commented 3 weeks ago

Hi Henry, it looks related to squidpy visium loading function. Could you kindly share your squidpy and scanpy version?

HenryJinZH commented 2 weeks ago

Hi Henry, it looks related to squidpy visium loading function. Could you kindly share your squidpy and scanpy version?

My Squidpy version is 1.6.0, and Scanpy version is 1.10.2.

yi-zhang commented 2 weeks ago

We found this is due to directly operating STHD on the level of full-size VisiumHD data, which triggers spatial neighbor errors in squidpy. Actually STHD training is designed for patch-level sizes post cropping - we have tested sizes up to 9000 x 9000 full-resolution pixels, about 1/11 of a full-size VisiumHD area. We're testing this limit to provide updates. Thank you!

yi-zhang commented 2 days ago

Tutorials for automatic patchifying followed by STHD modeling and patch merging, are available across notebooks/ s11_patchify.ipynb, s12_per_patch_train.ipynb, s13_combine_patch.ipynb.