labsyspharm / scimap

Spatial Single-Cell Analysis Toolkit
https://scimap.xyz/
MIT License

TypeError: Feature names are only supported if all input features have string names, but your input has ['str', 'str_'] when running sm.tl.spatial_cluster #107

Closed emmanuel-contreras closed 1 month ago

emmanuel-contreras commented 1 month ago

While working through the Spatial LDA tutorial (https://scimap.xyz/tutorials/md/spatial_lda_scimap/), I ran into the error below (also in the title), which may be a bug. I am using scimap 2.1.1.

I noticed that the "Unknown" phenotype is being added as np.str_ while the other phenotypes are plain str, which is why this error is thrown.

I tried recasting the phenotype column to a single dtype with adata.obs['phenotype'] = adata.obs['phenotype'].astype(str), but that did not help.

I traced the problem to this line: https://github.com/labsyspharm/scimap/blob/3e68c793ebe50bb28c087d45618b28bca2b4f92d/scimap/tools/phenotype_cells.py#L312

I updated the line to build the same-length sequence of 'Unknown' values from a plain Python list instead, which fixes the issue:

d[label] = d[label].replace(dict(zip(fail, ['Unknown'] * len(fail))))

Now all phenotypes have the same type.
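To illustrate why the list-based version changes the resulting type, here is a minimal sketch. It assumes the original line took its replacement values from a NumPy string array (for example via np.repeat); the issue does not show the original code, so treat that as a guess. The names in `fail` are made-up placeholders.

```python
import numpy as np

# Hypothetical identifiers standing in for the entries of `fail`.
fail = ["cell_3", "cell_7"]

# Assumption: the values come from a NumPy string array.
# Iterating such an array yields np.str_ scalars, not built-in str.
from_array = dict(zip(fail, np.repeat("Unknown", len(fail))))

# Values taken from a plain Python list stay built-in str objects.
from_list = dict(zip(fail, ["Unknown"] * len(fail)))

array_types = {type(v).__qualname__ for v in from_array.values()}
list_types = {type(v).__qualname__ for v in from_list.values()}

print(array_types)  # {'str_'}
print(list_types)   # {'str'}
```

Both dictionaries compare equal value by value, since np.str_ is a subclass of str. Only the exact type differs, and the exact type is what the scikit-learn check in the stack trace reports.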

Stack trace:

adata = sm.tl.spatial_cluster(adata, df_name='spatial_count', method='kmeans', k=6, label='neigh_kmeans')
Kmeans clustering
Traceback (most recent call last):

  Cell In[15], line 1
    adata = sm.tl.spatial_cluster(adata, df_name='spatial_count', method='kmeans', k=6, label='neigh_kmeans')

  File ~\Anaconda3\envs\scimap\lib\site-packages\scimap\tools\spatial_cluster.py:176 in spatial_cluster
    adata_new = cluster (adata = adata_new,

  File ~\Anaconda3\envs\scimap\lib\site-packages\scimap\tools\cluster.py:299 in cluster
    all_cluster_labels = k_clustering(pheno=None, adata=bdata, k=k, sub_cluster_column=sub_cluster_column, use_raw=use_raw, random_state=random_state)

  File ~\Anaconda3\envs\scimap\lib\site-packages\scimap\tools\cluster.py:213 in k_clustering
    kmeans = KMeans(n_clusters=k, random_state=random_state, n_init=10).fit(data_subset)

  File ~\Anaconda3\envs\scimap\lib\site-packages\sklearn\base.py:1473 in wrapper
    return fit_method(estimator, *args, **kwargs)

  File ~\Anaconda3\envs\scimap\lib\site-packages\sklearn\cluster\_kmeans.py:1464 in fit
    X = self._validate_data(

  File ~\Anaconda3\envs\scimap\lib\site-packages\sklearn\base.py:608 in _validate_data
    self._check_feature_names(X, reset=reset)

  File ~\Anaconda3\envs\scimap\lib\site-packages\sklearn\base.py:469 in _check_feature_names
    feature_names_in = _get_feature_names(X)

  File ~\Anaconda3\envs\scimap\lib\site-packages\sklearn\utils\validation.py:2279 in _get_feature_names
    raise TypeError(

TypeError: Feature names are only supported if all input features have string names, but your input has ['str', 'str_'] as feature name / column name types. If you want feature names to be stored and validated, you must convert them all to strings, by using X.columns = X.columns.astype(str) for example. Otherwise you can remove feature / column names from your input data, or convert them all to a non-string data type.
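The error message lists the type names found among the column labels. The sketch below imitates that kind of check without calling scikit-learn, and shows one possible workaround: rebuilding every label with the built-in str constructor. The helper names `feature_name_types` and `normalize_names` and the example labels are hypothetical. They are not part of scimap or scikit-learn.

```python
import numpy as np

def feature_name_types(names):
    """Return the sorted type names found among the column labels."""
    return sorted({type(n).__qualname__ for n in names})

def normalize_names(names):
    """Convert every label to an exact built-in str."""
    return [str(n) for n in names]

# Illustrative labels: two plain strings and one NumPy string scalar.
columns = ["Tumor", "Immune", np.str_("Unknown")]

before = feature_name_types(columns)
after = feature_name_types(normalize_names(columns))

print(before)  # ['str', 'str_']
print(after)   # ['str']
```

If the same idea were applied to a pandas DataFrame, something like df.columns = normalize_names(df.columns) before clustering might avoid the error. That has not been checked against scimap's internals.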

ajitjohnson commented 1 month ago

Thank you for identifying this bug @emmanuel-contreras. Would you be able to submit a PR? Thank you very much.