imsb-uke / nichepca

MIT License

One hot encoding #2

Open pakiessling opened 1 month ago

pakiessling commented 1 month ago

Description of feature

Hi Darius,

it was nice meeting you in person :)

I am trying out the one-hot encoding we talked about.

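import anndata as ad
import numpy as np
import pandas as pd

# One indicator column per cell type (1 if the cell has that type, else 0)
cell_types = adata.obs["cell_type_tmp"]
one_hot = pd.get_dummies(cell_types, dtype=np.int8)

# New AnnData whose features ("genes") are the cell-type indicator columns
adata_one_hot = ad.AnnData(
    one_hot.values,
    obs=adata.obs,
    var=pd.DataFrame(index=one_hot.columns.astype(str)),
)
# Keep the spatial coordinates so the neighbourhood graph can be built
adata_one_hot.obsm["spatial"] = adata.obsm["spatial"]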
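
For a quick sanity check of the encoding step, here is a minimal, self-contained sketch on a made-up label vector (in the snippet above the labels come from `adata.obs['cell_type_tmp']`). Each row of the one-hot matrix should contain exactly one 1, with one column per cell type in category order.

```python
import numpy as np
import pandas as pd

# Hypothetical labels standing in for adata.obs["cell_type_tmp"]
cell_types = pd.Series(
    ["T cell", "B cell", "T cell", "Fibroblast", "B cell"],
    dtype="category",
)

# One indicator column per category, stored as int8 to save memory
one_hot = pd.get_dummies(cell_types, dtype=np.int8)
X = one_hot.values

print(list(one_hot.columns))  # column order follows the categories
print(X)

# Every cell belongs to exactly one type
assert (X.sum(axis=1) == 1).all()
```
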

Do I now just run nichepca on this?

dschaub95 commented 1 month ago

Hi Paul,

yes, exactly! In our experiments it worked better for multi-slide integration, but in that case you need to omit the Harmony part. I would suggest just copying the relevant parts from the nichepca function and implementing them yourself; I haven't had time to adapt the nichepca function yet. You only need these lines:

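import scanpy as sc
from anndata import AnnData

# construct_multi_sample_graph, knn_graph, distance_graph and aggregate are the
# graph-construction and aggregation helpers used inside nichepca; import them
# from the nichepca package.


def run_nichepca(
    adata: AnnData,
    knn: int = None,
    radius: float = None,
    sample_key: str = None,
    n_comps: int = 30,
    **kwargs,
):
    # Build the spatial neighbourhood graph (per sample if sample_key is given)
    if sample_key is not None:
        construct_multi_sample_graph(
            adata, sample_key=sample_key, knn=knn, radius=radius, **kwargs
        )
    else:
        if knn is not None:
            knn_graph(adata, knn, **kwargs)
        elif radius is not None:
            distance_graph(adata, radius, **kwargs)
        else:
            raise ValueError("Either knn or radius must be provided.")

    # Aggregate each cell's features over its graph neighbours
    aggregate(adata)

    # PCA on the aggregated niche features (no Harmony step here)
    sc.tl.pca(adata, n_comps=n_comps)
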
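To make the three steps in `run_nichepca` concrete (build a spatial graph, aggregate features over it, run PCA), here is a toy, numpy-only sketch. It is a hypothetical stand-in, not nichepca's implementation: it assumes a plain kNN graph that includes each cell itself and assumes `aggregate` behaves like a neighbourhood mean. The names `knn_mean_aggregate` and `pca` are made up for this example.

```python
import numpy as np


def knn_mean_aggregate(coords: np.ndarray, X: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical stand-in for graph building + aggregation: replace each
    cell's features by the mean over itself and its k nearest spatial
    neighbours (brute-force distances, fine for small toy data)."""
    # Pairwise Euclidean distances between all cells
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Indices of the k + 1 closest cells; column 0 is the cell itself
    nbrs = np.argsort(dist, axis=1)[:, : k + 1]
    return X[nbrs].mean(axis=1)


def pca(X: np.ndarray, n_comps: int) -> np.ndarray:
    """Project mean-centred data onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_comps] * S[:n_comps]


rng = np.random.default_rng(0)
n_cells, n_types = 200, 4
coords = rng.uniform(0, 100, size=(n_cells, 2))  # toy spatial positions
labels = rng.integers(0, n_types, size=n_cells)  # toy cell-type labels
one_hot = np.eye(n_types)[labels]                # one-hot feature matrix

niche = knn_mean_aggregate(coords, one_hot, k=20)  # niche compositions
emb = pca(niche, n_comps=3)
print(emb.shape)
```

Under this mean-aggregation assumption every aggregated row sums to 1, so the centred matrix has rank at most `n_types - 1`; with one-hot input, asking for more principal components than that only adds zero-variance components.
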
pakiessling commented 1 month ago

Thanks, I will give it a shot

pakiessling commented 1 month ago
import rapids_singlecell as rsc

# construct_multi_sample_graph and aggregate are the nichepca helpers from above
construct_multi_sample_graph(adata, sample_key="sample", knn=5)
aggregate(adata)
rsc.get.anndata_to_GPU(adata)
rsc.tl.pca(adata, n_comps=5)
rsc.pp.neighbors(adata)
rsc.tl.leiden(adata, resolution=0.1, key_added="nichepca_0.1")
rsc.tl.leiden(adata, resolution=0.5, key_added="nichepca_0.5")
rsc.tl.leiden(adata, resolution=0.3, key_added="nichepca_0.3")
rsc.tl.leiden(adata, resolution=0.8, key_added="nichepca_0.8")

I did this and got more than 6000 clusters for all of the resolutions 😅

Do you know what could cause this?

dschaub95 commented 1 month ago

Hi Paul,

sorry for the late reply. I think it might be caused by the low number of neighbors (knn=5), which might lead to many similar neighborhood compositions. What happens if you run it with, say, knn=20 and 30 components?

Best,
Darius
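
One way to see Darius's point about many similar neighborhood compositions: if a cell is summarised by the cell-type counts among itself and its k nearest neighbours, the counts sum to k + 1, so with 3 cell types there are only C(k + 3, 2) possible vectors, i.e. 28 for knn=5 versus 253 for knn=20. The toy sketch below (random positions and labels, brute-force distances; `count_unique_compositions` is a hypothetical helper) counts how many distinct composition vectors actually occur. It only illustrates how coarse the niche features become at low k; it does not show that this is what produced the 6000+ clusters.

```python
import numpy as np


def count_unique_compositions(coords, labels, n_types, k):
    """Hypothetical helper: count distinct neighbourhood compositions when each
    cell is summarised by the cell-type counts among itself and its k nearest
    neighbours."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = (diff ** 2).sum(axis=-1)              # squared distances suffice for ranking
    nbrs = np.argsort(dist, axis=1)[:, : k + 1]  # self + k neighbours
    counts = np.eye(n_types, dtype=int)[labels[nbrs]].sum(axis=1)
    return len(np.unique(counts, axis=0))


rng = np.random.default_rng(0)
n_cells, n_types = 1000, 3
coords = rng.uniform(0, 100, size=(n_cells, 2))  # toy spatial positions
labels = rng.integers(0, n_types, size=n_cells)  # toy cell-type labels

for k in (5, 20):
    print(k, count_unique_compositions(coords, labels, n_types, k))
```

With knn=5 most cells necessarily share an exactly identical composition vector with many others, whereas knn=20 spreads them over far more distinct values.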