jianhuupenn / SpaGCN

SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network
MIT License

target_cluster, x/y_pixel and nbr_domians in Identify SVGs section #40

Closed lifan18 closed 10 months ago

lifan18 commented 2 years ago

Hi Dr. Hu,

Sorry for the new questions. I am running SpaGCN to identify SVGs in my spatial data.

A few problems are stopping me from getting through this section.

  1. In the example,

    #Use domain 0 as an example
    target=0

    I am trying to use Seurat cluster labels such as RG1, oRG, and EX. Is it possible to define domains with strings instead of numbers?

  2. The example uses the x_pixel and y_pixel parameters, but my data only includes x_array and y_array. As far as I can tell, these two parameters appear only once. Can I simply skip the pixel settings in the SVGs section?

  3. nbr_domians detection: I ran the code below and got some errors.

    
    x_array=raw.obs["x_array"].tolist()
    y_array=raw.obs["y_array"].tolist()
    #X=np.array([x_array, y_array]).T.astype(np.float32)
    #from scipy.spatial import distance
    #adj_2d=distance.cdist(X, X, 'euclidean') #adjmatrix
    print(len(x_array),len(y_array))
    adj_2d=spg.calculate_adj_matrix(x=x_array, y=y_array, histology=False)
    start, end= np.quantile(adj_2d[adj_2d!=0],q=0.001), np.quantile(adj_2d[adj_2d!=0],q=0.1)
    r=spg.search_radius(target_cluster=target, cell_id=adata.obs.index.tolist(), x=x_array, y=y_array, pred=adata.obs["group"].tolist(), start=start, end=end, num_min=10, num_max=14,  max_run=100)
    print(len(target_cluster),len(cell_id),len(start),len(end))

    #Detect neighboring domains
    nbr_domians=spg.find_neighbor_clusters(target_cluster=target, cell_id=raw.obs.index.tolist(), x=raw.obs["x_array"].tolist(), y=raw.obs["y_array"].tolist(), pred=raw.obs["pred"].tolist(), radius=r, ratio=1/2)

See errors below.

Traceback (most recent call last):
  File "w11_1_2_noreduce_SVGs.py", line 58, in <module>
    r=spg.search_radius(target_cluster=target, cell_id=adata.obs.index.tolist(), x=x_array, y=y_array, pred=adata.obs["group"].tolist(), start=start, end=end, num_min=10, num_max=14, max_run=100)
  File "/usr/nzx-cluster/apps/SpaGCN/1.2.5/lib/python3.7/site-packages/SpaGCN/util.py", line 111, in search_radius
    num_low=count_nbr(target_cluster,cell_id, x, y, pred, start)
  File "/usr/nzx-cluster/apps/SpaGCN/1.2.5/lib/python3.7/site-packages/SpaGCN/util.py", line 97, in count_nbr
    df = pd.DataFrame(data=df)
  File "/usr/nzx-cluster/apps/SpaGCN/1.2.5/lib/python3.7/site-packages/pandas/core/frame.py", line 614, in __init__
    mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
  File "/usr/nzx-cluster/apps/SpaGCN/1.2.5/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 465, in dict_to_mgr
    arrays, data_names, index, columns, dtype=dtype, typ=typ, consolidate=copy
  File "/usr/nzx-cluster/apps/SpaGCN/1.2.5/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 119, in arrays_to_mgr
    index = _extract_index(arrays)
  File "/usr/nzx-cluster/apps/SpaGCN/1.2.5/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 635, in _extract_index
    raise ValueError("All arrays must be of the same length")
ValueError: All arrays must be of the same length

Here is the related `pandas` code (from `construction.py`):
632         if have_raw_arrays:
633             lengths = list(set(raw_lengths))
634             if len(lengths) > 1:
635                 raise ValueError("All arrays must be of the same length")

I also checked my data as follows.

cell_id=raw.obs.index.tolist()
x=raw.obs["x_array"].tolist()
y=raw.obs["y_array"].tolist()
print(len(cell_id), len(x), len(y))
91019 91019 91019


The array lengths appear to be equal, so I guess the error comes from the `nbr_domians` and `r` calculation. Could you give me some advice on how to resolve these issues?

Thank you very much!

Best
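
A note on the snippet in this post: `x` and `y` are taken from `raw.obs`, while `cell_id` and `pred` are taken from `adata.obs`, and the length check shown covers only the `raw` columns. If `raw` and `adata` hold different numbers of spots, that alone could plausibly raise this `ValueError`. The following is a minimal sketch of a pre-flight check, not part of SpaGCN; the helper name `check_equal_lengths` and the toy lists are hypothetical.

```python
def check_equal_lengths(**named_lists):
    """Raise ValueError listing each input's length if they are not all equal.

    Hypothetical helper for illustration; it is not part of SpaGCN.
    """
    lengths = {name: len(values) for name, values in named_lists.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Length mismatch: {lengths}")
    return lengths


# Toy stand-ins for the four list inputs passed to spg.search_radius.
# In the real script these would be built from the AnnData objects,
# e.g. adata.obs.index.tolist() and raw.obs["x_array"].tolist().
cell_id = ["spot1", "spot2", "spot3"]
x = [0, 1, 2]
y = [0, 1, 2]
pred = ["RG1", "oRG", "EX"]

lengths = check_equal_lengths(cell_id=cell_id, x=x, y=y, pred=pred)
```

Running the check on exactly the four lists that go into the call, rather than on columns of one object only, makes a mismatch between two objects visible before pandas reports it.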

jianhuupenn commented 2 years ago

Hi there. Thanks for your interest in SpaGCN. Regarding your questions:

  1. Yes, you can use string or categorical variables if you like. However, I would still suggest recoding the "label" column to numbers to avoid potential errors.
  2. x_pixel and y_pixel are used to map each spot back onto the histology image. You will not be able to use the image without them. The tutorial includes an option to run SpaGCN without an image.
  3. I do not know exactly what causes this error. Can you try a different radius? I'll dig into the code on my side.
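
The recoding suggested in point 1 could look roughly like this. It is a minimal illustration in plain Python, not SpaGCN code; the helper name `encode_labels` is hypothetical, and the example labels are borrowed from the question.

```python
def encode_labels(labels):
    """Map string cluster labels to integer codes, in sorted label order."""
    mapping = {name: code for code, name in enumerate(sorted(set(labels)))}
    codes = [mapping[name] for name in labels]
    return codes, mapping


# Example labels as they might come from a Seurat clustering column.
seurat_labels = ["RG1", "oRG", "EX", "RG1", "EX"]

pred, label_to_code = encode_labels(seurat_labels)
target = label_to_code["RG1"]  # integer code to use as the target domain

# Reverse mapping, to translate results back to the original names.
code_to_label = {code: name for name, code in label_to_code.items()}
```

Here `pred` would stand in for the list passed as `pred=...` and `target` for the value passed as `target_cluster=...`, while `code_to_label` recovers the original names afterwards.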

lifan18 commented 2 years ago

Hi Dr. Hu,

Thank you very much for your reply! I really appreciate it!

For the third question above, I fixed the array length mismatch, but the radius calculation still fails. See my code below.

adj_2d=spg.calculate_adj_matrix(x=x_array, y=y_array, histology=False)
start, end= np.quantile(adj_2d[adj_2d!=0],q=0.5), np.quantile(adj_2d[adj_2d!=0],q=0.1)
print(start)
print(end)
r=spg.search_radius(target_cluster=target, cell_id=raw.obs.index.tolist(), x=x_array, y=y_array, pred=raw.obs["pred"].tolist(), start=start, end=end, num_min=1, num_max=14,  max_run=100)
#r=2.8726212047040462
#Detect neighboring domains
nbr_domians=spg.find_neighbor_clusters(target_cluster=target,
                                   cell_id=raw.obs.index.tolist(),
                                   x=raw.obs["x_array"].tolist(),
                                   y=raw.obs["y_array"].tolist(),
                                   pred=raw.obs["pred"].tolist(),
                                   radius=r,
                                   ratio=1/2)

The error is reported below.

Traceback (most recent call last):
  File "w11_1_2_noreduce_SVGs.py", line 74, in <module>
    ratio=1/2)
  File "/usr/nzx-cluster/apps/SpaGCN/1.2.5/lib/python3.7/site-packages/SpaGCN/util.py", line 157, in find_neighbor_clusters
    tmp_nbr=df[((df["x"]-x)**2+(df["y"]-y)**2)<=(radius**2)]
TypeError: unsupported operand type(s) for ** or pow(): 'NoneType' and 'int'

Do you have any tips for this problem? Is the r calculation only meant for 10X chips?

Thank you!

Regards
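
Reading the traceback in this post: `radius**2` fails because `radius` is `None`, so the `r` passed to `find_neighbor_clusters` was `None` rather than a number. Note also that `start` is taken from the 0.5 quantile and `end` from the 0.1 quantile, so `start` is at least as large as `end`. Below is a minimal sketch of two guards that could run before the neighbor search. Both helper names are hypothetical and not part of SpaGCN, and whether reordering the bounds lets the radius search succeed is an assumption to verify.

```python
def ordered_bounds(a, b):
    """Return (start, end) with start <= end, swapping the values if needed."""
    return (a, b) if a <= b else (b, a)


def require_radius(r):
    """Fail early with a readable message if no radius was found."""
    if r is None:
        raise ValueError(
            "No radius was returned; try a different start/end range "
            "or different num_min/num_max before searching for neighbors."
        )
    return float(r)


# Toy numbers standing in for the two quantiles of adj_2d.
start, end = ordered_bounds(40.0, 12.5)

# The value from the commented-out line in the post, used as a valid radius.
radius = require_radius(2.8726212047040462)
```

With a guard like `require_radius` placed between the two SpaGCN calls, a failed radius search would stop the script with a clear message instead of surfacing later as a `TypeError`.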