raphael-group / GASTON

BSD 3-Clause "New" or "Revised" License

error in 'parse_adata.get_gaston_input_adata()' #3

Closed yao50985098 closed 2 months ago

yao50985098 commented 5 months ago

Hi, thank you for the nice tool! I'm trying to run my own Visium data with standard Space Ranger output (filtered_feature_bc_matrix.h5 and spatial/tissue_positions_list.csv are provided), but I got this error. [Screenshot 2024-04-28 12 10 22]

gaston-package/lib/python3.11/site-packages/anndata/_core/anndata.py:1840: UserWarning: Variable names are not unique. To make them unique, call .var_names_make_unique.
  utils.warn_names_duplicates("var")
gaston-package/lib/python3.11/site-packages/anndata/_core/anndata.py:1113: FutureWarning: is_categorical_dtype is deprecated and will be removed in a future version. Use isinstance(dtype, CategoricalDtype) instead
  if not is_categorical_dtype(df_full[k]):
gaston-package/lib/python3.11/site-packages/anndata/_core/anndata.py:1840: UserWarning: Variable names are not unique. To make them unique, call .var_names_make_unique.
  utils.warn_names_duplicates("var")

AssertionError                            Traceback (most recent call last)
Cell In[9], line 4
      2 #use_RGB=True # set to False if you do not want to use RGB as features
      3 use_RGB=False
----> 4 counts_mat, coords_mat, gene_labels, rgb_mean=parse_adata.get_gaston_input_adata(data_folder, get_rgb=use_RGB, spot_umi_threshold=50)
      6 # save matrices
      7 np.save('colorectal_tumor_data/counts_mat.npy', counts_mat)

File ~/GASTON-main/src/gaston/parse_adata.py:11, in get_gaston_input_adata(data_folder, get_rgb, spot_umi_threshold)
      9 df_pos = pd.read_csv(f'{data_folder}/spatial/tissue_positions_list.csv', sep=",", header=None, names=["barcode", "in_tissue", "array_row", "array_col", "pxl_row_in_fullres", "pxl_col_in_fullres"])
     10 df_pos = df_pos[df_pos.in_tissue == True]
---> 11 assert set(list(df_pos.barcode)) == set(list(adata.obs.index))
     12 df_pos.barcode = pd.Categorical(df_pos.barcode, categories=list(adata.obs.index), ordered=True)
     13 df_pos.sort_values(by="barcode", inplace=True)

AssertionError:

Is there any advice? Thank you!
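The failing line only asserts that the two barcode sets are equal, so the empty AssertionError does not say which barcodes differ. A minimal sketch for inspecting the mismatch is below. The helper name `barcode_mismatch` is hypothetical (it is not part of GASTON), and the toy `df_pos` and `obs_index` stand in for the real positions table and `adata.obs.index`.

```python
import pandas as pd


def barcode_mismatch(pos_barcodes, adata_barcodes):
    """Hypothetical helper: list barcodes present on only one side."""
    pos, obs = set(pos_barcodes), set(adata_barcodes)
    return {
        "only_in_positions": sorted(pos - obs),
        "only_in_adata": sorted(obs - pos),
    }


# Toy stand-ins for df_pos.barcode and adata.obs.index (assumed data).
df_pos = pd.DataFrame({"barcode": ["barcode", "AAA-1", "BBB-1"]})
obs_index = pd.Index(["AAA-1", "BBB-1", "CCC-1"])

report = barcode_mismatch(df_pos["barcode"], obs_index)
print(report)
```

A stray entry such as the literal string "barcode" on the positions side would suggest that a header row was read as data.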

yao50985098 commented 5 months ago

I solved this by deleting the header row (the column names) in 'tissue_positions_list.csv'. But I'm now facing another error: [Screenshot 2024-04-28 16 09 33] When running 'binning_and_plotting.bin_data()', it seems that 'pseudo_counts_mat' has an index that does not correspond to the 'idx_kept' generated by 'idx_kept, gene_labels_idx=filter_genes.filter_genes(counts_mat, gene_labels,umi_threshold=umi_thresh, exclude_prefix=['MT-', 'RPL', 'RPS'])'.

I tried different samples, but got a similar error. Is there any advice? Thank you!

uthsavc commented 5 months ago

Hi! Thanks for your interest in GASTON. Sorry for my late reply.

For your first point, which column did you have to remove? I probably need to update the code to use a standard Visium function for reading 10x output.

For the second point, I updated the code to use an NxG counts matrix instead of a GxN matrix. However, I don't think I updated the tutorial. Could you try using counts_mat_restrict.T instead in the binning_and_plotting function, i.e. use the transpose? Please let me know if this works.
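To illustrate the orientation issue, here is a minimal sketch that puts a counts matrix into a chosen layout by comparing its shape against the number of spots N. The helper `orient_counts` and its parameters are hypothetical and not part of GASTON; which layout a given GASTON version expects should be checked against that version's code.

```python
import numpy as np


def orient_counts(counts, n_spots, spots_first=True):
    """Hypothetical helper: return counts as N x G (spots_first) or G x N."""
    if counts.ndim != 2:
        raise ValueError("expected a 2-D counts matrix")
    rows, cols = counts.shape
    if rows == cols:
        # A square matrix cannot be oriented from its shape alone.
        raise ValueError("square matrix: orientation is ambiguous")
    if rows == n_spots:
        n_by_g = counts
    elif cols == n_spots:
        n_by_g = counts.T
    else:
        raise ValueError("neither axis matches the number of spots")
    return n_by_g if spots_first else n_by_g.T


# Toy example: 3 genes x 4 spots, with 4 spot coordinates.
g_by_n = np.arange(12).reshape(3, 4)
coords = np.zeros((4, 2))
n_by_g = orient_counts(g_by_n, n_spots=coords.shape[0])
print(n_by_g.shape)
```

Checking the shape explicitly fails loudly when neither axis matches, instead of producing an index error further downstream.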

yao50985098 commented 4 months ago

Yes, 'counts_mat_restrict.T' works, thanks! In my output, there's an extra header line in 'tissue_positions_list.csv', like this: [image] Thanks again for your kind reply.
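As an alternative to editing the CSV by hand, a reader could tolerate both layouts. The sketch below assumes the six-column layout shown in the traceback above and assumes that a header row, when present, starts with the literal string "barcode". The helper `read_tissue_positions` is hypothetical and is not GASTON's implementation.

```python
import io

import pandas as pd

COLUMNS = ["barcode", "in_tissue", "array_row", "array_col",
           "pxl_row_in_fullres", "pxl_col_in_fullres"]


def read_tissue_positions(path_or_buffer):
    """Hypothetical helper: read the positions file with or without a header."""
    # Read everything as strings so a header row cannot break dtype inference.
    df = pd.read_csv(path_or_buffer, sep=",", header=None,
                     names=COLUMNS, dtype=str)
    # Drop the first row if it is a header (assumed to start with "barcode").
    if len(df) > 0 and df.iloc[0]["barcode"] == "barcode":
        df = df.iloc[1:].reset_index(drop=True)
    # Convert the remaining columns back to numbers.
    for col in COLUMNS[1:]:
        df[col] = pd.to_numeric(df[col])
    # Keep only spots under tissue.
    return df[df["in_tissue"] == 1]


# Toy inputs: the same two spots, with and without a header line.
rows = "AAA-1,1,0,0,10,20\nBBB-1,0,0,2,10,40\n"
with_header = io.StringIO(",".join(COLUMNS) + "\n" + rows)
without_header = io.StringIO(rows)

df_a = read_tissue_positions(with_header)
df_b = read_tissue_positions(without_header)
print(list(df_a["barcode"]), list(df_b["barcode"]))
```

Both calls should yield the same in-tissue barcodes, so the barcode comparison against the counts matrix no longer depends on whether the header is present.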

uthsavc commented 2 months ago

I just fixed the bug in binning_and_plotting, so now you should not have to use counts_mat_restrict.T. I also updated the code to use Squidpy to read the 10x output, so hopefully your tissue_positions_list.csv bug is fixed.