lilab-bcb / harmony-pytorch

BSD 3-Clause "New" or "Revised" License
47 stars 9 forks source link

Error when using Scanpy #7

Closed dmiyagi closed 1 year ago

dmiyagi commented 1 year ago

``Hi there,

I have been encountering an error that I can't seem to figure out. I am using scanpy and using this on spatial data.

`from harmony import harmonize

Z = harmonize(adata.obsm["X_pca"], adata.obs, batch_key = 'batch')`

I have adata.obs that looks like this: `

batch sample grade GliomaType
B1 YALE-NS-0680 2.0 astrocytoma
B1 YALE-NS-0680 2.0 astrocytoma
B1 YALE-NS-0680 2.0 astrocytoma
B1 YALE-NS-0680 2.0 astrocytoma
B1 YALE-NS-0680 2.0 astrocytoma
... ... ... ...
B2 YALE-NS-1209 3.0 astrocytoma
B2 YALE-NS-1209 3.0 astrocytoma
B2 YALE-NS-1209 3.0 astrocytoma
B2 YALE-NS-1209 3.0 astrocytoma
B2 YALE-NS-1209 3.0 astrocytoma

35463 rows × 4 columns

` and has 35463 rows × 4 columns

my adata object looks like this: AnnData object with n_obs × n_vars = 35463 × 36601 obs: 'batch', 'sample', 'grade', 'GliomaType' uns: 'log1p', 'neighbors', 'pca', 'umap' obsm: 'X_pca', 'X_umap' varm: 'PCs' obsp: 'connectivities', 'distances'

you can see that the n_obs matches in both cases; however, I get this error when using any other column than "sample"

`--------------------------------------------------------------------------- RuntimeError Traceback (most recent call last) Cell In[82], line 5 1 from harmony import harmonize 3 #data_np = df.to_numpy() ----> 5 Z = harmonize(adata.obsm["X_pca"], adata.obs, batch_key = 'batch')

File /gpfs/ysm/project/gunel/dfm42/conda_envs/spatial_2023_2/lib/python3.8/site-packages/harmony/harmony.py:144, in harmonize(X, batch_mat, batch_key, n_clusters, max_iter_harmony, max_iter_clustering, tol_harmony, tol_clustering, ridge_lambda, sigma, block_proportion, theta, tau, correction_method, random_state, use_gpu, n_jobs, verbose) 137 N_b = torch.tensor( 138 batch_codes.value_counts(sort=False).values, 139 dtype=torch.float, 140 device=device_type, 141 ) 142 Pr_b = N_b.view(-1, 1) / n_cells --> 144 Phi = one_hot_tensor(batch_codes, device_type) 146 if n_clusters is None: 147 n_clusters = int(min(100, n_cells / 30))

File /gpfs/ysm/project/gunel/dfm42/conda_envs/spatial_2023_2/lib/python3.8/site-packages/harmony/utils.py:23, in one_hot_tensor(X, device_type) 21 n_col = X.cat.categories.size 22 Phi = torch.zeros(n_row, n_col, dtype=torch.float, device = devicetype) ---> 23 Phi.scatter(dim=1, index=ids, value=1.0) 25 return Phi

RuntimeError: index -1 is out of bounds for dimension 1 with size 3`

Harmony runs fine when using the "sample" column, but not with any other column. I need it to work with the "batch" column. the only thing I can think of is that the "sample" column is the only one that has a different value per sample wheres the others are shared i.e. I only have 3 batches for 11 samples but of course there are 11 sample names for 11 samples.

I have attached a file that has what's loaded into my environment.

thanks in advance

reqs.txt

dmiyagi commented 1 year ago

I realized that my metadata file had an issue where the batch ID was not assigned to all the cells correctly. Once I fixed this I was able to resolve the issue. Sorry, I thought I had checked for all the possible issues like this before opening this request.

yihming commented 1 year ago

@dmiyagi Glad to hear that you figured out the issue by yourself.

To help improve the error message in future use, may I know more specific about your issue? Is it because not all the cells have batch IDs associated?

dmiyagi commented 1 year ago

Hi @yihming sorry for the late response! Yes that is what happened. there were some blanks in the "batch" which caused the error, which wasn't true for the "sample" column.

I do have another question, which I'm happy to start with another thread, but if I'm wanting to pass the 'X_harmony" array downstream to other scanpy external packages like scanpy.external.pp.magic (https://scanpy.readthedocs.io/en/stable/generated/scanpy.external.pp.magic.html) or scanpy.external.tl.phate (https://scanpy.readthedocs.io/en/stable/generated/scanpy.external.tl.phate.html) how would I convert it to take the place of adata.X? sort of like how correct_scanpy() works in Scanorama (https://github.com/brianhie/scanorama) ? Thank you!!

I