How to fill missing genes based on tranSpa?

HelloWorldLTY commented 1 year ago

Hi, I noticed that tranSpa can be used to fill in missing genes (for example, genes present only in the scRNA-seq data and not measured in the spatial transcriptomics data). However, when I set the test genes to all genes from the scRNA-seq data, I get the following error:

KeyError                                  Traceback (most recent call last)
Cell In[26], line 1
----> 1 transImpRes = expTransImp(
      2                 df_ref=raw_scrna_df,
      3                 df_tgt=raw_spatial_df,
      4                 train_gene=spatial_data.var_names,
      5                 test_gene=scrna_adata.var_names,
      6                 n_simulation=200,
      7                 signature_mode='cell',
      8                 mapping_mode='lowrank',
      9                 classes=classes,
     10                 n_epochs=2000,
     11                 seed=seed,
     12                 device=device
     13 )

File ~/.conda/envs/tangram/lib/python3.8/site-packages/transpa/util.py:485, in expTransImp(df_ref, df_tgt, train_gene, test_gene, classes, ct_list, autocorr_method, signature_mode, mapping_mode, mapping_lowdim, spa_adj, lr, weight_decay, n_epochs, clip_max, wt_spa, wt_l1norm, wt_l2norm, locations, n_simulation, convert_uncertainty_score, device, seed)
    431 def expTransImp(
    432              df_ref: pd.DataFrame, 
    433              df_tgt: pd.DataFrame, 
   (...)
    453              device: torch.device=None,
    454              seed: int=None):
    455     """Main function for transimp
    456 
    457     Args:
   (...)
    483         list: results
    484     """
--> 485     model, train_X, train_y, test_X = fit_transImp(
    486                                         df_ref, df_tgt,
    487                                         train_gene, test_gene,
    488                                         lr, weight_decay, n_epochs,
    489                                         classes,
    490                                         ct_list,
    491                                         autocorr_method, 
    492                                         mapping_mode,
    493                                         mapping_lowdim,
    494                                         spa_adj,
    495                                         clip_max=clip_max,
    496                                         signature_mode=signature_mode,
    497                                         wt_spa=wt_spa,
    498                                         wt_l1norm=wt_l1norm,
    499                                         wt_l2norm=wt_l2norm,
    500                                         locations=locations,
    501                                         device=device,
    502                                         seed=seed)
    503     with torch.no_grad():
    504         model.eval()

File ~/.conda/envs/tangram/lib/python3.8/site-packages/transpa/util.py:368, in fit_transImp(df_ref, df_tgt, train_gene, test_gene, lr, weight_decay, n_epochs, classes, ct_list, autocorr_method, mapping_mode, mapping_lowdim, spa_adj, clip_max, signature_mode, wt_spa, wt_l1norm, wt_l2norm, locations, device, seed)
    366     test_X = tensify(test_X, device)
    367 else:
--> 368     test_X = tensify(df_ref[test_gene].values, device)
    370 return model, X, Y, test_X

File ~/.conda/envs/tangram/lib/python3.8/site-packages/pandas/core/frame.py:3767, in DataFrame.__getitem__(self, key)
   3765     if is_iterator(key):
   3766         key = list(key)
-> 3767     indexer = self.columns._get_indexer_strict(key, "columns")[1]
   3769 # take() does not accept boolean indexers
   3770 if getattr(indexer, "dtype", None) == bool:

File ~/.conda/envs/tangram/lib/python3.8/site-packages/pandas/core/indexes/base.py:5877, in Index._get_indexer_strict(self, key, axis_name)
   5874 else:
   5875     keyarr, indexer, new_indexer = self._reindex_non_unique(keyarr)
-> 5877 self._raise_if_missing(keyarr, indexer, axis_name)
   5879 keyarr = self.take(indexer)
   5880 if isinstance(key, Index):
   5881     # GH 42790 - Preserve name from an Index

File ~/.conda/envs/tangram/lib/python3.8/site-packages/pandas/core/indexes/base.py:5941, in Index._raise_if_missing(self, key, indexer, axis_name)
   5938     raise KeyError(f"None of [{key}] are in the [{axis_name}]")
   5940 not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())
-> 5941 raise KeyError(f"{not_found} not in index")

KeyError: "['SAMD11', 'NOC2L', 'KLHL17', ']... such keys are not in the index.

Could you please help me address this problem? Thanks.

qiaochen commented 1 year ago

Thank you for your feedback. Yes, it can impute genes missing from the ST data, as long as they are present in the single-cell reference.

From the error message, the lookup df_ref[test_gene] in fit_transImp fails: genes such as 'SAMD11', 'NOC2L', and 'KLHL17' are listed in scrna_adata.var_names (passed as test_gene) but cannot be found among the columns of raw_scrna_df (passed as df_ref).

Could you double-check that every gene in scrna_adata.var_names is also a column of raw_scrna_df? The following code should print 0 if the former is a subset of the latter.

import numpy as np
len(np.setdiff1d(scrna_adata.var_names, raw_scrna_df.columns))
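
If the count is not 0, it may help to see which genes are missing and to keep only the ones the reference table actually contains. The snippet below is a minimal sketch of that idea using toy data; `split_genes` is a hypothetical helper (not part of tranSpa), and the small DataFrame stands in for raw_scrna_df.

```python
import pandas as pd

# Toy stand-ins (assumptions for illustration only):
# raw_scrna_df plays the role of the single-cell reference table,
# var_names plays the role of scrna_adata.var_names.
raw_scrna_df = pd.DataFrame(
    [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0]],
    columns=["GeneA", "GeneB", "GeneC"],
)
var_names = pd.Index(["GeneA", "GeneB", "GeneC", "GeneD"])


def split_genes(requested, df):
    """Hypothetical helper: split requested genes into those present in
    df.columns and those missing, preserving the requested order."""
    requested = pd.Index(requested)
    present_mask = requested.isin(df.columns)
    return requested[present_mask], requested[~present_mask]


present, missing = split_genes(var_names, raw_scrna_df)
print("missing from reference:", list(missing))

# Indexing with only the present genes no longer raises KeyError.
subset = raw_scrna_df[present]
```

Passing only the `present` genes as test_gene should avoid the KeyError shown in the traceback, since every requested column then exists in the reference DataFrame.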

HelloWorldLTY commented 1 year ago

Thanks, you are right. After computing the overlapping genes between the two modalities, the code ran successfully.
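
For readers hitting the same error, one possible way to derive the gene lists is sketched below. The exact fix used in this thread was not posted, so this is an assumption: `derive_gene_sets` is a hypothetical helper, and the two toy DataFrames stand in for raw_scrna_df and raw_spatial_df (cells or spots as rows, genes as columns).

```python
import pandas as pd

# Toy stand-ins (assumptions for illustration only).
raw_scrna_df = pd.DataFrame(
    [[1.0, 0.0, 2.0, 5.0]], columns=["GeneA", "GeneB", "GeneC", "GeneD"]
)
raw_spatial_df = pd.DataFrame(
    [[4.0, 1.0, 0.0]], columns=["GeneB", "GeneA", "GeneX"]
)


def derive_gene_sets(ref_df, tgt_df):
    """Hypothetical helper: genes shared by both tables become training
    genes; genes found only in the reference become genes to impute."""
    ref_genes = pd.Index(ref_df.columns)
    shared_mask = ref_genes.isin(tgt_df.columns)
    train_gene = ref_genes[shared_mask]   # measured in both modalities
    test_gene = ref_genes[~shared_mask]   # reference-only, to be imputed
    return train_gene, test_gene


train_gene, test_gene = derive_gene_sets(raw_scrna_df, raw_spatial_df)
print("train genes:", list(train_gene))
print("test genes:", list(test_gene))
```

Because both lists are built from the reference table's own columns, every gene in them is guaranteed to be a valid column of the reference DataFrame, and every training gene is also a column of the spatial DataFrame.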

qiaochen / tranSpa

How to fill missing genes based on tranSpa? #2