theislab / scib

Benchmarking analysis of data integration tools
MIT License

run scgen via runIntegration.py failed #13

Closed: mbuttner closed this issue 5 years ago

mbuttner commented 5 years ago

Hi,

I executed the scgen data integration using the following command:

./runIntegration.py -i /storage/groups/ml01/workspace/maren.buettner/data_integration/data/human_pancreas/human_pancreas_norm.h5ad -o /storage/groups/ml01/workspace/maren.buettner/data_integration/data/human_pancreas/integrated/human_pancreas_scgen.h5ad -b tech -v -m scgen > /storage/groups/ml01/workspace/maren.buettner/data_integration/data/human_pancreas/integrated/human_pancreas_scgen.out

and obtained this error message:

Using TensorFlow backend.
Traceback (most recent call last):
  File "/home/icb/daniel.strobl/miniconda3/envs/sc-tutorial/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2890, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'louvain'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./runIntegration.py", line 57, in <module>
    runIntegration(file, out, run, hvg, batch)
  File "./runIntegration.py", line 16, in runIntegration
    integrated_tmp = scIB.metrics.measureTM(method, adata, batch)
  File "/home/icb/daniel.strobl/Benchmarking_data_integration/scIB/metrics.py", line 580, in measureTM
    out = memory_profiler.memory_usage((prof.runcall, args, kwargs), retval=True) 
  File "/home/icb/daniel.strobl/miniconda3/envs/sc-tutorial/lib/python3.7/site-packages/memory_profiler.py", line 336, in memory_usage
    returned = f(*args, **kw)
  File "/home/icb/daniel.strobl/miniconda3/envs/sc-tutorial/lib/python3.7/cProfile.py", line 109, in runcall
    return func(*args, **kw)
  File "/home/icb/daniel.strobl/Benchmarking_data_integration/scIB/integration.py", line 43, in runScGen
    adata.obs['cell_type'] = adata.obs[cell_type].copy()
  File "/home/icb/daniel.strobl/miniconda3/envs/sc-tutorial/lib/python3.7/site-packages/pandas/core/frame.py", line 2975, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/icb/daniel.strobl/miniconda3/envs/sc-tutorial/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2892, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'louvain'

The problem is that my data set does not contain the column that runScGen reads the cell types from (the KeyError shows it looks up 'louvain'), so can you add something like the following to the runScGen function

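# save cell types for later, but only if the column exists
# (indexing adata.obs[cell_type] directly raises KeyError when it is missing)
if cell_type in adata.obs.columns:
    cell_types = adata.obs[cell_type].copy()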

batches = adata.obs[batch].copy()

? Thanks!
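
For reference, a guard of this kind has to test for the column's presence before indexing, because looking up a missing column raises KeyError (as in the traceback above) rather than returning None. Below is a minimal sketch of that idea, not the actual scIB code. The helper name `optional_column` and the toy data are hypothetical, and a plain dict stands in for `adata.obs` so the snippet runs without extra packages. The `in` test behaves the same way on a pandas DataFrame, where it checks column labels.

```python
def optional_column(obs, key):
    """Return a copy of obs[key] if the column exists, otherwise None."""
    # `key in obs` checks dict keys (or column labels for a pandas DataFrame),
    # so it never raises KeyError the way obs[key] does for a missing column.
    if key in obs:
        return obs[key].copy()
    return None


# Hypothetical toy data: a dict standing in for adata.obs with only a batch column.
obs = {"tech": ["celseq", "smartseq2", "celseq"]}

cell_types = optional_column(obs, "louvain")  # column is absent
batches = optional_column(obs, "tech")        # column is present
```

With this pattern, the code that follows can branch on `cell_types is None` instead of failing on the lookup.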

kridsadakorn commented 5 years ago

I think it would be better to include louvain clustering in runScGen, because most input AnnData objects don't contain 'louvain'.

kridsadakorn commented 5 years ago

I just added louvain clustering to runScGen. It still needs to be verified.

LuckyMD commented 5 years ago

Why would you add louvain clustering to scGen? I'm not aware that scGen requires clusters as input. I feel like it shouldn't...

@M0hammadL ?

LuckyMD commented 5 years ago

scGen 2.0 will be used instead of scGen 1.0. That version does not require cell type info.