Closed pascaltimshel closed 4 years ago
Problem: CELLEX crashes with an InvalidIndexError if data contains duplicate cell_ids. Solution: check for duplicated cell_ids before running the function.
Minimal example to reproduce (column `D` is duplicated on purpose):

    import numpy as np
    import pandas as pd
    import cellex

    data = pd.DataFrame(np.random.randint(0,100,size=(100, 5)), columns=list('ABCDD'))
    data.head()
    metadata = pd.DataFrame(data={"cell_type":["X","X","X", "Y", "Y"]}, index=data.columns)
    metadata.head()
    eso = cellex.ESObject(data=data, annotation=metadata, verbose=True)

Output:

    Preprocessing - running remove_non_expressed ... excluded 0 / 100 genes in 0 min 0 sec
    Preprocessing - normalizing data ... data normalized in 0 min 0 sec
    ---------------------------------------------------------------------------
    InvalidIndexError                         Traceback (most recent call last)
    <ipython-input-17-537009d482a9> in <module>
          3 metadata = pd.DataFrame(data={"cell_type":["X","X","X", "Y", "Y"]}, index=data.columns)
          4 metadata.head()
    ----> 5 eso = cellex.ESObject(data=data, annotation=metadata, verbose=True)

    /tools/anaconda/envs/djw472/py3_PT/lib/python3.7/site-packages/cellex/esobject.py in __init__(self, data, annotation, remove_non_expressed, normalize, anova, verbose)
         53
         54         if type(annotation) is pd.Series:
    ---> 55             annotation = data.columns.map(annotation, na_action="ignore").values.astype(str)
         56
         57         if anova:

    /tools/anaconda/envs/djw472/py3_PT/lib/python3.7/site-packages/pandas/core/indexes/base.py in map(self, mapper, na_action)
       4872         from .multi import MultiIndex
       4873
    -> 4874         new_values = super()._map_values(mapper, na_action=na_action)
       4875
       4876         attributes = self._get_attributes_dict()

    /tools/anaconda/envs/djw472/py3_PT/lib/python3.7/site-packages/pandas/core/base.py in _map_values(self, mapper, na_action)
       1275             values = self.values
       1276
    -> 1277         indexer = mapper.index.get_indexer(values)
       1278         new_values = algorithms.take_1d(mapper._values, indexer)
       1279

    /tools/anaconda/envs/djw472/py3_PT/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_indexer(self, target, method, limit, tolerance)
       2976         if not self.is_unique:
       2977             raise InvalidIndexError(
    -> 2978                 "Reindexing only valid with uniquely" " valued Index objects"
       2979             )
       2980

    InvalidIndexError: Reindexing only valid with uniquely valued Index objects
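A check for duplicated cell_ids, as proposed in the problem statement, could look roughly like the sketch below. It assumes cell_ids are the column labels of `data`, as in the example in this issue. The helper name `check_unique_cell_ids` is hypothetical and is not part of the CELLEX API; this is not necessarily how the actual fix was implemented.

```python
import pandas as pd


def check_unique_cell_ids(data: pd.DataFrame) -> None:
    """Raise a ValueError if any column label (cell_id) occurs more than once.

    Hypothetical helper for illustration only; not part of CELLEX.
    """
    # keep=False marks every occurrence of a repeated label, not only the later ones
    dup_mask = data.columns.duplicated(keep=False)
    if dup_mask.any():
        duplicated_ids = sorted(set(data.columns[dup_mask]))
        raise ValueError(f"Duplicate cell_ids found: {duplicated_ids}")
```

Calling this before constructing the `ESObject` would replace the pandas `InvalidIndexError` with a message that names the offending cell_ids.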
Better solution: automatic renaming of duplicated cell_ids?
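One possible way to do such renaming is sketched below: every repeated label gets a numeric suffix, and unique labels are left untouched. The function name `make_cell_ids_unique` and the `_` separator are assumptions made for illustration; neither comes from CELLEX.

```python
from collections import Counter

import pandas as pd


def make_cell_ids_unique(data: pd.DataFrame, sep: str = "_") -> pd.DataFrame:
    """Return a copy of `data` in which repeated column labels get a numeric suffix.

    Hypothetical helper for illustration only; not part of CELLEX.
    Note: a generated label could still collide with an existing one
    (for example, if a column named "D_1" is already present).
    """
    totals = Counter(data.columns)   # how often each label occurs overall
    seen = Counter()                 # how often each label has been seen so far
    new_labels = []
    for label in data.columns:
        if totals[label] > 1:
            seen[label] += 1
            new_labels.append(f"{label}{sep}{seen[label]}")
        else:
            new_labels.append(label)
    renamed = data.copy()
    renamed.columns = new_labels
    return renamed
```

With the columns from the example above (`A, B, C, D, D`), this would yield `A, B, C, D_1, D_2`. The annotation index would need the same relabeling so that cell_ids still match between `data` and `annotation`.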
Fixed by @Satannius in 47845ba8a3cefdf1319344b96567e654cad9fe25