pinellolab / STREAM

STREAM: Single-cell Trajectories Reconstruction, Exploration And Mapping of single-cell data
http://stream.pinellolab.org
GNU Affero General Public License v3.0

read in loom and get UnicodeDecodeError #107

Closed: crazyhottommy closed this issue 3 years ago

crazyhottommy commented 3 years ago

Hi, when I read in the loom file, I get a UnicodeDecodeError:

adata = ad.read_loom("mydata.loom")
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-4-903c3e692f43> in <module>
----> 1 adata = ad.read_loom("cd3_minus_B_cells_sub_5000cells.loom")

~/anaconda3/envs/stream/lib/python3.6/site-packages/anndata/readwrite/read.py in read_loom(filename, sparse, cleanup, X_name, obs_names, var_names, dtype, **kwargs)
    165             if key != '': layers[key] = lc.layers[key].sparse().T.tocsr() if sparse else lc.layers[key][()].T
    166 
--> 167         obs = dict(lc.col_attrs)
    168         if obs_names in obs.keys(): obs['obs_names'] = obs.pop(obs_names)
    169         obsm_attrs = [k for k, v in obs.items() if v.ndim > 1 and v.shape[1] > 1]

~/anaconda3/envs/stream/lib/python3.6/site-packages/loompy/attribute_manager.py in __getitem__(self, thing)
    100                                 return result
    101                 else:
--> 102                         return self.__getattr__(thing)
    103 
    104         def __getattr__(self, name: str) -> np.ndarray:

~/anaconda3/envs/stream/lib/python3.6/site-packages/loompy/attribute_manager.py in __getattr__(self, name)
    117                                 # Read values from the HDF5 file
    118                                 a = ["/row_attrs/", "/col_attrs/"][self.axis]
--> 119                                 vals = loompy.materialize_attr_values(self.ds._file[a][name][:])
    120                                 self.__dict__["storage"][name] = vals
    121                         return vals

~/anaconda3/envs/stream/lib/python3.6/site-packages/loompy/normalize.py in materialize_attr_values(a)
     96                         temp = a
     97                 # Then unescape XML entities and convert to unicode
---> 98                 result = np.array([html.unescape(x) for x in temp.astype(str)], dtype=object)
     99         elif np.issubdtype(a.dtype, np.str_) or np.issubdtype(a.dtype, np.unicode_):
    100                 result = np.array(a.astype(str), dtype=object)

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 9: ordinal not in range(128)

This loom file is a different subset of the same Seurat object, converted to loom with the same code. The previous loom file can be read in without problems.

I searched and found this Stack Overflow question: https://stackoverflow.com/questions/10406135/unicodedecodeerror-ascii-codec-cant-decode-byte-0xd1-in-position-2-ordinal

How can I fix this?

Thanks!

huidongchen commented 3 years ago

Unfortunately, this is new to me as well, and I don't know how to solve it. Since read_loom() is a function from the anndata package (https://github.com/theislab/anndata), the anndata developers might be more helpful in this case :)

crazyhottommy commented 3 years ago

Oh, let me open an issue there. Thanks!
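
For anyone hitting the same error, a diagnostic sketch may help narrow it down. The traceback fails at `temp.astype(str)`, and the message reports the ASCII codec choking on byte 0xc3 at position 9. That byte is the lead byte of a two-byte UTF-8 sequence (for example an accented Latin letter), so the most likely cause is a non-ASCII character in one of the cell attributes of this particular subset. The helper below is not part of anndata or loompy: `find_non_ascii` and `example_attrs` are hypothetical names, and the sample values are made up to mirror the error. It only shows one way to locate the offending values.

```python
from typing import Dict, Iterable, List, Tuple


def find_non_ascii(attrs: Dict[str, Iterable[object]]) -> List[Tuple[str, int, bytes]]:
    """Return (attribute name, index, raw bytes) for each value that is not pure ASCII."""
    hits = []
    for name, values in attrs.items():
        for i, raw in enumerate(values):
            if not isinstance(raw, bytes):
                continue  # numeric values cannot trigger the ASCII decode failure
            try:
                raw.decode("ascii")
            except UnicodeDecodeError:
                hits.append((name, i, raw))
    return hits


# Hypothetical stand-in for the column attributes of a loom file.
# Assumption: with h5py installed, the real values could be loaded with
# something like:
#   with h5py.File("mydata.loom", "r") as f:
#       attrs = {k: f["col_attrs"][k][:].ravel().tolist() for k in f["col_attrs"]}
example_attrs = {
    "CellID": [b"cell_1", b"cell_2"],
    "sample": [b"patient_A", b"patient_B\xc3\xa9"],  # 0xc3 sits at position 9
}

hits = find_non_ascii(example_attrs)
for name, idx, raw in hits:
    # errors="replace" keeps the report going even if the bytes are not valid UTF-8
    print(name, idx, raw, "->", raw.decode("utf-8", errors="replace"))
```

Once the attribute is identified, renaming or re-encoding the affected values in the Seurat metadata before converting to loom is one possible workaround. Whether that is the right fix for a given file is an assumption to verify.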