linnarsson-lab / adolescent-mouse

Analysis pipeline for the adolescent mouse nervous system project

Should row_attrs["Gene"] be an array of type object? #1

Closed gioelelm closed 6 years ago

gioelelm commented 6 years ago

I am not sure how this went unnoticed until now, but loompy 1.1.0 is throwing an error on this line:

np.where(npstr.startswith(ds.row_attrs["Gene"], "mt-"))

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-32-153eb4af9234> in <module>()
----> 1 np.where(npstr.startswith(ds.row_attrs["Gene"], "mt-"))

~/anaconda3/lib/python3.6/site-packages/numpy/core/defchararray.py in startswith(a, prefix, start, end)
   1376     """
   1377     return _vec_string(
-> 1378         a, bool_, 'startswith', [prefix, start] + _clean_args(end))
   1379
   1380

TypeError: string operation on non-string array

The cause is that ds.row_attrs["Gene"] has dtype=object.

Now the question is:

Is this because of the loompy specification, or should it be an array of strings? Basically, I am asking whether there is a bug at the level of file creation, or whether loompy is right and I should be doing:

ds.row_attrs["Gene"].astype(str)
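
For reference, a minimal sketch of that workaround on a plain NumPy array, without loompy. The helper name `mito_gene_indices` and the sample gene names are made up for illustration, and it assumes the object array holds ordinary Python `str` values. It uses `np.char.startswith`, the public alias of the `defchararray` function in the traceback above.

```python
import numpy as np

def mito_gene_indices(genes, prefix="mt-"):
    """Return indices of gene names starting with `prefix`.

    Hypothetical helper: converts non-unicode arrays (e.g. dtype=object)
    to a unicode string array before calling np.char.startswith.
    Assumes the elements are ordinary Python str values.
    """
    genes = np.asarray(genes)
    if genes.dtype.kind != "U":
        # The string routines reject object arrays, so cast first.
        genes = genes.astype(str)
    return np.where(np.char.startswith(genes, prefix))[0]

# Stand-in for ds.row_attrs["Gene"] as read from an old file (dtype=object).
genes = np.array(["mt-Nd1", "Actb", "mt-Co1", "Gapdh"], dtype=object)
idx = mito_gene_indices(genes)
```

The cast makes a copy of the attribute, so this only works around the dtype on read; it does not change what is stored in the file.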

slinnarsson commented 6 years ago

It should be an array of strings. Are you on the latest loompy? I think I caused and fixed this bug recently.

Be careful if you update adolescent_mouse, as I think I might have introduced some dependencies on loompy2 features, and I'm working on those features in the loompy2 branch of loompy. I'm trying to keep it backwards compatible, but I'm not sure it is.

gioelelm commented 6 years ago

I am, but my original 10X files were generated by an older version. So I guess the new version does not fix the bug on old files.

slinnarsson commented 6 years ago

Ah, the fix was only in the loompy2 branch. I merged it to master now and it should probably work on old files too.

gioelelm commented 6 years ago

Ok I will try.

Unrelated problem, but really annoying:

I keep getting this loompy/hdf5 error, but only in my luigi pipeline. However, when I try to read from ipython, everything works smoothly!

Do you have any idea what might be going on with my pipeline? Maybe you encountered something similar?

2017-11-18 16:11:55,087 ERROR: [pid 52637] Worker Worker(salt=749731879, workers=6, host=monod06.mbb.ki.se, username=gioele, pid=52604) failed    PrepareTissuePool(tissue=Midbrain_E16-18)
Traceback (most recent call last):
  File "/home/gioele/anaconda3/lib/python3.6/site-packages/luigi/worker.py", line 191, in run
    new_deps = self._run_get_new_deps()
  File "/home/gioele/anaconda3/lib/python3.6/site-packages/luigi/worker.py", line 129, in _run_get_new_deps
    task_gen = self.task.run()
  File "/home/gioele/Github/development-mouse/development_mouse/primary/prepare_tissue_pool.py", line 44, in run
    ds = loompy.connect(sample)
  File "/home/gioele/Github/loompy/loompy/loompy.py", line 1357, in connect
    return LoomConnection(filename, mode)
  File "/home/gioele/Github/loompy/loompy/loompy.py", line 249, in __init__
    raise e
  File "/home/gioele/Github/loompy/loompy/loompy.py", line 235, in __init__
    for key in self._file['col_attrs'].keys():
  File "/home/gioele/anaconda3/lib/python3.6/_collections_abc.py", line 720, in __iter__
    yield from self._mapping
  File "/home/gioele/anaconda3/lib/python3.6/site-packages/h5py/_hl/group.py", line 307, in __iter__
    for x in self.id.__iter__():
  File "h5py/h5g.pyx", line 452, in h5py.h5g.GroupID.__iter__ (/home/ilan/minonda/conda-bld/h5py_1496889914775/work/h5py/h5g.c:5736)
  File "h5py/h5g.pyx", line 453, in h5py.h5g.GroupID.__iter__ (/home/ilan/minonda/conda-bld/h5py_1496889914775/work/h5py/h5g.c:5692)
  File "h5py/h5g.pyx", line 99, in h5py.h5g.GroupIter.__init__ (/home/ilan/minonda/conda-bld/h5py_1496889914775/work/h5py/h5g.c:2295)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/h5py_1496889914775/work/h5py/_objects.c:2846)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/h5py_1496889914775/work/h5py/_objects.c:2804)
  File "h5py/h5g.pyx", line 321, in h5py.h5g.GroupID.get_num_objs (/home/ilan/minonda/conda-bld/h5py_1496889914775/work/h5py/h5g.c:4395)
RuntimeError: Can't determine (Bad symbol table node signature)

gioelelm commented 6 years ago

> I merged it to master now and it should probably work on old files too.

You probably did not sync correctly; I still see two different branches.

slinnarsson commented 6 years ago

I just fixed the one bug on master. The loompy2 branch contains a major refactoring and is not ready for use yet.