mattjj / pyhsmm

MIT License

"Unexpected keyword argument" and "ndarray is not C contiguous" #29

Closed UserAB1236872 closed 9 years ago

UserAB1236872 commented 9 years ago

I'm trying to use this for some speaker diarization. I started getting some errors, so I checked the examples to make sure it wasn't just my code. Here are a couple of tracebacks:

Traceback (most recent call last):
  File "concentration-resampling.py", line 45, in <module>
    posteriormodel.resample_model()
  File "/usr/local/lib/python2.7/dist-packages/pyhsmm/models.py", line 201, in resample_model
    self.resample_parameters(joblib_jobs=obs_jobs)
TypeError: resample_parameters() got an unexpected keyword argument 'joblib_jobs'

If I delete the parameter on that line, I get another error (included at the bottom because it's long). Since it was related to pybasicbayes I tried reverting to old commits, but none of them seem to work.

Any ideas? I'm on Linux, as far as I know all my libraries are freshly installed and up to date.

Traceback (most recent call last):
  File "concentration-resampling.py", line 45, in <module>
    posteriormodel.resample_model()
  File "/usr/local/lib/python2.7/dist-packages/pyhsmm/models.py", line 200, in resample_model
    self.resample_parameters()
  File "/usr/local/lib/python2.7/dist-packages/pyhsmm/models.py", line 706, in resample_parameters
    super(_HSMMGibbsSampling,self).resample_parameters()
  File "/usr/local/lib/python2.7/dist-packages/pyhsmm/models.py", line 205, in resample_parameters
    self.resample_trans_distn()
  File "/usr/local/lib/python2.7/dist-packages/pyhsmm/models.py", line 217, in resample_trans_distn
    self.trans_distn.resample([s.stateseq for s in self.states_list])
  File "/usr/local/lib/python2.7/dist-packages/pyhsmm/internals/transitions.py", line 374, in resample
    self._resample_beta(ms)
  File "/usr/local/lib/python2.7/dist-packages/pyhsmm/internals/transitions.py", line 382, in _resample_beta
    self.beta_obj.resample(ms)
  File "/usr/local/lib/python2.7/dist-packages/pyhsmm/basic/pybasicbayes/distributions.py", line 2059, in resample
    self.alpha_0_obj.resample(counts)
  File "/usr/local/lib/python2.7/dist-packages/pyhsmm/basic/pybasicbayes/distributions.py", line 3225, in resample
    return super(GammaCompoundDirichlet,self).resample(data,niter=niter)
  File "/usr/local/lib/python2.7/dist-packages/pyhsmm/basic/pybasicbayes/distributions.py", line 3141, in resample
    a_n, b_n = self._posterior_hypparams(*self._get_statistics(data))
  File "/usr/local/lib/python2.7/dist-packages/pyhsmm/basic/pybasicbayes/distributions.py", line 3237, in _get_statistics
    m = sample_crp_tablecounts(self.concentration,counts,self.weighted_cols)
  File "pyhsmm/basic/pybasicbayes/util/cstats.pyx", line 50, in pyhsmm.basic.pybasicbayes.util.cstats.sample_crp_tablecounts (basic/pybasicbayes/util/cstats.c:7911)
  File "stringsource", line 614, in View.MemoryView.memoryview_cwrapper (basic/pybasicbayes/util/cstats.c:15841)
  File "stringsource", line 321, in View.MemoryView.memoryview.__cinit__ (basic/pybasicbayes/util/cstats.c:12387)
ValueError: ndarray is not C-contiguous

UserAB1236872 commented 9 years ago

Update: this is happening on my mac too.

Update 2: At least the second error only shows up with concentration-resampling, i.e. when using alpha_a_0 and such instead of just alpha and gamma.

mattjj commented 9 years ago

Any idea which array isn't C-contiguous there? You can check by running pdb and printing out the ndarray flags for each of the arguments to sample_crp_tablecounts, as in

    print counts.flags
    print self.weighted_cols.flags

The quickest fix is probably just to add a copy, though it'd be better to know which is not contiguous.
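That check-and-copy approach can be sketched like this (with a toy Fortran-ordered array standing in for the real `counts` from pybasicbayes):

```python
import numpy as np

# Toy stand-in for the `counts` argument to sample_crp_tablecounts;
# np.asfortranarray gives a column-major array, so for a 2-D array
# the C_CONTIGUOUS flag comes out False.
counts = np.asfortranarray(np.arange(12).reshape(3, 4))
print(counts.flags['C_CONTIGUOUS'])   # False
print(counts.flags['F_CONTIGUOUS'])   # True

# np.ascontiguousarray copies only when the input isn't already
# C-contiguous, so it's a cheap guard to drop in before the cython call.
counts = np.ascontiguousarray(counts)
assert counts.flags['C_CONTIGUOUS']
```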

mattjj commented 9 years ago

Can you try b5e696e on pyhsmm (master is currently pointing to it) and 91e42a2 on pybasicbayes and let me know what happens?

UserAB1236872 commented 9 years ago

Sure, right now I'm running a computation (ETA is about 10 minutes), then I'll do that.

UserAB1236872 commented 9 years ago

The updated commits don't seem to fix it (they fix the first problem, but not the C-contiguous one). pdb reveals:

counts.flags:

    C_CONTIGUOUS : False
    F_CONTIGUOUS : True
    OWNDATA : False
    WRITEABLE : True
    ALIGNED : True
    UPDATEIFCOPY : False

So it looks like counts is the array that's not C-contiguous.

mattjj commented 9 years ago

Can you double-check that you're on 91e42a2 in pybasicbayes? This line should have made counts C-contiguous.

Also, can you verify that the "resample_parameters() got an unexpected keyword argument 'joblib_jobs'" error went away?

UserAB1236872 commented 9 years ago

I can verify both the current commit is correct and that resample_parameters went away.

I actually added a print there earlier before I filed the issue. When you enter the function, counts.flags is fine, which is kind of magic and strange.

mattjj commented 9 years ago

I'm confused... it starts C-contiguous, but then it changes? If that's what you're saying, can you use pdb to narrow down when it changes, or maybe add some assertions?

I'm unable to reproduce the problem on my end; the example concentration-resampling.py runs without encountering this issue.

mattjj commented 9 years ago

What's your cython version? You can get it either with cython --version in your shell or import cython; print cython.__version__ in Python.

UserAB1236872 commented 9 years ago

This is the weirdest thing:

    counts = np.array(data,ndmin=2,order='C')
    weighted_cols = np.array(self.weighted_cols,order='C')

    assert(counts.flags['C_CONTIGUOUS'])

Fails. Inexplicably. It succeeds the first time through, second time it dies.
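With a working numpy that assertion shouldn't be able to fail: np.array with order='C' is specified to return a C-ordered result, copying if necessary. A minimal sanity check (toy array, not the real data):

```python
import numpy as np

# Toy stand-in for `data`: Fortran-ordered on purpose, so it is not
# C-contiguous going in.
data = np.asfortranarray(np.arange(6).reshape(2, 3))
assert not data.flags['C_CONTIGUOUS']

# order='C' forces a C-ordered result, so this assertion holds in a
# correctly functioning numpy.
counts = np.array(data, ndmin=2, order='C')
assert counts.flags['C_CONTIGUOUS']
```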

My cython version is 0.21.1.

mattjj commented 9 years ago

That's the cython I'm using.

Can you try adding a copy=True argument to that, or even swapping in

counts = np.copy(np.array(data,ndmin=2),order='C')

With a bug this mysterious, it could be out-of-bounds memory writing, but I haven't seen any evidence of that elsewhere. And it seems improbable that the same error would happen reproducibly in that case.

Another thing to try is using the Python implementation of sample_crp_tablecounts in util/stats.py instead.
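Both copy variants suggested above can be sketched like this (with a toy `data` array standing in for the statistics that pybasicbayes actually passes):

```python
import numpy as np

# Toy stand-in for `data`: Fortran-ordered, so it isn't C-contiguous.
data = np.asfortranarray(np.arange(6).reshape(2, 3))

# Variant 1: copy=True on np.array (copying is the default, but being
# explicit rules out any no-copy fast path).
counts = np.array(data, ndmin=2, order='C', copy=True)
assert counts.flags['C_CONTIGUOUS']

# Variant 2: an explicit np.copy with order='C'.
counts = np.copy(np.array(data, ndmin=2), order='C')
assert counts.flags['C_CONTIGUOUS']
```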

UserAB1236872 commented 9 years ago

It got through 3 iterations, and then dead froze my computer (had to hard reboot). I'd say maybe it's my machine/memory, but I'm observing the exact same C order problem on my mac. And it's unlikely I have bad RAM or something on both machines.

UserAB1236872 commented 9 years ago

That's with the copy, haven't tried swapping in the python version because of the freeze.

mattjj commented 9 years ago

Well that's definitely more evidence for out-of-bounds writing or something of that flavor... Unfortunately I can't debug it very well since I can't reproduce the problem (unless you want to give me ssh access somewhere). Are you comfortable with running valgrind?

Alternatively, we could hope the problem has to do with concentration resampling specifically, and perhaps you could avoid concentration resampling (or use the Python version of the code). Do you have any problems with examples/hmm.py or examples/hsmm.py?

UserAB1236872 commented 9 years ago

I'm fine with valgrind. hsmm.py works fine. Like I said earlier, the problem appears to be tied to using alpha_a_0 and such instead of alpha and gamma in the posterior model construction.

For the Python implementation:

Traceback (most recent call last):
  File "concentration-resampling.py", line 45, in <module>
    posteriormodel.resample_model()
  File "/usr/local/lib/python2.7/dist-packages/pyhsmm/models.py", line 201, in resample_model
    self.resample_parameters(joblib_jobs=obs_jobs)
  File "/usr/local/lib/python2.7/dist-packages/pyhsmm/models.py", line 708, in resample_parameters
    super(_HSMMGibbsSampling,self).resample_parameters(**kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pyhsmm/models.py", line 207, in resample_parameters
    self.resample_trans_distn()
  File "/usr/local/lib/python2.7/dist-packages/pyhsmm/models.py", line 219, in resample_trans_distn
    self.trans_distn.resample([s.stateseq for s in self.states_list])
  File "/usr/local/lib/python2.7/dist-packages/pyhsmm/internals/transitions.py", line 374, in resample
    self._resample_beta(ms)
  File "/usr/local/lib/python2.7/dist-packages/pyhsmm/internals/transitions.py", line 382, in _resample_beta
    self.beta_obj.resample(ms)
  File "/usr/local/lib/python2.7/dist-packages/pyhsmm/basic/pybasicbayes/distributions.py", line 2059, in resample
    self.alpha_0_obj.resample(counts)
  File "/usr/local/lib/python2.7/dist-packages/pyhsmm/basic/pybasicbayes/distributions.py", line 3225, in resample
    return super(GammaCompoundDirichlet,self).resample(data,niter=niter)
  File "/usr/local/lib/python2.7/dist-packages/pyhsmm/basic/pybasicbayes/distributions.py", line 3141, in resample
    a_n, b_n = self._posterior_hypparams(*self._get_statistics(data))
  File "/usr/local/lib/python2.7/dist-packages/pyhsmm/basic/pybasicbayes/distributions.py", line 3240, in _get_statistics
    m = sample_crp_tablecounts(self.concentration,counts,weighted_cols)
  File "/usr/local/lib/python2.7/dist-packages/pyhsmm/basic/pybasicbayes/util/stats.py", line 215, in sample_crp_tablecounts
    m[i,j] += randseq[starts[i,j]+k] \
NameError: global name 'starts' is not defined

mattjj commented 9 years ago

Pushed a fix for that python implementation error just now (46618f4 in pybasicbayes).

UserAB1236872 commented 9 years ago

Here's the valgrind (it was a bit too big to put here):

http://pastebin.com/YHe98CMU

I'm not exactly experienced with valgrind on python (I've only used it with native C), but it seems like the python GC may be trying to interact with cstats for some reason? I'm basing it off this part just before the crash:

==3345== Invalid read of size 4
==3345==    at 0x56DBCA: PyObject_GC_Del (in /usr/bin/python2.7)
==3345==    by 0x1A5CBC16: __pyx_tp_dealloc_memoryview (cstats.c:21724)
==3345==    by 0x1A5D2675: __pyx_tp_dealloc__memoryviewslice (cstats.c:21959)
==3345==    by 0x5A8449: ??? (in /usr/bin/python2.7)
==3345==    by 0x68A495B: ??? (in /usr/lib/python2.7/dist-packages/numpy/core/multiarray.so)
==3345==    by 0x511942: ??? (in /usr/bin/python2.7)
==3345==    by 0x561124: PyEval_EvalFrameEx (in /usr/bin/python2.7)
==3345==    by 0x54B7D3: PyEval_EvalCodeEx (in /usr/bin/python2.7)
==3345==    by 0x5626F5: PyEval_EvalFrameEx (in /usr/bin/python2.7)
==3345==    by 0x54B7D3: PyEval_EvalCodeEx (in /usr/bin/python2.7)
==3345==    by 0x5626F5: PyEval_EvalFrameEx (in /usr/bin/python2.7)
==3345==    by 0x54B7D3: PyEval_EvalCodeEx (in /usr/bin/python2.7)
==3345==  Address 0x17d54020 is 256 bytes inside a block of size 1,600 free'd
==3345==    at 0x4C2B60C: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==3345==    by 0x686AFF5: ??? (in /usr/lib/python2.7/dist-packages/numpy/core/multiarray.so)
==3345==    by 0x68A48F9: ??? (in /usr/lib/python2.7/dist-packages/numpy/core/multiarray.so)
==3345==    by 0x1A5D2665: __pyx_tp_dealloc__memoryviewslice (cstats.c:21957)
==3345==    by 0x5A8449: ??? (in /usr/bin/python2.7)
==3345==    by 0x68A495B: ??? (in /usr/lib/python2.7/dist-packages/numpy/core/multiarray.so)
==3345==    by 0x511942: ??? (in /usr/bin/python2.7)
==3345==    by 0x561124: PyEval_EvalFrameEx (in /usr/bin/python2.7)
==3345==    by 0x54B7D3: PyEval_EvalCodeEx (in /usr/bin/python2.7)
==3345==    by 0x5626F5: PyEval_EvalFrameEx (in /usr/bin/python2.7)
==3345==    by 0x54B7D3: PyEval_EvalCodeEx (in /usr/bin/python2.7)
==3345==    by 0x5626F5: PyEval_EvalFrameEx (in /usr/bin/python2.7)
==3345==

UserAB1236872 commented 9 years ago

Got it. I think it was a numpy bug; on a whim I decided to do pip install numpy --upgrade. I had installed numpy quite recently, so it must have been a new issue that got a recent patch or something.

FWIW, the python version still has a problem with the dtype of one of the arrays, but I don't have the dump anymore.

mattjj commented 9 years ago

Oh great! Yeah it was looking like a numpy bug but it's usually safe to assume the bugs are in my code and not theirs.

I didn't see a dtype problem with running the python sample_crp_tablecounts. Unless you have a hint, I'll just leave it for the next person to run into.

UserAB1236872 commented 9 years ago

Huh, the dtype error is gone. It looks like it was related to the numpy problem.