Open amagooda opened 7 years ago
Hi!
I have never seen this particular issue before, but we should be able to get to the bottom of it.
First: this probably has nothing to do with the error, but you should not include the brackets in [--dynet-seed=42]. In the usage string, the brackets are a convention denoting that the argument is optional. As evidence, if the seed were set correctly, the first line of output would say [dynet] random seed: 42 rather than [dynet] random seed: 3694361057. Try removing the brackets.
Second, have you tried other configurations, for instance --model=bare, --model=full, or --method=rnn --model=bare? Do those also trigger the issue?
Finally, to pinpoint what triggers the segfault, could you try turning on the Python debugger by adding the line import pdb; pdb.set_trace() to the exp_train_test.py file? Then, when running it, proceed step by step using s until you encounter the error, and let me know which line the error occurs at.
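As a possible complement to stepping through with pdb, Python's standard faulthandler module can print the Python-level traceback when the interpreter receives SIGSEGV. The sketch below is only an illustration, assuming Python 3.7 or later: it deliberately crashes a child interpreter with a ctypes call (a hypothetical stand-in for the real crash, not anything taken from marseille) and captures what faulthandler reports.

```python
import subprocess
import sys

# Hypothetical stand-in for the real crash: reading from address 0 via ctypes
# makes the child interpreter segfault, much like a faulty native extension.
CRASHING_SNIPPET = "import ctypes; ctypes.string_at(0)"


def run_with_faulthandler(code):
    """Run `code` in a child interpreter with faulthandler enabled.

    Returns (returncode, stderr) so the dumped traceback can be inspected.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    returncode, stderr = run_with_faulthandler(CRASHING_SNIPPET)
    print("return code:", returncode)
    print(stderr)
```

For the actual script, the same effect should be available either by launching it with python -X faulthandler or by adding import faulthandler; faulthandler.enable() near the top of exp_train_test.py. The traceback shows only the Python frame that was active when the crash happened, so it narrows down the call but does not explain the fault inside the native library.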
The error should come from either ad3, dynet, or (very unlikely) pystruct. Could you also tell me how you installed all three of these libraries?
Thanks!
So, I ran it using --method=rnn --model=bare and it worked.
I tried tracing the code to find the line that triggers the issue, and I think this is the one
Thanks, your analysis is great!
Both observations point to AD3 inference as the culprit. In particular, --method=rnn --model=bare does not use AD3 inference at all, which is why you don't see the error.
At the moment marseille requires a few changes in the ad3 python wrapper, so the current release from the website you linked does not work. Please uninstall your current version of ad3 and then install the one from my fork here. I am working on making a new release of ad3 more easily available and easier to install. If you are having issues installing the version from my fork, let me know. Thanks!
I installed the AD3 version you sent me, but I am still facing the same issue when running the "strict" variant.
Hmm, maybe there are some issues with your AD3 install. Can you try running the AD3 python examples and the python unit tests?
It might be worth trying to install all the dependencies in a fresh, empty virtualenv to make sure that old versions are not accidentally used.
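One way to check which copies of the dependencies the interpreter actually picks up is to print each package's installed version and the file it would be loaded from. This is a minimal sketch, assuming Python 3.8 or later and assuming the distributions are registered under the names ad3, dynet, and pystruct (adjust the list if your install uses different names).

```python
import importlib.util
import sys
from importlib import metadata

# Assumed distribution/module names; adjust if your install uses different ones.
PACKAGES = ["ad3", "dynet", "pystruct"]


def describe_package(name):
    """Return (version, location) for `name`, using None where unknown."""
    try:
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        version = None
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        spec = None
    location = spec.origin if spec is not None else None
    return version, location


if __name__ == "__main__":
    # sys.prefix shows which environment (virtualenv or system) is active.
    print("environment:", sys.prefix)
    for package in PACKAGES:
        version, location = describe_package(package)
        print(package, "version:", version, "loaded from:", location)
```

If a location lies outside the active virtualenv, an older copy is probably shadowing the fresh install.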
I made sure that I am using the fresh installation of AD3, then I tried running two examples (example.py & example_grid_diversity.py). I also tried the two test files (test_basic.py & test_pystruct.py).
And everything works just fine.
Yet the error with Marseille is still there?
This is odd. It would be great if you could still try installing everything in a fresh virtualenv. What OS are you using?
Linux, Ubuntu
That is exactly the same as what I am using, so it is probably not about that. Let me know what the results are in a fresh virtualenv.
BTW, what happens if you use cdcp instead of ukp (but still with rnn-struct strict)? How about the linear-struct strict models?
I haven't tried cdcp yet, but I did try the linear-struct strict model. It fails too; the output is as follows:
[dynet] random seed: 2656436439
[dynet] allocating memory: 512MB
[dynet] memory allocation done.
2017-07-27 18:44:48,226 - root - INFO - linear-struct strict on ukp ({'C': 0.03})
2017-07-27 18:46:24,845 - root - INFO - Setting node class weights Claim: 1.0, MajorClaim: 1.0, Premise: 1.0
2017-07-27 18:46:24,845 - root - INFO - Setting link class weights False: 1.0, True: 4.801313628899836
2017-07-27 18:46:24,845 - root - INFO - Joint feature size: 29033
Iteration 0
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
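For reference, exit code 139 follows the common shell convention of 128 plus the signal number, so 139 - 128 = 11, which is SIGSEGV on Linux. A small sketch of that arithmetic, using a helper name (decode_exit_code) that is purely illustrative:

```python
import signal


def decode_exit_code(code):
    """Return the signal name if `code` follows the 128 + N convention, else None."""
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        # The remainder is not a signal number known on this platform.
        return None


if __name__ == "__main__":
    print(decode_exit_code(139))
```

Note that Python's own subprocess module reports the same event differently: a child killed by signal N has a returncode of -N rather than 128 + N.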
I just tried making an empty virtualenv and installing all the dependencies from scratch, and I still could not reproduce this problem.
What version of python are you using?
When you stepped through the code via the debugger, did it manage to get through any documents before crashing, or does it crash at the very first call to inference?
In any case, I am working on making AD3 safer with respect to unchecked memory accesses, which might help pinpoint what's going on here. I plan to make a new release soon.
I just released AD3 v2.1, which can be installed with pip install --upgrade ad3. Would you mind trying again using this release?
I get the error "Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)" when trying to reproduce the results on the ukp data set.
The problem appears while running exp_train_test.py using the arguments "ukp --method rnn-struct --model strict [--dynet-seed=42]"
The console output is as follows:
[dynet] random seed: 3694361057
[dynet] allocating memory: 512MB
[dynet] memory allocation done.
2017-07-18 12:27:07,154 - root - INFO - rnn-struct strict on ukp ({'max_iter': 10, 'mlp_dropout': 0.15})
2017-07-18 12:27:13,659 - root - INFO - Setting node class weights Claim: 1.0, MajorClaim: 1.0, Premise: 1.0
2017-07-18 12:27:13,660 - root - INFO - Setting link class weights False: 1.0, True: 4.725530458590007
2017-07-18 12:27:13,660 - root - INFO - Overriding n_embeds to glove size 300
2017-07-18 12:27:13,671 - root - INFO - Initializing embeddings...
2017-07-18 12:27:13,799 - root - INFO - ...done
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
Do you know what could be causing this problem? I am using dynet v1.1.