thomaslu2000 / Incremental-Parsing-Representations

MIT License
Segmentation fault when evaluating some samples #2

Open jxjessieli opened 2 years ago

jxjessieli commented 2 years ago

Hi, I was trying to reproduce the results by following the script you provided. However, I ran into segmentation faults during the EVALB evaluation. I think the issue is related to the predicted parses: EVALB fails on some instances but not on others. Below is one parse predicted by the model right after initialization, which causes a segmentation fault when EVALB is run in the check_dev() function.

predicted: (TOP (FRAG (SBARQ (UCP (PP (X (VP (X (VP (VB Suffice))) (X (VP (FRAG (SBARQ (PRP it))) (X (VP (X (VP (TO to))) (X (VP (SINV (VB say)) (X (VP (IN that))))))))))) (PRN (S (IN if)))) (FRAG (WHPP (VP (S (VP (DT this)))) (VP (ADVP (VBD were)))))) (X (VP (X (VP (X (VP (X (VP (X (VP (X (VP (X (VP (X (VP (X (VP (X (VP (X (VP (X (VP (X (VP (X (VP (WHNP (X (VP (X (VP (X (VP (X (VP (X (VP (X (VP (X (VP (WHNP (X (VP (X (VP (X (VP (X (VP (X (VP (X (VP (X (VP (X (VP (X (VP (X (VP (X (VP (X (VP (X (VP (DT a))) (FRAG (SBARQ (NNP New))))) (X (VP (NNP York))))) (FRAG (S (VP (PRN (S (NNP Yankees-Mets))) (X (VP (X (VP (NN series))) (X (VP (, ,)))))))))) (WHNP (WHNP (CC or))))) (NX (QP (CD one))))) (X (VP (IN between))))) (PRN (S (DT the))))) (FRAG (WHNP (NNP Chicago))))) (FRAG (WHPP (NNP Cubs))))) (CONJP (CC and)))) (NP (NP (QP (NNP White)))))) (INTJ (X (VP (NNP Sox))) (X (VP (-LRB- -LRB-)))))) (WHNP (NN hey))) (X (VP (FRAG (SBARQ (, ,))) (X (VP (PRP it))))))) (FRAG (SBARQ (VBZ 's))))) (X (VP (JJ possible))))) (S (VP (ADVP (-RRB- -RRB-)))))) (X (VP (, ,))))) (FRAG (SBARQ (PRP you))))) (FRAG (WHADVP (MD 'd))))) (X (VP (VB need)))) (FRAG (SBARQ (JJ uniformed))))) (FRAG (UCP (NNS police))))) (FRAG (WHPP (IN in))))) (X (VP (DT every))))) (SINV (JJ other)))) (X (NP (NN seat))))) (FRAG (SBAR (TO to))))) (NP (ADVP (VB separate))))) (PRN (S (JJ opposing))))) (X (VP (SBAR (WHNP (X (VP (NNS fans))) (FRAG (S (VP (, ,)))))) (X (VP (PP (CC and)) (WHNP (WHNP (S (VP (WHNP (WHNP (RB only))) (SINV (DT the)))) (X (VP (NN suicidal))))))))))) (FRAG (SBARQ (PRN (S (MD would))) (FRAG (SBARQ (VB bifurcate))))))) (NP (NP (QP (PRP$ their)))))) (S (VP (VP (NNS bonnets)))))) (FRAG (SBARQ (. .))))))))

gold: (TOP (S (VP (VB Suffice) (NP (PRP it)) (S (VP (TO to) (VP (VB say) (SBAR (IN that) (S (SBAR (IN if) (S (NP (DT this)) (VP (VBD were) (NP (NP (DT a) (NNP New) (NNP York) (NNP Yankees-Mets) (NN series)) (, ,) (CC or) (NP (NP (CD one)) (PP (IN between) (NP (DT the) (NX (NX (NNP Chicago) (NNP Cubs)) (CC and) (NX (NNP White) (NNP Sox))))) (PRN (-LRB- -LRB-) (S (INTJ (NN hey)) (, ,) (NP (PRP it)) (VP (VBZ 's) (ADJP (JJ possible)))) (-RRB- -RRB-))))))) (, ,) (S (NP (PRP you)) (VP (MD 'd) (VP (VB need) (NP (JJ uniformed) (NNS police)) (PP (IN in) (NP (DT every) (JJ other) (NN seat))) (S (VP (TO to) (VP (VB separate) (NP (JJ opposing) (NNS fans)))))))) (, ,) (CC and) (S (NP (RB only) (DT the) (NN suicidal)) (VP (MD would) (VP (VB bifurcate) (NP (PRP$ their) (NNS bonnets))))))))))) (. .)))

I'd appreciate it if you could explain or help solve this issue. Thank you!

thomaslu2000 commented 2 years ago

We occasionally saw this issue with EVALB early in training, before the parser had learned to produce decent trees. The segfaults stopped later on, once the predicted trees became more accurate.

Could this be the case here? If not, we'd love to investigate this issue further, especially if you have any other relevant information on it.

Thanks!

xba0 commented 2 years ago

I found that this happens because evalb limits the number of parentheses in a sentence (see the macro at line 62 of evalb.c). You can change that macro definition to solve the problem. By the way, in my dataset there are some URL tokens whose lengths exceed the macro MAX_WORD_LEN, which also causes a segmentation fault.
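
In case it helps with debugging, here is a rough Python pre-check that flags trees likely to hit either limit before they are handed to EVALB. It is only a sketch: the function names are made up for illustration, and `max_brackets` and `max_word_len` are placeholders that you would need to fill in from the macro values in your own copy of evalb.c. It also assumes PTB-style bracketing where literal parentheses in the text are escaped (e.g. -LRB-), and the way it counts brackets is an approximation of what EVALB counts, not a copy of its logic.

```python
import re

# Matches a preterminal such as "(NNP York)" and captures (tag, word).
_LEAF = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")


def tree_stats(tree: str) -> dict:
    """Collect simple size statistics for one bracketed tree string."""
    leaves = _LEAF.findall(tree)
    return {
        "open_parens": tree.count("("),
        "words": len(leaves),
        "longest_word": max((len(word) for _, word in leaves), default=0),
    }


def exceeds_limits(tree: str, max_brackets: int, max_word_len: int) -> bool:
    """Return True if the tree looks too large for the given limits.

    max_brackets and max_word_len are hypothetical parameters; take the
    real values from the macro definitions in your evalb.c.
    """
    stats = tree_stats(tree)
    # Opening parentheses that are not preterminals are labelled constituents.
    constituents = stats["open_parens"] - stats["words"]
    # Assumption: use >= for words to leave room for a C string terminator.
    return constituents > max_brackets or stats["longest_word"] >= max_word_len


if __name__ == "__main__":
    example = "(TOP (S (NP (PRP it)) (VP (VBZ 's) (ADJP (JJ possible)))))"
    print(tree_stats(example))
    print(exceeds_limits(example, max_brackets=4, max_word_len=100))
```

Running this over the predicted and gold files and logging the flagged sentence indices should show whether the crashing instances are the deeply nested ones (as in the predicted parse above) or the ones with very long tokens.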