nikitakit / self-attentive-parser

High-accuracy NLP parser with models for 11 languages.
https://parser.kitaev.io/
MIT License

Buggy output for parentheticals #14

Closed: trangham283 closed this issue 5 years ago

trangham283 commented 5 years ago

Hi, I'm using the pretrained benepar model as described in Usage with NLTK. For parentheticals, it does not produce the (-LRB- -LRB-) and (-RRB- -RRB-) leaves that other standard parsers output. For example, parsing this sentence:

Representative George Hansen (R., Idaho) drew a reprimand in nineteen eighty-four after a felony conviction for falsifying his financial disclosures.

gives

(S (NP (NP (JJ Representative) (NNP George) (NNP Hansen)) (PRN (( () (NP (NNP R.)) (, ,) (NP (NNP Idaho)) () )))) (VP (VBD drew) (NP (DT a) (NN reprimand)) (PP (IN in) (NP (JJ nineteen) (JJ eighty-four))) (PP (IN after) (NP (NP (DT a) (NN felony) (NN conviction)) (PP (IN for) (S (VP (VBG falsifying) (NP (PRP$ his) (JJ financial) (NNS disclosures)))))))) (. .))

The empty labels are particularly problematic when used with the trees.py module in this repo. Is this a bug or is this your own label convention?
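To show one way unescaped brackets turn into "empty labels", here is a minimal bracketed-tree reader in Python. It is a hypothetical sketch written for illustration (tokenize, read_tree and parse are not from trees.py or benepar), assuming trees are read into nested (label, children) tuples with words as plain strings. The unescaped parenthetical still has balanced brackets, so the reader does not fail; it silently mis-nests the tokens and creates nodes whose label is the empty string.

```python
import re

# Hypothetical minimal reader for Penn Treebank-style bracketed trees;
# it is NOT the trees.py module from this repo.
def tokenize(s):
    # Brackets are their own tokens; everything else is split on whitespace.
    return re.findall(r"\(|\)|[^\s()]+", s)

def read_tree(tokens, pos=0):
    """Read one tree starting at tokens[pos] == "(".

    Returns ((label, children), next_pos). Words are plain strings.
    """
    if tokens[pos] != "(":
        raise ValueError(f"expected '(' at token {pos}, got {tokens[pos]!r}")
    pos += 1
    label = ""
    # A label is any non-bracket token; if a bracket follows immediately,
    # the node silently gets an empty label.
    if tokens[pos] not in ("(", ")"):
        label = tokens[pos]
        pos += 1
    children = []
    while tokens[pos] != ")":
        if tokens[pos] == "(":
            child, pos = read_tree(tokens, pos)
        else:
            child = tokens[pos]
            pos += 1
        children.append(child)
    return (label, children), pos + 1  # skip the closing ")"

def parse(s):
    tree, _ = read_tree(tokenize(s))
    return tree

unescaped = "(NP (NNP Hansen) (PRN (( () (NNP Idaho) () ))))"
escaped = "(NP (NNP Hansen) (PRN (-LRB- -LRB-) (NNP Idaho) (-RRB- -RRB-)))"

print(parse(unescaped))
print(parse(escaped))
```

With the unescaped string, the PRN node ends up containing nested nodes labelled "" instead of bracket leaves; with the escaped string, PRN has the expected three children.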

nikitakit commented 5 years ago

This is fixed in the v0.1.0 release today (at least for English). Thank you for pointing this out!

The issue here was that parentheses were printed unescaped, as (( () and () )), instead of as (-LRB- -LRB-) and (-RRB- -RRB-).
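A minimal sketch of the kind of escaping the fix implies: when a tree is printed, bracket characters used as POS tags or words are replaced with their Penn Treebank names. This is illustrative Python, not benepar's actual code; the function names, the escape_brackets parameter and the (label, children) tree representation are assumptions made for this example, and the mapping covers only round brackets.

```python
# Hypothetical sketch of escaping brackets when printing a parse tree.
# Trees are (label, children) tuples; words are plain strings.
PTB_ESCAPES = {"(": "-LRB-", ")": "-RRB-"}

def escape(token):
    # Ordinary tokens pass through unchanged; bare brackets are renamed.
    return PTB_ESCAPES.get(token, token)

def tree_to_str(tree, escape_brackets=True):
    label, children = tree
    fix = escape if escape_brackets else (lambda t: t)
    # Preterminal: a POS tag over a single word, e.g. (NNP Idaho).
    if len(children) == 1 and isinstance(children[0], str):
        return f"({fix(label)} {fix(children[0])})"
    inner = " ".join(tree_to_str(c, escape_brackets) for c in children)
    return f"({label} {inner})"

prn = ("PRN", [("(", ["("]), ("NP", [("NNP", ["Idaho"])]), (")", [")"])])
print(tree_to_str(prn, escape_brackets=False))  # (PRN (( () (NP (NNP Idaho)) () )))
print(tree_to_str(prn))  # (PRN (-LRB- -LRB-) (NP (NNP Idaho)) (-RRB- -RRB-))
```

Without escaping, the printer reproduces the (( () and () )) pattern from the report; with escaping, every bracket in the output is a tree delimiter, so a bracket-based reader can recover the original structure.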