nikitakit / self-attentive-parser

High-accuracy NLP parser with models for 11 languages.
https://parser.kitaev.io/
MIT License

Comparison to en_core_web_sm/md/lg? #26

Closed by dav-ell 5 years ago

dav-ell commented 5 years ago

spaCy has several models that are capable of dependency parsing in English: en_core_web_sm, en_core_web_md, en_core_web_lg (https://spacy.io/models/en). There's a pretty good demo of them available through displaCy. Are there any performance comparisons for dependency parsing with benepar vs. these?

nikitakit commented 5 years ago

spaCy does dependency parsing, but benepar does constituency parsing. There aren't any performance comparisons because these aren't the same paradigm.

dav-ell commented 5 years ago

@nikitakit Thanks for the clarification. Doing some more reading... It's supposedly possible to convert constituency parses to dependency parses. Is there a comparison after conversion? (It may be that Benepar performs better than spaCy models, even after conversion.)
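
For readers unfamiliar with the idea, conversion of this kind is usually based on head rules: for each phrase, one child is picked as the head, and the other children's head words attach to it. The following is only a toy sketch of that idea in plain Python. The `HEAD_RULES` table, the tuple-based tree format, and the `to_dependencies` function are hypothetical and made up for illustration; this is not the Stanford CoreNLP converter, it produces unlabeled arcs only, and its output is not comparable to published numbers.

```python
# Toy head-rule conversion from a constituency tree to unlabeled dependencies.
# Tree format (an assumption for this sketch):
#   leaf     = (pos_tag, word)
#   internal = (phrase_label, [children])

# Hypothetical, heavily simplified head rules: for each phrase label, the child
# labels to try in priority order. Real converters use far larger tables.
HEAD_RULES = {
    "S": ["VP", "NP"],
    "VP": ["VBD", "VBZ", "VBP", "VB", "VP"],
    "NP": ["NN", "NNS", "NNP", "NP"],
}


def to_dependencies(tree):
    """Return (words, heads); heads[i] is the 1-based head of word i+1, 0 = root."""
    words = []
    heads = {}

    def visit(node):
        label, body = node
        if isinstance(body, str):  # leaf: record the word, return its 1-based index
            words.append(body)
            return len(words)
        child_heads = [visit(child) for child in body]
        child_labels = [child[0] for child in body]
        head_pos = len(body) - 1  # fallback when no rule matches: rightmost child
        for wanted in HEAD_RULES.get(label, []):
            if wanted in child_labels:
                head_pos = child_labels.index(wanted)
                break
        head_word = child_heads[head_pos]
        # Every non-head child's head word attaches to the phrase's head word.
        for i, child_head in enumerate(child_heads):
            if i != head_pos:
                heads[child_head] = head_word
        return head_word

    root = visit(tree)
    heads[root] = 0
    return words, [heads[i] for i in range(1, len(words) + 1)]


tree = ("S", [
    ("NP", [("DT", "The"), ("NN", "dog")]),
    ("VP", [("VBD", "chased"), ("NP", [("DT", "a"), ("NN", "cat")])]),
])
words, heads = to_dependencies(tree)
```

Here "chased" ends up as the root, with "dog" and "cat" attached to it and each determiner attached to its noun. A real conversion also has to assign dependency labels and handle many more constructions.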

nikitakit commented 5 years ago

I started looking into constituency-to-dependency conversion at some point in the past, but I found that the process wasn't really documented anywhere. The conversion software itself seems to be part of Stanford CoreNLP so it's easy to download, but it accepts a fair number of flags and there have been many versions of CoreNLP over the years. I don't actively work on dependency parsing, so I also don't know what the standard evaluation data splits are, or what code I should run to compute LAS/UAS numbers that are comparable to published work.
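
For reference, UAS (unlabeled attachment score) is the fraction of tokens whose predicted head is correct, and LAS (labeled attachment score) additionally requires the dependency label to match. A minimal sketch of the arithmetic in Python follows. The `attachment_scores` function and the `(head, label)` data layout are assumptions made for illustration, and the sketch scores every token, whereas published evaluations often follow extra conventions (such as excluding punctuation), so its output should not be treated as comparable to published work.

```python
from typing import List, Tuple

# Each token is (head_index, dependency_label); head 0 denotes the root.
# This layout is an assumption for the sketch, not a standard file format.
Arc = Tuple[int, str]


def attachment_scores(gold: List[List[Arc]], pred: List[List[Arc]]) -> Tuple[float, float]:
    """Return (UAS, LAS) as fractions over all tokens in the corpus."""
    total = correct_heads = correct_labeled = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("gold and predicted sentences must have the same length")
        for (g_head, g_label), (p_head, p_label) in zip(gold_sent, pred_sent):
            total += 1
            if g_head == p_head:
                correct_heads += 1  # counts toward UAS
                if g_label == p_label:
                    correct_labeled += 1  # head and label both right: counts toward LAS
    if total == 0:
        raise ValueError("no tokens to score")
    return correct_heads / total, correct_labeled / total


# One three-token sentence: all heads are right, one label is wrong.
gold = [[(2, "nsubj"), (0, "root"), (2, "obj")]]
pred = [[(2, "nsubj"), (0, "root"), (2, "iobj")]]
uas, las = attachment_scores(gold, pred)
```

In this example all three heads match, so UAS is 3/3, while only two of the three labels match, so LAS is 2/3.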

If you'd like to look into this, do try it out and let me know what you find! I suspect that benepar will perform quite competitively; it's just a matter of tracking down the right combination of software, data, and command-line flags needed to do the conversion.