Comparison to en_core_web_sm/md/lg?

dav-ell commented 5 years ago

spaCy has several English models that can do dependency parsing: en_core_web_sm, en_core_web_md, and en_core_web_lg (https://spacy.io/models/en). There's a pretty good demo of them available through displaCy. Are there any performance comparisons between benepar and these models for dependency parsing?

nikitakit commented 5 years ago

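spaCy does dependency parsing, but benepar does constituency parsing. There aren't any performance comparisons because these aren't the same paradigm: the two kinds of parser produce different output structures, so their scores can't be compared directly.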

dav-ell commented 5 years ago

@nikitakit Thanks for the clarification. Doing some more reading... It's supposedly possible to convert constituency parses to dependency parses. Is there a comparison after conversion? (It may be that Benepar performs better than spaCy models, even after conversion.)
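For readers unfamiliar with the idea, constituency-to-dependency conversion is usually done by picking a "head" child for each constituent and attaching the other children's head words to it. Below is a minimal sketch of that idea, assuming a tiny, hypothetical head-rule table (`HEAD_RULES`) and a tree encoded as nested tuples. It produces unlabeled heads only and is not the procedure used by Stanford CoreNLP or any other real converter, whose rules are much richer.

```python
# Toy head rules (hypothetical; real converters use far larger tables).
# Each entry: search direction over the children, and labels that may be the head.
HEAD_RULES = {
    "S": ("first", ("VP",)),
    "VP": ("first", ("VB", "VBD", "VBZ", "VBP", "VP")),
    "NP": ("last", ("NN", "NNS", "NNP", "NP")),
    "PP": ("first", ("IN",)),
}


def is_leaf(node):
    # A leaf is a (tag, word) pair; internal nodes are (label, child, child, ...).
    return len(node) == 2 and isinstance(node[1], str)


def head_child_index(label, children):
    # Unknown labels fall back to the last child.
    direction, targets = HEAD_RULES.get(label, ("last", ()))
    indices = list(range(len(children)))
    if direction == "last":
        indices.reverse()
    for i in indices:
        if children[i][0] in targets:
            return i
    return indices[0]


def convert(tree):
    """Return (words, heads); heads[i] is the 0-based head index of word i, -1 for the root."""
    words, heads = [], []

    def visit(node):
        if is_leaf(node):
            words.append(node[1])
            heads.append(-1)
            return len(words) - 1
        label, children = node[0], node[1:]
        child_heads = [visit(child) for child in children]
        pick = head_child_index(label, children)
        # Attach every non-head child's head word to the head child's head word.
        for i, head_word in enumerate(child_heads):
            if i != pick:
                heads[head_word] = child_heads[pick]
        return child_heads[pick]

    visit(tree)
    return words, heads


# (S (NP (DT The) (NN cat)) (VP (VBD chased) (NP (DT a) (NN mouse))))
tree = ("S",
        ("NP", ("DT", "The"), ("NN", "cat")),
        ("VP", ("VBD", "chased"), ("NP", ("DT", "a"), ("NN", "mouse"))))
words, heads = convert(tree)
for word, head in zip(words, heads):
    print(word, "<-", "ROOT" if head < 0 else words[head])
```

In this toy example "chased" ends up as the root, with "cat" and "mouse" attached to it and each determiner attached to its noun. Any scores obtained after a conversion depend heavily on the head rules and label scheme chosen.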

nikitakit commented 5 years ago

I started looking into constituency-to-dependency conversion at some point in the past, but I found that the process wasn't really documented anywhere. The conversion software itself seems to be part of Stanford CoreNLP so it's easy to download, but it accepts a fair number of flags and there have been many versions of CoreNLP over the years. I don't actively work on dependency parsing, so I also don't know what the standard evaluation data splits are, or what code I should run to compute LAS/UAS numbers that are comparable to published work.

If you'd like to look into this, do try it out and let me know what you find! I suspect that benepar will perform quite competitively; it's just a matter of tracking down the right combination of software, data, and command line flags needed to do the conversion.
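For reference, the UAS/LAS metrics mentioned above are simple to compute once gold and predicted dependencies are aligned token by token. The following is a minimal sketch, assuming each token is represented as a `(head_index, label)` pair; the function name `attachment_scores` and the data layout are illustrative, not taken from any evaluation script. Published numbers typically follow additional conventions (for example, how punctuation is treated), which this sketch ignores.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) for two aligned lists of (head_index, label) pairs.

    UAS: fraction of tokens whose predicted head matches the gold head.
    LAS: fraction of tokens whose predicted head and label both match.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of tokens")
    if not gold:
        raise ValueError("cannot score an empty sentence")
    total = len(gold)
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    las_hits = sum(g == p for g, p in zip(gold, predicted))
    return uas_hits / total, las_hits / total


# "The cat chased a mouse", heads as 0-based token indices, -1 for the root.
gold = [(1, "det"), (2, "nsubj"), (-1, "root"), (4, "det"), (2, "obj")]
# Hypothetical parser output: every head is right, one label is wrong.
predicted = [(1, "det"), (2, "nsubj"), (-1, "root"), (4, "det"), (2, "nmod")]

uas, las = attachment_scores(gold, predicted)
print(f"UAS={uas:.2f} LAS={las:.2f}")
```

A fair comparison would also need both parsers scored on the same data split with the same dependency scheme, which is the part that is hard to pin down.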

nikitakit / self-attentive-parser