nikitakit / self-attentive-parser

High-accuracy NLP parser with models for 11 languages.
https://parser.kitaev.io/
MIT License

Tagset #30

Open · bustrofedico opened this issue 5 years ago

bustrofedico commented 5 years ago

Hi, what is the full tagset used by the parser? Thanks!

nikitakit commented 5 years ago

We use the Penn Treebank to train -- here is one page that explains the labels used: http://www.surdeanu.info/mihai/teaching/ista555-fall13/readings/PennTreebankConstituents.html
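Penn Treebank trees carry two kinds of labels: phrase-level constituent labels (S, NP, VP, ...) on internal nodes, and part-of-speech tags (DT, NN, VBD, ...) on preterminals, the nodes whose only child is a word. A minimal sketch, assuming you have a parse as a bracketed Penn Treebank string, that separates the two sets it contains. The function name `collect_labels` and the example tree are hypothetical illustrations, not part of the parser.

```python
from __future__ import annotations

import re


def collect_labels(tree: str) -> tuple[set[str], set[str]]:
    """Return (phrase_labels, pos_tags) found in a bracketed PTB tree string.

    Hypothetical helper: a node whose children are all words is treated as a
    preterminal (its label is a POS tag); any node with a bracketed child is
    treated as a phrase-level constituent.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", tree)
    phrase_labels: set[str] = set()
    pos_tags: set[str] = set()
    stack: list[list] = []  # one [label, is_preterminal] entry per open bracket
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            if stack:
                # The enclosing node has a bracketed child, so it is a phrase.
                stack[-1][1] = False
            if i + 1 < len(tokens) and tokens[i + 1] not in ("(", ")"):
                label = tokens[i + 1]
                i += 1  # skip over the label token
            else:
                label = ""  # unlabeled outer root, as in "( (S ...) )"
            stack.append([label, True])
        elif tok == ")":
            label, is_preterminal = stack.pop()
            if label:
                (pos_tags if is_preterminal else phrase_labels).add(label)
        # Any other token is a word; nothing to record for it.
        i += 1
    return phrase_labels, pos_tags


# Example with a hand-written (hypothetical) tree:
phrases, tags = collect_labels("(S (NP (DT The) (NN cat)) (VP (VBD sat)) (. .))")
print(sorted(phrases))  # ['NP', 'S', 'VP']
print(sorted(tags))     # ['.', 'DT', 'NN', 'VBD']
```

Running this over many parses only shows the labels that happen to occur in them; the full inventory is what the linked page documents.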

You can also access parser._label_vocab and parser._tag_vocab (for the newer models only). These are private attributes, but you can print them to see the full set of labels.
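Printing a vocabulary directly may show composite entries. In chart parsers of this family, a label can be a tuple that stands for a collapsed unary chain, such as ("S", "VP"), with () meaning "no constituent". A minimal sketch that flattens such a vocabulary into a sorted list of distinct labels follows. It assumes that the vocabulary is an iterable (a list, array, or dict of label -> index) whose entries are plain strings or tuples of strings. The exact type of these private attributes is an assumption and may differ between versions, and the helper name `atomic_labels` is hypothetical.

```python
from __future__ import annotations

from typing import Iterable, Union

# Assumption: each vocab entry is a string or a tuple of strings.
Label = Union[str, tuple]


def atomic_labels(vocab: Iterable[Label]) -> list[str]:
    """Flatten a label vocabulary into a sorted list of distinct labels.

    Hypothetical helper. Tuple entries such as ("S", "VP") are assumed to be
    collapsed unary chains and are split into their parts; empty entries
    (e.g. () for "no constituent") are skipped. Iterating a dict yields its keys.
    """
    seen: set[str] = set()
    for entry in vocab:
        parts = (entry,) if isinstance(entry, str) else tuple(entry)
        for part in parts:
            if part:
                seen.add(str(part))
    return sorted(seen)


# Hypothetical usage with the private attributes mentioned above
# (requires an installed parser model; not run here):
#   print(atomic_labels(parser._label_vocab))
#   print(atomic_labels(parser._tag_vocab))

print(atomic_labels([(), ("S",), ("S", "VP"), ("NP",)]))  # ['NP', 'S', 'VP']
```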

bustrofedico commented 5 years ago

Thanks Nikita! Exactly what I needed.

Can I assume that "Treebank-3" (https://catalog.ldc.upenn.edu/LDC99T42) was used for training?

nikitakit commented 5 years ago

That's correct.