stanfordnlp / stanza

Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages
https://stanfordnlp.github.io/stanza/

Revamp the constituency parser ensemble #1387

Closed AngledLuffa closed 2 months ago

AngledLuffa commented 2 months ago

Refactors the ensemble so that it can be treated as a regular constituency parser model in the Pipeline and elsewhere, and adds a learnable weight matrix which applies some minor reweighting to the ensemble models.
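The description does not say how the weight matrix is applied, so the following is only a minimal sketch of one plausible reading, not the code in this PR. It assumes each model produces one score per candidate transition, and that the matrix holds one learnable logit per (model, transition) pair, normalized with a softmax across models. The names `combine_scores`, `model_scores`, and `weights` are hypothetical and do not come from stanza.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def combine_scores(model_scores, weights):
    """Merge per-model transition scores with a per-transition mixing weight.

    model_scores[m][t]: score that model m assigns to transition t.
    weights[m][t]: learnable logit (hypothetical layout); a softmax over
    the model axis turns each column into mixing proportions.
    """
    num_models = len(model_scores)
    num_transitions = len(model_scores[0])
    combined = []
    for t in range(num_transitions):
        mix = softmax([weights[m][t] for m in range(num_models)])
        combined.append(
            sum(mix[m] * model_scores[m][t] for m in range(num_models))
        )
    return combined


# Two models scoring three candidate transitions.
scores = [[2.0, 0.0, 1.0],
          [0.0, 4.0, 1.0]]

# All-zero logits reduce to a plain average of the models.
uniform = [[0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0]]
averaged = combine_scores(scores, uniform)

# A large logit for model 0 on transition 0 shifts that one column toward model 0.
skewed = [[10.0, 0.0, 0.0],
          [0.0, 0.0, 0.0]]
reweighted = combine_scores(scores, skewed)
```

Under this assumed layout, initializing the matrix to zeros starts the ensemble as an unweighted average, so any training only moves it away from that baseline. Where the models assign identical scores to a transition, as with the third transition above, the combined score does not depend on the weights at all.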

More work is needed to figure out the best way to merge the ensemble models' predictions. A current limitation is that the trained models tend to be very accurate on the training data, so there needs to be some way to actually give the ensemble model errors to learn from.