ispras / lingvodoc-react

Apache License 2.0

No valency instances from Apertium Russian parser results #953

Closed by myrix 1 year ago

myrix commented 1 year ago

On http://lingvodoc.ispras.ru/valency there are no instances from the http://lingvodoc.ispras.ru/dictionary/6236/1709/perspective/6236/1712/view corpus.

The problem is that the Apertium Russian parser uses the non-standard verb markers VBLEX and VBSER, which are incompatible with Lingvodoc's valency instance code: it understands only V.

We should upgrade the code so that VBLEX- and VBSER-marked verbs are also recognized.
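A minimal sketch of the kind of check this implies, not Lingvodoc's actual valency code. The names `VERB_MARKERS` and `is_verb` are hypothetical, and it assumes each token's grammatical tags are available as a list of strings, possibly lowercase or wrapped in angle brackets:

```python
# Hypothetical sketch: accept V as well as the Apertium-style VBLEX and VBSER
# markers when deciding whether a token is a verb.
VERB_MARKERS = frozenset({"V", "VBLEX", "VBSER"})


def is_verb(tags):
    """Return True if any tag in `tags` is a recognized verb marker.

    Tags are compared case-insensitively, with surrounding angle brackets
    removed, so "VBLEX", "vblex" and "<vblex>" are all treated the same.
    """
    return any(tag.strip("<>").upper() in VERB_MARKERS for tag in tags)
```

With this check a token tagged `["VBLEX", "PRES"]` would count as a verb, while one tagged `["N", "SG"]` would not.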

myrix commented 1 year ago

Implemented; we can now have instances from this corpus.

To ensure instance generation, it's better to create new entries, re-upload and re-parse the documents, and then disambiguate two or three words in each new parser result. New instances will then be generated after selecting the corpus at http://lingvodoc.ispras.ru/valency and clicking the 'Update valency data' button.

We should probably improve valency data updating: in particular, finish #775 and think about how to handle cases like this one, where the algorithm changes and some data has to be re-parsed even though the data itself has not changed.
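One possible approach, sketched under assumptions rather than taken from Lingvodoc's code: store a key next to the generated valency data that combines an algorithm version with a hash of the parser result, and regenerate whenever either part differs. `ALGORITHM_VERSION`, `processing_key` and `needs_update` are hypothetical names.

```python
import hashlib

# Hypothetical: bumped by hand whenever the instance-generation algorithm changes.
ALGORITHM_VERSION = 2


def processing_key(parser_result_text, algorithm_version=ALGORITHM_VERSION):
    """Build a key that changes when either the data or the algorithm changes."""
    digest = hashlib.sha256(parser_result_text.encode("utf-8")).hexdigest()
    return f"{algorithm_version}:{digest}"


def needs_update(parser_result_text, stored_key, algorithm_version=ALGORITHM_VERSION):
    """Return True if valency data should be regenerated for this parser result.

    `stored_key` is the key saved at the previous generation, or None if the
    parser result has never been processed.
    """
    return stored_key != processing_key(parser_result_text, algorithm_version)
```

With such a key, an algorithm change like the VBLEX/VBSER one would mark unchanged parser results as needing an update without new entries having to be created.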