ispras / lingvodoc-react

Apache License 2.0

Language and dialect level word form and disambiguation statistics #1030

Closed myrix closed 9 months ago

myrix commented 9 months ago

Users have requested the ability to compile word form and disambiguation statistics for whole languages and dialects, e.g. for Mordovian.

myrix commented 9 months ago

Implemented language-level statistics, added totals over all users, and added the ability to compile statistics over dictionaries and corpora for users separately. That should at least allow compiling word form statistics; disambiguation statistics are pending.

image

image

myrix commented 9 months ago

Implemented disambiguation statistics via an additional option.

image

image

Due to the current architecture of parser results, disambiguation statistics can't be restricted by time interval and can't be compiled per user. Compilation is also rather slow: up to several tens of seconds for a single perspective, depending on the volume of data, and up to minutes, perhaps tens of minutes, for dictionaries, dialects and languages.

Maybe this can be improved together with a rework of how parser results are stored, processed and handled in general.