Language and dialect level word form and disambiguation statistics

myrix commented 9 months ago

Users have requested the ability to compile word form and disambiguation statistics for whole languages and dialects, e.g. for Mordovian.

myrix commented 9 months ago

Implemented language-level statistics: added totals over all users and the ability to compile statistics over dictionaries and corpora for users separately. That should at least allow compiling word form statistics; disambiguation statistics are still pending.
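
For illustration only, a minimal sketch of how per-user counts could roll up into a total over all users. The names `per_user` and `total_over_users`, the user names and the numbers are all hypothetical and are not taken from the actual implementation.

```python
from collections import Counter


def total_over_users(per_user):
    """Sum per-user word form counters into one total over all users."""
    total = Counter()
    for counts in per_user.values():
        total.update(counts)  # Counter.update adds counts, it does not overwrite
    return total


# Hypothetical per-user word form counts (placeholder forms and numbers).
per_user = {
    "user_a": Counter({"form_1": 120, "form_2": 45}),
    "user_b": Counter({"form_1": 30, "form_3": 210}),
}

total = total_over_users(per_user)
```

Keeping the per-user counters around and deriving the total from them means both views can be shown from a single pass over the data.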

myrix commented 9 months ago

Implemented disambiguation statistics via an additional option.

Due to the current architecture of parser results, disambiguation statistics can't be restricted by time interval and can't be compiled per user. Compilation is also rather slow: up to several tens of seconds for a single perspective, depending on the volume of data, and up to minutes, perhaps tens of minutes, for dictionaries, dialects and languages.

Maybe this can be improved together with a rework of how parser results are stored, processed and handled in general.

ispras / lingvodoc-react

Language and dialect level word form and disambiguation statistics #1030