Make possible to edit corpora and parser results for valency statistics

vmonakhov commented 2 weeks ago

The /adverb tool produces valency statistics for adverbs. The parser results it relies on may have been added, edited, or deleted beforehand. Parser results change when the source text is edited and/or when word annotations are changed, so the report has to be refreshed carefully whenever that happens.

We should scan for updates and for unused ("waste") or duplicate parser results, sentences, and instances. Waste parser results stay in the database after they are removed from a corpus, or for some other reason.
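A minimal sketch of such a scan, assuming each database row can be reduced to an identifier plus a comparison key. The function name, the `(row_id, key)` shape, and the `live_keys` set are hypothetical and are not taken from the Lingvodoc code base.

```python
from collections import defaultdict


def find_waste_and_duplicates(db_rows, live_keys):
    """Split stored rows into waste and duplicates.

    db_rows   - iterable of (row_id, key) pairs found in the database
                (hypothetical shape, for illustration only)
    live_keys - set of keys that are still present in the corpus

    Returns (waste_ids, duplicate_ids). For every live key the first row
    seen is kept; the remaining rows with the same key are duplicates.
    """
    by_key = defaultdict(list)
    for row_id, key in db_rows:
        by_key[key].append(row_id)

    waste, duplicates = [], []
    for key, ids in by_key.items():
        if key not in live_keys:
            waste.extend(ids)           # nothing in the corpus refers to these
        else:
            duplicates.extend(ids[1:])  # keep ids[0], drop the rest
    return waste, duplicates
```

Keeping the first row per key is only one possible policy; a real cleanup might prefer the row that already has annotations attached.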

As noted above, parser results can change when the related text changes. We should compare the sentences used for the report with those stored in the database, and add to or delete from the database as required. Sentences can change order and become shorter or longer; the words within them can change order as well, and whitespace and punctuation marks can change too.

The main requirement is to reuse existing sentences/instances and simply update them when something changes.

Corpus for testing: Uralic › Finno-Permic › Permian › Udmurt › Corpus of Udmurt texts часть 1 › Texts

vmonakhov commented 2 weeks ago

Resolved. Main points:

Sentences can become shorter or longer, change their order, and/or have their words reordered. A sentence "keeps its identity" if more than 75% of its words remain unchanged.
Sentences and items are updated in place. New sources/sentences/instances are created, and waste ones are removed.
We look for duplicate and waste sources/sentences/instances and remove them.
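The matching rule in the points above can be sketched as follows. This is an illustration under stated assumptions, not the code that resolved the issue: the 75% is measured against the words of the stored sentence, words are compared as an unordered bag with punctuation and whitespace ignored, and pairs are chosen greedily by score. All function names are hypothetical.

```python
import re
from collections import Counter

KEEP_THRESHOLD = 0.75  # from the thread: "more than 75%" of the words must stay


def words(sentence):
    """Word tokens only; whitespace and punctuation are ignored."""
    return re.findall(r"\w+", sentence)


def kept_fraction(old, new):
    """Share of the old sentence's words that still occur in the new one.

    Counter intersection makes the comparison insensitive to word order.
    Measuring against the old sentence is an assumption of this sketch.
    """
    old_words = Counter(words(old))
    if not old_words:
        return 0.0
    common = old_words & Counter(words(new))
    return sum(common.values()) / sum(old_words.values())


def reconcile(stored, fresh):
    """Greedily pair stored sentences with freshly parsed ones.

    Returns (updates, creates, deletes):
      updates - (stored_index, fresh_index) pairs to update in place
      creates - fresh indices that need new rows
      deletes - stored indices that matched nothing (waste)
    """
    candidates = sorted(
        ((kept_fraction(old, new), i, j)
         for i, old in enumerate(stored)
         for j, new in enumerate(fresh)),
        key=lambda c: -c[0])

    used_old, used_new, updates = set(), set(), []
    for score, i, j in candidates:
        if score <= KEEP_THRESHOLD:
            break  # remaining candidates score even lower
        if i in used_old or j in used_new:
            continue
        used_old.add(i)
        used_new.add(j)
        updates.append((i, j))

    creates = [j for j in range(len(fresh)) if j not in used_new]
    deletes = [i for i in range(len(stored)) if i not in used_old]
    return sorted(updates), creates, deletes
```

For example, a stored sentence "the cat sat on the mat" would be paired with a fresh sentence "on the mat , the cat sat quietly" even though the word order, punctuation, and length differ, while a sentence that keeps exactly 75% of its words would be treated as new.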

vmonakhov commented 1 week ago

1) It seems we have to delete linked annotations if we reuse unrelated instances.
2) It is not possible to update verb valency data, because we have to keep valency_merge_data_pkey (perspective_client_id, perspective_object_id, verb_lex) unique.

ispras / lingvodoc-react

Make possible to edit corpora and parser results for valency statistics #1128