ispras / lingvodoc-react

Make it possible to edit corpora and parser results for valency statistics #1128

Open vmonakhov opened 2 weeks ago

vmonakhov commented 2 weeks ago

The /adverb tool produces valency statistics for adverbs. The related parser results may have been added, edited, or deleted beforehand. Parser results change when the source text or the word annotations change, so our report has to be refreshed carefully whenever that happens.

We should scan for updates and for stale or duplicate parser results, sentences, and instances. Stale parser results remain in the database after being removed from a corpus, or for some other reason.
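A minimal sketch of such a scan, assuming the report keeps `(result_id, content_key)` pairs and that the set of parser result ids still attached to the corpus is available. The function name, the pair layout, and `content_key` are hypothetical and not taken from the lingvodoc code base.

```python
from collections import defaultdict


def find_stale_and_duplicates(stored, live_ids):
    """Split stored parser results into stale ones and duplicates.

    stored   -- iterable of (result_id, content_key) pairs (hypothetical layout)
    live_ids -- set of result ids that still exist in the corpus
    """
    # A result is stale if the corpus no longer references it.
    stale = [rid for rid, _ in stored if rid not in live_ids]

    # Group the remaining results by content to detect duplicates.
    by_content = defaultdict(list)
    for rid, content_key in stored:
        if rid in live_ids:
            by_content[content_key].append(rid)

    # Keep the lowest id of each group, report the rest as duplicates.
    duplicates = []
    for rids in by_content.values():
        duplicates.extend(sorted(rids)[1:])

    return stale, sorted(duplicates)
```

Keeping the lowest id of each duplicate group is an arbitrary choice for the sketch; the real rule might prefer the result that already has annotations linked to it.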

As noted above, parser results can change when the related text changes. We should compare the sentences for the report with the ones in the database and add to or delete from the database as required. Sentences can change order and can become shorter or longer; the words within them can change order as well, and whitespace and punctuation marks can change too.

The main requirement is to reuse existing sentences/instances and just update them when something changes.
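One way to approach the reuse requirement is sketched below. It assumes sentences can be compared as bags of words, so that word order, whitespace, and punctuation are ignored and a shortened or lengthened sentence still matches if enough words overlap. The function names, the `threshold` value, and the greedy matching strategy are all assumptions for illustration, not the implemented solution.

```python
import re
from collections import Counter


def tokens(sentence):
    # Bag of lowercase words: ignores word order, whitespace and punctuation.
    return Counter(re.findall(r"\w+", sentence.lower()))


def similarity(a, b):
    # Multiset Jaccard similarity between two bags of words.
    union = sum((a | b).values())
    return sum((a & b).values()) / union if union else 1.0


def reconcile(db_sentences, report_sentences, threshold=0.6):
    """Match report sentences against stored ones.

    db_sentences     -- dict of sentence id -> stored text
    report_sentences -- list of sentence texts from the fresh parser results
    Returns (reuse, to_add, to_delete): ids to update with new text,
    texts to insert, and ids that no longer match anything.
    """
    db_tokens = {sid: tokens(text) for sid, text in db_sentences.items()}
    unused = set(db_sentences)
    reuse, to_add = {}, []

    for text in report_sentences:
        current = tokens(text)
        # Greedy choice: the best still-unused stored sentence.
        best = max(
            sorted(unused),
            key=lambda sid: similarity(current, db_tokens[sid]),
            default=None,
        )
        if best is not None and similarity(current, db_tokens[best]) >= threshold:
            reuse[best] = text      # update the existing row in place
            unused.discard(best)
        else:
            to_add.append(text)     # nothing close enough, insert a new row

    return reuse, to_add, sorted(unused)
```

For example, `reconcile({1: "The cat sat.", 2: "Dogs bark loudly"}, ["sat the  cat!", "Birds sing"])` reuses sentence 1, adds "Birds sing", and marks sentence 2 for deletion. Greedy matching is simple but not optimal; the order in which report sentences are processed can affect which stored sentence gets reused.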

Corpus for testing: Uralic › Finno-Permic › Permian › Udmurt › Corpus of Udmurt texts часть 1 › Texts

vmonakhov commented 2 weeks ago

Resolved. Main points:

vmonakhov commented 1 week ago

1) It seems we have to delete the linked annotations when we reuse unrelated instances.
2) It is not possible to update verb valency data, because we have to keep the valency_merge_data_pkey key (perspective_client_id, perspective_object_id, verb_lex) unique.