vidraj / derinet

The main repository for the DeriNet project and all its dependencies and related tools.
https://ufal.mff.cuni.cz/derinet

Log sources and history of derivations and lexemes #3

Open vidraj opened 6 years ago

vidraj commented 6 years ago

The old Perl API logged derivation history and remembered which module and which annotation file was responsible for each derivational link. This is very useful when debugging – if you see an error in the database, you can quickly trace it to its source by looking at the history.

It is also theoretically useful for identifying homonyms – we can use the derivation history to find lexemes that have multiple legitimate parents, without complicating the API by relaxing the 'single parent' restriction. Such lexemes can be either dual paths or homonyms.
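As a rough illustration of the idea, the sketch below scans a history of link additions and reports lexemes that were assigned more than one distinct parent over time. The history format (a list of `(child, parent)` pairs) and the function name are assumptions made for this example, not part of either API.

```python
from collections import defaultdict


def lexemes_with_multiple_parents(history):
    """Return {child: set of parents} for children that had more than one parent.

    `history` is a hypothetical list of (child, parent) pairs, one per
    link addition, in the order the additions happened.
    """
    parents = defaultdict(set)
    for child, parent in history:
        parents[child].add(parent)
    return {child: ps for child, ps in parents.items() if len(ps) > 1}


# Placeholder lexeme names, purely illustrative.
example_history = [
    ("child_a", "parent_x"),
    ("child_b", "parent_x"),
    ("child_a", "parent_y"),  # a later module re-parents child_a
]
candidates = lexemes_with_multiple_parents(example_history)
```

Here only `child_a` is reported; whether it is a dual path or a homonym would still need manual inspection.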

The Python API lacks this capability so far, but it should be added.

Ideally, we would track the following for each link:

and remember the whole history of this information.

Lexeme and module information can be obtained automatically, because the pipeline manager knows the linked lexeme and which module is currently running. The file and line information probably has to be provided by the modules.
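A minimal sketch of that split of responsibilities might look like the following. Everything here is hypothetical (`LinkRecord`, `ToyPipeline`, `add_link`, the module name and the file name are invented for illustration); it only shows the manager stamping the module name automatically while the module supplies file and line.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LinkRecord:
    """One entry in a link's history (hypothetical structure)."""
    child: str                  # the linked lexeme
    parent: str                 # the lexeme it is derived from
    module: str                 # filled in by the pipeline manager
    file: Optional[str] = None  # supplied by the module, if known
    line: Optional[int] = None  # supplied by the module, if known


class ToyPipeline:
    """Stand-in for a pipeline manager; not the DeriNet implementation."""

    def __init__(self) -> None:
        self.current_module = "<none>"
        self.history: List[LinkRecord] = []

    def run(self, module_name, module_fn) -> None:
        # The manager knows which module is running, so it can stamp records.
        self.current_module = module_name
        module_fn(self)

    def add_link(self, child, parent, file=None, line=None) -> None:
        self.history.append(
            LinkRecord(child, parent, self.current_module, file, line)
        )


def add_annotations(pipeline: ToyPipeline) -> None:
    # A module reading an annotation file passes file and line itself.
    pipeline.add_link("učitelka", "učitel", file="annot.tsv", line=12)


pipeline = ToyPipeline()
pipeline.run("AddAnnotations", add_annotations)
```

Modules that create links without reading a file can simply leave `file` and `line` unset.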

The Perl API also remembered lexeme creators in the same way. Lexemes can come from MorfFlex, extra annotations or specific modules, and we want to distinguish these sources.

vidraj commented 3 months ago

History of relations is tracked as of commit 48eeb3879388672ae70f8d8114bd30e951cb7e06. All additions are tracked, as well as deletions that use the new convenience methods in Lexicon: remove_relation(), remove_all_parent_relations(), remove_all_child_relations() and remove_all_relations().
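To illustrate why deletions are only tracked when they go through dedicated methods, here is a toy model, not the code from the commit. The class `ToyLexicon`, its attributes and the method signatures are assumptions; only the name `remove_relation` echoes the real method mentioned above.

```python
class ToyLexicon:
    """Toy model of history tracking; signatures are assumptions."""

    def __init__(self):
        self.parent_of = {}  # child -> parent (single-parent restriction)
        self.history = []    # (action, child, parent) tuples, in order

    def add_relation(self, child, parent):
        self.parent_of[child] = parent
        self.history.append(("add", child, parent))

    def remove_relation(self, child):
        # Going through this method is what makes the deletion visible
        # in the history; deleting from parent_of directly would not be.
        parent = self.parent_of.pop(child)
        self.history.append(("remove", child, parent))


lexicon = ToyLexicon()
lexicon.add_relation("b", "a")
lexicon.remove_relation("b")
```

After these two calls the link is gone from `parent_of`, but both events remain in `history`.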

Adding and removing lexemes is not tracked yet. It's not clear how to track removing lexemes, and adding them is hopefully rare enough that tracking is not necessary. Lexeme provenance from the start would be nice to have, since the start of the pipeline sources data from multiple origins, but it is also more difficult to implement.

It's not impossible, though, since we already have a direct line from the Perl API to the new Python API to import tag masks. It would be possible to also import the history from the Perl API, including lexeme history.

The correct place to do that would be version 2.0, reading from /tools/data-api/perl-derivmorpho/derinet-1-3b.tsv.