vidraj / derinet

The main repository for the DeriNet project and all its dependencies and related tools.
https://ufal.mff.cuni.cz/derinet
7 stars 3 forks source link

DeriNet._roots is not updated #6

Open vidraj opened 6 years ago

vidraj commented 6 years ago

During database load, the field DeriNet._roots is initialized to contain a list of internal IDs of all cluster roots. But as far as I can see, it is not updated when adding derivations, so the information goes out of date quickly as clusters are merged or split. The information is used when saving the database, possibly resulting in a broken file.

I didn't actually test any of this, I just noticed it reading the source. If I'm wrong, correct me, please.

vidraj commented 6 years ago

Commit 29743381a47cd2ec218adc67fdf3053a1161579 possibly fixes a part of this issue, that the roots weren't correctly created on DB load and when adding lexemes. An internal ID is now added to ._roots when the new node doesn't have a parent, i.e. is a root.

I wrote tests for both this and the not-updated-when-adding-derivations issue.

vojtsek commented 6 years ago

I followed with commit e3e89bb5ba7410032e1d9aaf89cfb5809257ca0d. It changes _roots representation to set to achieve efficient removal. Changes in the roots that are caused by derivations should be hnadled now. I leave this issue open for now till it will be confirmed.

vidraj commented 6 years ago

The roots should probably be updated in sort() as well.

vojtsek commented 6 years ago

You are right @vidraj, thanks. It should be easy to add, I will also create a test for that.

vidraj commented 6 years ago

Thank you for solving this. :-)