Annotations for composite words are not stored in the timarkh_uniparser function

ispras / lingvodoc-react

Apache License 2.0


Closed by vmonakhov 1 month ago

vmonakhov commented 1 month ago

The timarkh_uniparser function obtains annotations for composite words here: https://github.com/ispras/lingvodoc/blob/faf43c03332934e3b6d8f8062bb81a95b9aad026/lingvodoc/utils/doc_parser.py#L111 It then finds the composite words whose annotations came back empty, splits them into simple words, and tries to annotate the resulting set once more. So it seems that non-empty annotations for composite words are not stored, which looks like a logic error.

vmonakhov commented 1 month ago

Actually, all the words are processed, but composite words are processed twice. This is not a bug, just an inefficiency.
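
For illustration, here is a minimal sketch of the flow as I understand it from this thread. It is not the code from doc_parser.py: the names annotate and toy_analyze, the toy lexicon, and the hyphen used as the composite-word separator are all assumptions made for the example. It only shows that non-empty annotations for composite words survive, and that composite words with empty annotations cost extra analyzer calls.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for the real analyzer: maps a word to its annotations.
Analyzer = Callable[[str], List[str]]


def annotate(
    words: List[str],
    analyze: Analyzer,
    separator: str = "-",  # assumption: composite words are joined by a hyphen
) -> Tuple[Dict[str, List[str]], int]:
    """Two-pass annotation; returns (annotations, number of analyzer calls)."""
    annotations: Dict[str, List[str]] = {}
    calls = 0

    # First pass: every word, composite or not, goes through the analyzer.
    for word in words:
        annotations[word] = analyze(word)
        calls += 1

    # Second pass: composite words that came back empty are split and retried.
    for word in words:
        if separator in word and not annotations[word]:
            for part in word.split(separator):
                if part and part not in annotations:
                    annotations[part] = analyze(part)
                    calls += 1

    return annotations, calls


# Toy data, invented for this example only.
toy_lexicon = {"cat": ["N"], "dog": ["N"], "well-known": ["ADJ"]}


def toy_analyze(word: str) -> List[str]:
    return list(toy_lexicon.get(word, []))


result, calls = annotate(["cat", "well-known", "cat-dog"], toy_analyze)
print(result)
print(calls)
```

In this sketch "well-known" keeps the annotation from the first pass, while "cat-dog" is analyzed as a whole and then again through its parts, which is the redundancy mentioned above.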