Open chiarcos opened 1 week ago
Thank you for the submission. Your request addresses several issues.
First, the dependency visualizer does not work because the graphml export sets the node key incorrectly. This is being fixed by #252.
About supporting the data you provided and/or other versions of CoNLL: We suggest sticking to the notation that uses = as the key-value delimiter for sentence annotations, since this seems easy to replace. We will nevertheless extend the conll module to import annotations that do not start with key = as bare values, each of which will be added as a sentence annotation conll::comment holding said value. See #257 for more details. In the case of your data, this would lead to annotations conll::comment="text: ..." for each sentence.
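The fallback described above could be sketched roughly as follows. This is only an illustration in Python, not the conll module's actual code: the function name parse_sentence_comment, the key pattern, and the handling of whitespace are assumptions, and namespace handling for regular keys is left out.

```python
import re

# Assumption: a metadata comment has the shape "# key = value". The key
# pattern is an illustrative guess, not the importer's actual rule.
KEY_VALUE = re.compile(r"^#\s*([A-Za-z_][\w.-]*)\s*=\s*(.*)$")


def parse_sentence_comment(line, fallback_key="conll::comment"):
    """Return (annotation_name, value) for a single '#' comment line."""
    match = KEY_VALUE.match(line)
    if match:
        # Regular case: the comment starts with "key =".
        return match.group(1), match.group(2).strip()
    # Fallback: no leading "key =", so the whole comment body is kept
    # as a bare value under the fallback annotation name.
    return fallback_key, line.lstrip("#").strip()


print(parse_sentence_comment("# sent_id = 1"))
print(parse_sentence_comment("# text: Hello world"))
```

With this sketch, a comment such as "# text: Hello world" is not dropped or silently repaired; it ends up as a conll::comment value that still contains the original "text: ..." string.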
Are there any other features of CoNLL-X that you consider necessary?
Thank you, #257 is the best way to deal with that IMHO.
As for other features of CoNLL-X, the last two columns have different functions (cf. https://aclanthology.org/W06-2920.pdf). I guess it's not worth supporting them, because they were not widely used in the first place and this pertains only to legacy data, which does not seem to be publicly available anymore (at least not from https://ilk.uvt.nl/conll/post_task_data.html). The format is still used by some older parsers, though, and is sometimes required as input for downstream tasks. So, while I would not advise going for full CoNLL-X support, I would suggest being robust against CoNLL-X input, i.e., checking whether CoNLL-X data with PHEAD (9th column) set to an integer would break the CoNLL-U conversion, because CoNLL-U expects pairs of IDs and dependency labels there, and only these.
You can synthesize such data from CoNLL-U data by just copying the values from the HEAD column into the 9th column, and the values from the DEP column into the 10th column.
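A rough Python sketch of that synthesis, together with a check that shows why the result is problematic for a CoNLL-U reader. The helper names synthesize_conllx and deps_is_valid_conllu are made up for illustration, and the DEPS pattern is a simplified reading of the CoNLL-U format ("_" or head:deprel pairs joined by "|"), not a full validator.

```python
import re

# Simplified CoNLL-U DEPS shape: "_" or "head:deprel" pairs joined by "|".
# Heads may be decimal (empty nodes); labels may contain further colons.
DEPS_PATTERN = re.compile(r"_|\d+(\.\d+)?:[^\s|]+(\|\d+(\.\d+)?:[^\s|]+)*")


def synthesize_conllx(conllu_text):
    """Copy HEAD into column 9 and the dependency label into column 10."""
    out = []
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 10:
            out.append(line)  # comments and blank lines pass through unchanged
            continue
        cols[8] = cols[6]  # 9th column (PHEAD in CoNLL-X) := HEAD
        cols[9] = cols[7]  # 10th column (PDEPREL in CoNLL-X) := dependency label
        out.append("\t".join(cols))
    return "\n".join(out)


def deps_is_valid_conllu(value):
    """True if value looks like a CoNLL-U DEPS entry under the pattern above."""
    return DEPS_PATTERN.fullmatch(value) is not None


sample = (
    "# text = Dogs bark\n"
    "1\tDogs\tdog\tNOUN\tNNS\t_\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\tVBP\t_\t0\troot\t_\t_"
)
for row in synthesize_conllx(sample).splitlines():
    cols = row.split("\t")
    if len(cols) == 10:
        print(cols[8], deps_is_valid_conllu(cols[8]))
```

In the synthesized rows, the 9th column holds a bare integer, which the check rejects as a DEPS value. Data of this kind could serve as a test input for how the converter reacts.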
tests/data/import/conll
text) is assigned a value (separated by =). In CoNLL-U v2, there are two obligatory metadata fields, text and sent_id; in CoNLL-U v1, metadata is optional; in CoNLL-X, metadata is treated as a comment. In the following data snippet, an invalid separator is used, causing the ANNIS visualizer to break (p.c. by Thomas Krause). Apparently, this is because the converter tried to quietly recover the invalid metadata.
example