jonorthwash opened 1 year ago
Note from @ftyers, @mr-martian, and @TinoDidriksen: Enhanced dependencies are possible in CG3 using relations.
@ftyers prefers option 2 or 3. I suggest 3 as the end goal, but perhaps 2 as an easier short-term goal or stop-gap for now.
Currently there are some issues related to converting between formats.
One problem is that converting between formats is always lossy. Even between CoNLL-U and CG3, quite a bit is lost. For example, only CoNLL-U supports enhanced dependencies and distinguishes between XPOSTAG and UPOSTAG, and CG3 and CoNLL-U handle subtokens differently (and store different information about them, I think?).
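To make the lossiness concrete, here is a minimal Python sketch that round-trips one CoNLL-U token line through a hypothetical poorer format. The target format, with a single POS slot and basic dependencies only, is an assumption made up for illustration; it is not CG3's actual stream format, and the helper names (`to_poorer_format`, `back_to_conllu`, `lost_fields`) are hypothetical.

```python
# The ten CoNLL-U columns, in order.
CONLLU_FIELDS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
                 "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]


def parse_conllu_token(line: str) -> dict[str, str]:
    """Split one tab-separated CoNLL-U token line into a column -> value dict."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(CONLLU_FIELDS):
        raise ValueError(f"expected 10 columns, got {len(values)}")
    return dict(zip(CONLLU_FIELDS, values))


def to_poorer_format(token: dict[str, str]) -> dict[str, str]:
    """Hypothetical target format: one POS slot, basic dependencies only.

    XPOS, DEPS (enhanced dependencies) and MISC have nowhere to go.
    """
    return {
        "ID": token["ID"],
        "FORM": token["FORM"],
        "LEMMA": token["LEMMA"],
        "POS": token["UPOS"],
        "FEATS": token["FEATS"],
        "HEAD": token["HEAD"],
        "DEPREL": token["DEPREL"],
    }


def back_to_conllu(token: dict[str, str]) -> dict[str, str]:
    """Rebuild a CoNLL-U token; columns the target format dropped become '_'."""
    restored = dict.fromkeys(CONLLU_FIELDS, "_")
    restored.update({k: v for k, v in token.items() if k in CONLLU_FIELDS})
    restored["UPOS"] = token["POS"]
    return restored


def lost_fields(original: dict[str, str], restored: dict[str, str]) -> list[str]:
    """List the CoNLL-U columns whose values did not survive the round trip."""
    return [f for f in CONLLU_FIELDS if original[f] != restored[f]]


line = "2\tcats\tcat\tNOUN\tNNS\tNumber=Plur\t3\tnsubj\t3:nsubj|5:nsubj\t_"
original = parse_conllu_token(line)
restored = back_to_conllu(to_poorer_format(original))
print(lost_fields(original, restored))  # ['XPOS', 'DEPS']
```

The XPOS tag and the enhanced dependencies silently become `_`; nothing in the round-tripped token records that they ever existed.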
So if the user wants to edit the corpus in a different format, and we try to preserve information that is not native to that format in an underlying format, then whenever they change the number or position of tokens, or modify anything tied to the hidden information, that information could easily get lost, or at least become hard to keep track of.
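A minimal Python sketch of the alignment problem, assuming (hypothetically) that the hidden information is kept in a side table keyed by token position. The `attach` helper and the side-table design are illustrative only, not how any existing tool stores this data.

```python
def attach(tokens: list[str], side_table: dict[int, str]) -> list[tuple[str, str]]:
    """Pair each token with the side-table entry stored for its 1-based position."""
    return [(form, side_table.get(i, "_")) for i, form in enumerate(tokens, start=1)]


# Hypothetical setup: XPOS tags the editing format cannot display are kept
# aside, keyed by token position.
tokens = ["The", "cats", "sleep", "."]
hidden_xpos = {1: "DT", 2: "NNS", 3: "VBP", 4: "."}

# The user inserts a token in the editing format; the side table is not updated.
edited = tokens[:2] + ["often"] + tokens[2:]

print(attach(edited, hidden_xpos))
# [('The', 'DT'), ('cats', 'NNS'), ('often', 'VBP'), ('sleep', '.'), ('.', '_')]
```

Every token after the insertion point picks up its neighbour's hidden tag, and the last one gets nothing. Any other position-based references, such as HEAD or DEPS indices, would go stale in the same way.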
We have a few options for how to deal with this:
What is preferred? Other ideas?