DavidNemeskey opened this issue 3 years ago.
CoNLL-U comments need to be explicitly enabled with the `conllu-comments` parameter. We may flip the default behaviour to enabled in some future release.
I agree that the documentation is rather sparse on this.
Yes, I think it would make sense if that was the default. Should I do it in a PR (+ add a sentence about it to the docs)?
Specifying this in the docs is OK, but changing the default requires a new major version, at least of xtsv. Such breaking changes should be committed in batches to minimise disruption. (We have others in mind.)
@mittelholcz What do you think?
emtsv does not handle CoNLL-U comments very well. If the input is a tsv file, two things happen:
1. With a single `form` column, comments (lines starting with "# ") are treated as a token and are analyzed as a single "word" token: `form anas lemma xpostag`
2. If the input has other columns (…, to which I want to add `upostag feats`), only the new header is returned.

Expected behavior: comments should be kept in the text and returned as-is, and they should not prevent emtsv from analyzing the text (as happens in the second case).
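To make the expected behavior concrete, here is a minimal Python sketch of a TSV filter that passes comment lines through unchanged while still extending the header and analyzing every token line. It assumes comments are lines starting with "# " (as described above), that the first non-comment line is the header, and that empty lines mark sentence boundaries. The names `process_tsv`, `is_comment` and the analyzer callback are hypothetical illustrations, not part of the emtsv or xtsv API.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical analyzer: takes a token's existing field values and returns
# the values for the new columns. Not an emtsv/xtsv interface.
Analyzer = Callable[[List[str]], List[str]]


def is_comment(line: str) -> bool:
    """CoNLL-U-style comment line, assuming the "# " prefix from the issue."""
    return line.startswith("# ")


def process_tsv(lines: Iterable[str], new_columns: List[str],
                analyze: Analyzer) -> Iterator[str]:
    """Add `new_columns` to a TSV stream, passing comment lines through as-is."""
    header_seen = False
    for raw in lines:
        line = raw.rstrip("\n")
        if is_comment(line):
            # Comments are neither tokens nor the header: emit them unchanged.
            yield line
        elif not header_seen:
            # First non-comment line is the header; extend it with new columns.
            header_seen = True
            yield "\t".join(line.split("\t") + new_columns)
        elif line == "":
            # Empty line marks a sentence boundary; keep it.
            yield line
        else:
            # Ordinary token line: keep existing fields, append new ones.
            fields = line.split("\t")
            yield "\t".join(fields + analyze(fields))


if __name__ == "__main__":
    def toy_analyzer(fields: List[str]) -> List[str]:
        # Placeholder analysis: lowercase form as "lemma", fixed tag.
        return [fields[0].lower(), "UNK"]

    sample = ["form", "# text = Kutya ugat", "Kutya", "ugat", ""]
    for out_line in process_tsv(sample, ["lemma", "xpostag"], toy_analyzer):
        print(out_line)
```

Checking comments before anything else means they can appear anywhere in the stream without being mistaken for the header or for a token. Note that a one-column token consisting of "# " plus text would still be ambiguous; that is inherent to mixing comments into TSV input and is one reason the behavior is opt-in.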