ptakopysk closed this issue 7 years ago
No, underscore is allowed for HEAD and has different semantics than 0 -- while underscore denotes unspecified value, 0 denotes a ROOT as the HEAD.
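To make the distinction concrete, here is a minimal sketch of how a reader of the format might map the HEAD column to a value. The helper name `parse_head` is hypothetical (it is not part of udpipe or any CoNLL-U library), and it assumes the field has already been split out of the tab-separated line:

```python
from typing import Optional


def parse_head(field: str) -> Optional[int]:
    """Interpret a HEAD field (hypothetical helper, for illustration only).

    "_"  -> None : the head is unspecified (e.g. the sentence was not parsed)
    "0"  -> 0    : the token attaches to the artificial ROOT
    "n"  -> n    : the token attaches to the token with ID n
    """
    if field == "_":
        return None
    return int(field)
```

With this reading, writing `0` for every token of an unparsed sentence would claim that every token is a root, whereas `_` only says that no head has been assigned.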
Quoting from the linked format specification:
The fields must additionally meet the following constraints:
- Fields must not be empty.
- Fields other than FORM and LEMMA must not contain space characters.
- Underscore (_) is used to denote unspecified values in all fields except ID. Note that no format-level distinction is made for the rare cases where the FORM or LEMMA is the literal underscore – processing in such cases is application-dependent. Further, in UD treebanks the UPOSTAG, HEAD, and DEPREL columns are not allowed to be left unspecified.
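The quoted constraints can be sketched as a small line checker. This is only an illustration under stated assumptions: the function name `check_line` and the `ud_treebank` switch are made up for this example, the column names follow the quoted text, and the stricter UD-treebank rule is applied only to lines with a plain integer ID (multiword-token and empty-node lines are skipped).

```python
from typing import List

# Column names as used in the quoted specification text.
FIELDS = ["ID", "FORM", "LEMMA", "UPOSTAG", "XPOSTAG",
          "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]


def check_line(line: str, ud_treebank: bool = False) -> List[str]:
    """Return a list of constraint violations for one token line.

    Hypothetical helper: with ud_treebank=False only the format-level
    constraints are checked, so "_" in HEAD is accepted.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) != len(FIELDS):
        return [f"expected {len(FIELDS)} tab-separated fields, got {len(cols)}"]

    problems = []
    for name, value in zip(FIELDS, cols):
        if value == "":
            problems.append(f"{name} is empty")
        if name not in ("FORM", "LEMMA") and " " in value:
            problems.append(f"{name} contains a space")

    # Underscore means "unspecified" everywhere except ID.
    if cols[0] == "_":
        problems.append("ID is unspecified")

    # Extra rule for UD treebanks, applied here only to plain integer IDs.
    if ud_treebank and cols[0].isdigit():
        for name in ("UPOSTAG", "HEAD", "DEPREL"):
            if cols[FIELDS.index(name)] == "_":
                problems.append(f"{name} is unspecified")
    return problems


# A tagged but unparsed token: HEAD and DEPREL are left as "_".
unparsed = "1\tdog\tdog\tNOUN\t_\t_\t_\t_\t_\t_"
print(check_line(unparsed))                    # format-level check only
print(check_line(unparsed, ud_treebank=True))  # stricter UD-treebank check
```

The two calls show the point of the discussion: the same line passes the format-level check but is flagged under the stricter treebank rule.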
The specification says "Underscore (_) is used to denote unspecified values in all fields except ID." so I think underscore in the HEAD column is ok.
OK, you're right, I haven't seen that part. Thanks for clarification.
Actually, Martin Popel and I are responsible for putting that part there :-)
When udpipe is run without --parse, it sets the HEAD fields to _, which does not conform to the CoNLL-U format specification -- IMHO it should be set to 0.