Stormur closed this issue 3 years ago
Thanks for reporting. I will look at that. It is not related to guillemets (it would be the same with other paired quotes).
Yes, it is definitely not related to guillemets; I don't know why I considered them so important at the beginning. By the way, I have two other structurally identical sentences where FixPunct fails and where other punctuation marks are involved.
The problem in these sentences is that a (paratactic) clause is split into several pieces and interrupted by the non-projective verb of speech (and the annotation itself is correct).
Sorry it took me so long. I applied the new version to several big treebanks and studied the differences. My original version seemed intuitively "more correct" to me because it usually attached the opening and closing punctuation to the same node, which was indeed the head of the quoted/parenthesized phrase. However, it resulted in non-projectivities, which are forbidden when strictly following the guidelines for punct. So I adapted the code.
Thanks! I tested it again on my conllu files and now it passes all validations!
It seems that the FixPunct block is not able to treat some punctuation marks correctly, in particular the guillemets « and ». Might it be a question of sentence structure? Starting from the following sentence, with all punctuation attached to the root and a non-projective structure, I get (after running the block) a sentence with non-projective punctuation at token 10.
After FixPunct:
If the offending guillemet at token 10 is attached to token 9, the non-projectivity disappears. It would be an unorthodox attachment, but I suspect it is the only possible one that avoids this situation.
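For reference, here is a minimal sketch of the projectivity check in plain Python. It is not the code of FixPunct or of the validator; the `heads` mapping, the function names and the five-token toy tree are hypothetical and are not the sentence from this issue. An edge is taken to be non-projective when some token lying between the dependent and its head is not a descendant of that head. The sketch also shows why attaching a token to its immediate neighbour always works: nothing lies between them, so nothing can break projectivity.

```python
def is_descendant(heads, node, ancestor):
    """Return True if `ancestor` is reached by following heads upward from `node`.

    `heads` maps each 1-based token id to the id of its head; 0 is the
    artificial root. The tree is assumed to be well-formed (no cycles).
    """
    while node != 0:
        node = heads[node]
        if node == ancestor:
            return True
    return False


def is_nonprojective(heads, dep):
    """Return True if the edge from heads[dep] to dep is non-projective.

    The edge is non-projective when at least one token strictly between
    the head and the dependent is not a descendant of the head.
    """
    head = heads[dep]
    lo, hi = sorted((head, dep))
    return any(not is_descendant(heads, k, head) for k in range(lo + 1, hi))


# Hypothetical toy tree: token 2 is the root, tokens 1, 3 and 4 depend on it,
# and token 5 (think of a closing quote) is attached to token 3.
heads = {1: 2, 2: 0, 3: 2, 4: 2, 5: 3}
before = is_nonprojective(heads, 5)  # token 4 lies in between and is not under 3

# Reattach token 5 to its immediate left neighbour, token 4.
fixed = dict(heads)
fixed[5] = 4
after = is_nonprojective(fixed, 5)   # no tokens in between any more

print(before, after)
```

In the toy tree the first check reports a non-projective edge and the second does not, which mirrors the effect of attaching token 10 to token 9 described above. The sketch only tests whether the punctuation edge itself is non-projective; whether an attachment makes some other edge non-projective would need a separate check.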