proycon / foliapy

An extensive Python library for dealing with FoLiA (Format for Linguistic Annotation) documents, a rich XML-based format for linguistic annotation finding application in Natural Language Processing (NLP). This library was formerly part of PyNLPl.
https://proycon.github.io/folia
GNU General Public License v3.0

Prevent tripping over comments #27

Closed oktaal closed 8 months ago

oktaal commented 8 months ago

Hi! I've encountered a small bug where a comment directly after the <FoLiA>-tag caused a parsing error.
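To illustrate the kind of failure described here, below is a minimal sketch, not the actual patch from this pull request. It uses the standard library's `xml.etree.ElementTree` rather than whatever parser foliapy uses internally, and the sample document and the helper name `child_element_names` are made up for illustration. The point it shows: when comments are kept in the tree, a comment node's `tag` is not a string, so code that walks the children of `<FoLiA>` has to skip such nodes explicitly.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal document: a comment directly after the root tag.
DOC = """<FoLiA xmlns="http://ilk.uvt.nl/folia">
  <!-- a comment directly after the root tag -->
  <metadata/>
  <text/>
</FoLiA>"""


def child_element_names(xml_string):
    """Return the local names of the root's child elements, skipping comments."""
    # insert_comments=True keeps comment nodes in the tree (Python 3.8+);
    # the default tree builder would silently drop them.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_string, parser=parser)
    names = []
    for child in root:
        # A comment node carries the ET.Comment factory as its tag, not a
        # string, so string operations on child.tag would fail for it.
        if child.tag is ET.Comment:
            continue
        # Strip the "{namespace}" prefix to get the local element name.
        names.append(child.tag.rsplit("}", 1)[-1])
    return names


print(child_element_names(DOC))  # ['metadata', 'text']
```

Without the `continue` guard, the `rsplit` call would raise an `AttributeError` on the comment node, which is roughly the shape of "tripping over comments".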

proycon commented 8 months ago

Nice one, thanks, merged now!

kosloot commented 8 months ago

I think the provided test contains invalid, or at least questionable, FoLiA. folialint doesn't accept it:

XML error: Expecting element metadata, got 'text'

imho it is correct to reject?

kosloot commented 8 months ago

Additional note: this example inspired me to extend the handling of XML comments in libfolia a bit. The (fixed) example files will now be handled by libfolia, preserving all comments, though NOT completely preserving their order (which libfolia never did).

proycon commented 8 months ago

I think the provided test contains invalid, or at least questionable, FoLiA. folialint doesn't accept it:

XML error: Expecting element metadata, got 'text'

Yeah, you're right, there should always be a <metadata> block afaik.
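As a rough sketch of the rule discussed here, the check below tests whether the first child element of the root is `metadata`. It is not how folialint validates documents; the function name `first_child_is_metadata` is hypothetical, and the requirement itself is taken from this thread ("afaik") rather than from the specification.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "{http://ilk.uvt.nl/folia}"


def first_child_is_metadata(xml_string):
    """Return True if the root's first child element is <metadata>."""
    # The default parser drops comments, so a leading comment is ignored here.
    root = ET.fromstring(xml_string)
    children = list(root)
    return bool(children) and children[0].tag == FOLIA_NS + "metadata"
```

A document that goes straight from the root tag to `<text>`, like the test file in question, would make this return `False`.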

oktaal commented 8 months ago

Yes, the file had this structure and was a bit weird. The document only contained a list of words, so in the end it wasn't really that useful to parse anyway.