proycon / foliatools

A number of command-line tools for working with FoLiA (Format for Linguistic Annotation). Includes validators, converters, visualisers, and more.
GNU General Public License v3.0

processing instruction problem? #55

JessedeDoes closed 7 months ago

JessedeDoes commented 1 year ago

FoLiA-tools v2.5.4, using FoLiA v2.5.1 with library FoLiApy v2.5.8

gcnd.test.folia.xml.txt

This file validates when I omit the processing instructions (<?n_elan_annotations 2?> etc.), but with the processing instructions present, validation fails:

VALIDATION ERROR on full parse by library (stage 2/3), in data/GCND/gcnd.test.folia.xml.txt
AttributeError: 'cython_function_or_method' object has no attribute 'startswith'

(Of course, I can use a comment or something else for this type of information, so it is not a showstopper.)

kosloot commented 1 year ago

The only processing instruction that we implemented in FoLiA is the xml-stylesheet one. All others are ignored, and apparently not correctly handled in the validation step.

It might be a good idea to incorporate a processing instruction node in FoLiA. Seems rather straightforward to implement.

proycon commented 1 year ago

Thanks for the report. Looks like a bug indeed; I'll have to pinpoint it. It should indeed simply be ignored by the validator.

kosloot commented 1 year ago

I did some reading in the lxml/etree documentation, and it seems that the Python parsers DISCARD all Processing Instructions in the input file:

Note that XMLParser skips over processing instructions in the input instead of creating comment objects for them. An ElementTree will only contain processing instruction nodes if they have been inserted into to the tree using one of the Element methods.

The same holds for XML comments. Quite a bummer.

Implementing PIs in libfolia is a no-brainer.
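
The quoted note appears to match the documentation of Python's standard-library xml.etree.ElementTree module. A minimal sketch to check that behaviour, using only the standard library (lxml is not exercised here and may behave differently; the sample document and the helper name count_pis are made up for illustration, and insert_pis needs Python 3.8 or later):

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input with one processing instruction inside the root.
DOC = "<root><?n_elan_annotations 2?><child/></root>"


def count_pis(root):
    """Count processing-instruction nodes in a parsed tree.

    In xml.etree.ElementTree a PI node is an Element whose tag is the
    ProcessingInstruction factory function rather than a string.
    """
    return sum(1 for node in root.iter() if node.tag is ET.ProcessingInstruction)


# Default parser: processing instructions are skipped while parsing.
default_root = ET.fromstring(DOC)

# Opt-in: a TreeBuilder created with insert_pis=True keeps them in the tree.
keeping_parser = ET.XMLParser(target=ET.TreeBuilder(insert_pis=True))
kept_root = ET.fromstring(DOC, parser=keeping_parser)

print(count_pis(default_root), count_pis(kept_root))
```

So with the standard-library parser the instructions are only dropped by default; whether a given FoLiA library sees them depends on which parser it uses and how that parser is configured.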

kosloot commented 12 months ago

I implemented support for PIs in libfolia, just for completeness. It may be of limited use, but OK.

proycon commented 11 months ago

I implemented a fix (simply ignoring processing instructions) in foliapy; it is still pending release.
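
For readers hitting a similar error: the message in the report suggests that some node's tag was a function rather than a string, which is how ElementTree-style APIs mark processing instructions. A rough sketch of what "simply ignoring" such nodes can look like, using the standard-library xml.etree.ElementTree as a stand-in. This is not the actual foliapy patch, and element_tags is a hypothetical helper:

```python
import xml.etree.ElementTree as ET


def element_tags(root):
    """Collect tag names of real elements, skipping non-element nodes.

    Processing instructions and comments carry a factory function as
    their tag, so any string method called on it (such as startswith)
    raises AttributeError. Checking the type first avoids that.
    """
    tags = []
    for node in root.iter():
        if not isinstance(node.tag, str):
            continue  # processing instruction or comment: ignore it
        tags.append(node.tag)
    return tags


# Build a small tree by hand that contains a processing instruction.
root = ET.Element("FoLiA")
root.append(ET.ProcessingInstruction("n_elan_annotations", "2"))
ET.SubElement(root, "text")

print(element_tags(root))
```

Without the isinstance check, calling node.tag.startswith(...) on the processing instruction node raises an AttributeError of the same shape as the one in the report.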

proycon commented 7 months ago

(fixed in FoLiA-tools v2.5.5)