proycon / foliapy

An extensive Python library for dealing with FoLiA (Format for Linguistic Annotation) documents, a rich XML-based format for linguistic annotation finding application in Natural Language Processing (NLP). This library was formerly part of PyNLPl.
https://proycon.github.io/folia
GNU General Public License v3.0

foliavalidator uses "non-standard" quotes in xml header output #20

Closed (kosloot closed this issue 4 years ago)

kosloot commented 4 years ago

Minor detail, but foliavalidator uses single quotes (') rather than double quotes (") in the XML declaration it outputs, even when the input uses double quotes. This makes comparing results a bit more difficult.

Example input:

<?xml version="1.0" encoding="utf-8"?>

Output:

<?xml version='1.0' encoding='utf-8'?>

This is not invalid, just an annoyance. Also, the "standard" would be to output "UTF-8" instead of "utf-8", which would also be nice to have. On the other hand, that would make comparing more difficult again. Maybe libfolia should add an exception for files where utf-8 is mentioned...
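One way to take the sting out of such comparisons, whatever the serializer does, is to normalize the declaration on both sides before diffing. Below is a minimal sketch, assuming both documents are available as decoded strings. `normalize_declaration` is a hypothetical helper, not part of foliapy or libfolia. It rewrites the declaration's pseudo-attributes with double quotes and upper-cases the encoding name. The upper-casing is reasonable because the XML 1.0 specification asks processors to match encoding names case-insensitively.

```python
import re

# Matches an XML declaration at the very start of the text, capturing its pseudo-attributes.
_DECL = re.compile(r"^<\?xml\s+([^?]*)\?>")
# Matches name='value' or name="value" pairs inside the declaration.
_ATTR = re.compile(r"""(\w+)\s*=\s*(['"])(.*?)\2""")


def normalize_declaration(text: str) -> str:
    """Hypothetical helper: rewrite the XML declaration with double quotes
    and an upper-case encoding name, leaving the rest of the text untouched."""
    m = _DECL.match(text)
    if not m:
        return text  # no declaration: nothing to normalize
    parts = []
    for name, _quote, value in _ATTR.findall(m.group(1)):
        if name == "encoding":
            value = value.upper()  # "utf-8" and "UTF-8" name the same encoding
        parts.append(f'{name}="{value}"')
    return "<?xml " + " ".join(parts) + "?>" + text[m.end():]
```

With this in place, `normalize_declaration(input_text) == normalize_declaration(output_text)` no longer trips over the quote style or the case of `utf-8`.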

proycon commented 4 years ago

Hmm, I wonder whether this is something I do or something the library does for me, but I do agree with the suggestion.

proycon commented 4 years ago

Seems to be lxml? A bit strange that it decides to use single quotes...

proycon commented 4 years ago

If I recall correctly, this was in lxml (or even libxml2?), so it is not something I have control over. Closing the issue.
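If the single quotes come from the serializer, a caller can still avoid them by writing the declaration itself. The following is a minimal sketch, assuming UTF-8 output. To stay self-contained it uses Python's standard-library `xml.etree.ElementTree` rather than lxml. ElementTree's `tostring(..., xml_declaration=True)` also emits a single-quoted declaration. `to_utf8_xml` is a hypothetical helper, not a foliapy function.

```python
import xml.etree.ElementTree as ET

# Double-quoted declaration with the upper-case encoding name.
XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n'


def to_utf8_xml(root: ET.Element) -> bytes:
    """Hypothetical helper: serialize `root` as UTF-8 bytes with a
    double-quoted XML declaration instead of the serializer's own."""
    # encoding="unicode" returns a str and does not emit any declaration.
    body = ET.tostring(root, encoding="unicode")
    return (XML_HEADER + body).encode("utf-8")
```

This only changes the header; the element content is still whatever the serializer produces.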