proycon / folia

FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl; this repository contains higher-level tools that use the library, as well as the full documentation, validation schemas, and set definitions.
http://proycon.github.io/folia/
GNU General Public License v3.0

foliavalidator: switch on -t for all v1.5+ documents #71

Closed (kosloot closed 5 years ago)

kosloot commented 5 years ago

I was quite surprised that foliavalidator happily accepted an invalid document I created. Only after a while did I discover that text validation (the -t option) is switched off, even for 2.0 documents. I think this is wrong: switching on -t for v1.5+ documents should be the default, IMHO.

proycon commented 5 years ago

That would be a bug indeed; text validation should be on for all v1.5+ documents, yes!

proycon commented 5 years ago

Reproduced:

 $ foliavalidator inconsistenttext.1.5.0.folia.xml
WARNING: Document (inconsistenttext.1.5.0.folia.xml) uses an older FoLiA version (1.5.0) but is validated with a newer library (2.0.1). If this is a document you created and intend to publish, you may want to upgrade this FoLiA v1 document to FoLiA v2 using the 'foliaupgrade' tool.
Validated successfully: inconsistenttext.1.5.0.folia.xm
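
For illustration only, the default requested in this issue (text validation on for every document at version 1.5 or later) could be sketched as a simple version comparison. This is not the actual foliavalidator code: the helper names `parse_version` and `textvalidation_default` are hypothetical, and the sketch assumes plain dotted numeric version strings such as the `1.5.0` and `2.0.1` seen in the output above.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version string such as '1.5.0' into a tuple of ints.

    Hypothetical helper; assumes purely numeric, dot-separated components.
    """
    return tuple(int(part) for part in version.split("."))


def textvalidation_default(doc_version: str, minimum: str = "1.5") -> bool:
    """Return True when text validation should be enabled by default.

    Hypothetical helper: tuples compare element by element, so
    (1, 5, 0) >= (1, 5) holds while (1, 4, 9) >= (1, 5) does not.
    """
    return parse_version(doc_version) >= parse_version(minimum)


if __name__ == "__main__":
    for version in ("1.4.9", "1.5.0", "2.0.1"):
        print(version, textvalidation_default(version))
```

Until such a default is in place, passing -t explicitly, as described in the first comment, is the way to request text validation.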