proycon / foliatools

A number of command-line tools for working with FoLiA (Format for Linguistic Annotation). Includes validators, converters, visualisers, and more.
GNU General Public License v3.0

validator rejects folia 2.0 document without a text-annotation declaration #5

Closed: kosloot closed this issue 4 years ago

kosloot commented 5 years ago

The validator rejects documents without a text-annotation declaration.

ParseError: FoLiA exception in handling of <t> @ line 50: [DeclarationError] Set 'https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/text.foliaset.ttl' is used for TextContent <t>, but has no declaration!

I was under the impression that in such cases a default should be implied, like this:

      <text-annotation set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/text.foliaset.ttl" />

Also, this warning is quite confusing, as it mentions TextContent and not Text. Wouldn't 'textcontent-annotation' have been a better name to express this relation? And likewise 'phoncontent-annotation'...

proycon commented 5 years ago

No, this is not a bug. The only default that is implemented is the default set. So if one has the following declaration:

<text-annotation />

Then it is equivalent to:

<text-annotation set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/text.foliaset.ttl" />

I'd ideally opt for the verbose form in any serialisation, btw.

But if there is no declaration at all, then that simply means the document has no text (no <t>). The same goes for phon; these are the only two elements for which FoLiA implements a default set.
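To illustrate the rule described above, here is a minimal Python sketch. It assumes a hypothetical, simplified model of the declarations (a dict mapping an annotation type name to its declared set, or None when the `set` attribute is omitted) and is not the foliapy or libfolia API; `resolve_text_set` and the locally defined `DeclarationError` are illustrative names only.

```python
# Default set for the text annotation type, as given in this thread.
TEXT_DEFAULT_SET = (
    "https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/text.foliaset.ttl"
)


class DeclarationError(Exception):
    """Hypothetical stand-in for the validator's DeclarationError."""


def resolve_text_set(declarations):
    """Resolve the set used by <t> elements in a simplified declaration model.

    `declarations` maps annotation type -> declared set, or None if the
    set attribute was omitted (e.g. a bare <text-annotation />).
    """
    if "text" not in declarations:
        # No <text-annotation> at all: the document may not contain <t>.
        raise DeclarationError("TextContent <t> is used, but has no declaration")
    declared = declarations["text"]
    # <text-annotation /> is shorthand for the default text set.
    return TEXT_DEFAULT_SET if declared is None else declared
```

With this model, `resolve_text_set({"text": None})` and the verbose declaration both resolve to the default text set, while `resolve_text_set({})` raises, which corresponds to the error the validator reported at the top of this issue.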

> Also this warning is quite confusing, as it mentions TextContent, and not Text. Wouldn't 'textcontent-annotation' have been a better idea, to express this relation?

The names can be a bit confusing here, I know, but for the end user it's probably clearer this way. TextContent (<t>) has annotation type TEXT; the text body/root element (<text>) is a special case: it has no annotation type and need not be declared. Ideally, we perhaps should have renamed the root element.

kosloot commented 5 years ago

Yeah, let's rename the root element :) Too late for that, though.

Considering

 <text-annotation />

I think this may be dangerous, as it suggests to users that ANY annotation may be declared this way, omitting sets, processors, etc. I assume this is thoroughly checked to apply ONLY to <phon> and <t>. I have to check this for the C++ library too.

proycon commented 5 years ago

For all other declarations (anything except text and phon), a declaration without a set explicitly means: we use this annotation type without a set (i.e. no classes anywhere on the annotations).
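The set-less case for other annotation types can be sketched the same way. This assumes the same hypothetical model (annotation type -> declared set, None when omitted) and is not how foliapy or libfolia actually implement validation; `check_annotation`, `DeclarationError` and the type name "pos" in the usage note are illustrative only.

```python
class DeclarationError(Exception):
    """Hypothetical error for annotations that contradict their declaration."""


# Only these two types fall back to a default set when set is omitted.
TYPES_WITH_DEFAULT_SET = {"text", "phon"}


def check_annotation(declarations, annotation_type, cls=None):
    """Check one annotation against a simplified declaration model.

    `declarations` maps annotation type -> declared set (None if the set
    attribute was omitted); `cls` is the annotation's class, or None.
    """
    if annotation_type not in declarations:
        raise DeclarationError(f"{annotation_type} is used but has no declaration")
    if annotation_type in TYPES_WITH_DEFAULT_SET:
        return  # default-set handling is outside this sketch
    if declarations[annotation_type] is None and cls is not None:
        # A set-less declaration means no classes anywhere on these annotations.
        raise DeclarationError(
            f"{annotation_type} is declared without a set, so class {cls!r} is not allowed"
        )
```

Under these assumptions, `check_annotation({"pos": None}, "pos")` passes, whereas `check_annotation({"pos": None}, "pos", "N")` raises, because a bare declaration for any type other than text and phon does not imply a default set.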

kosloot commented 5 years ago

Hmm, then I REALLY have to check the C++ library, although I assume there are already test cases for all this.