proycon / folia

FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl; this repository contains higher-level tools that use the library, as well as the full documentation, validation schemas, and set definitions.
http://proycon.github.io/folia/
GNU General Public License v3.0

Allow CDATA in <t> elements #12

Closed proycon closed 6 years ago

proycon commented 8 years ago

Not sure if this is already the case.

kosloot commented 8 years ago

For now, this doesn't look like a good or useful feature. For example: do we translate the CDATA into text? What do we do on output? What does text() deliver?

Don't implement unless REALLY needed.

proycon commented 8 years ago

Agreed, idea discarded, we won't implement this.

kosloot commented 7 years ago

Why????

proycon commented 7 years ago

It's still something to investigate, as this keeps popping up and is by no means settled. I opened it in response to an enquiry from another user, who expected to be able to use CDATA. I wonder to what extent strictly disallowing CDATA violates XML specs/conventions.

From W3.org: [Definition: CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup.]
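
To make that definition concrete, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`, not the FoLiA library itself. The bare `<t>` element (no FoLiA namespace) and the sample text are assumptions for illustration only. It suggests that a generic XML parser hands back the same character data whether the text was written as a CDATA section or with entity escapes, and that the CDATA wrapper is not preserved when serializing again.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free <t> elements; real FoLiA documents are namespaced.
cdata_doc = "<t><![CDATA[if a < b && c > d]]></t>"
escaped_doc = "<t>if a &lt; b &amp;&amp; c &gt; d</t>"

# Both spellings parse to the same character data.
cdata_text = ET.fromstring(cdata_doc).text
escaped_text = ET.fromstring(escaped_doc).text
same_text = cdata_text == escaped_text

# ElementTree does not remember that the input used CDATA:
# on output the text is written with entity escapes instead.
round_tripped = ET.tostring(ET.fromstring(cdata_doc), encoding="unicode")

print(same_text)
print(round_tripped)
```

Whether a FoLiA implementation should behave the same way is exactly the open question here; this only shows what one general-purpose parser does.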

kosloot commented 7 years ago

Well, every FoLiA document is valid XML. This does not imply that every XML construct is valid FoLiA!

My biggest concerns are:

So: is there a concrete use case that cannot be resolved in other ways?
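
As one illustration of what "other ways" could look like, ordinary entity escaping carries markup-like characters without CDATA. This is a rough sketch with Python's standard library only; the bare `<t>` element and the sample string are assumptions, and nothing here is FoLiA-specific API.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Hypothetical text content containing characters that would otherwise be markup.
raw = "x < y & y > z"

# escape() replaces &, < and > with the corresponding entities.
doc = "<t>" + escape(raw) + "</t>"

# Parsing the escaped document gives back the original text unchanged.
recovered = ET.fromstring(doc).text

print(doc)
print(recovered == raw)
```

A use case would have to need something beyond this round trip to justify allowing CDATA.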

proycon commented 6 years ago

Closing this, idea discarded...