proycon / folia

FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl; this repository contains higher-level tools that use the library, as well as the full documentation, validation schemas, and set definitions.
http://proycon.github.io/folia/
GNU General Public License v3.0

FoLiA to W3C Web Annotations conversion #102

Open proycon opened 2 years ago

proycon commented 2 years ago

Developments regarding annotation within the CLARIAH-PLUS project seem to converge on the use of W3C Web Annotations as a standardized means of sharing decentralized annotations and, more generally, on the use of Linked Open Data.

To make transitions to these data models possible, conversion from FoLiA XML to Web Annotations needs to be implemented. This also requires that the FoLiA vocabulary itself is formalized better, as per issue #4. Web Annotations only provide the generic framework; FoLiA-specific vocabulary is still relevant in what is called the 'annotation body' in the Web Annotation model.
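To illustrate the split between the generic framework and the FoLiA-specific body, here is a rough sketch in Python of what such a conversion might look like for part-of-speech annotations. It is not the planned tool, only an illustration under several assumptions: the `folia:` prefix and its URI, the `folia:annotationType`, `folia:class` and `folia:set` property names, the function name `pos_to_web_annotations`, and the example URIs are all hypothetical, since the FoLiA vocabulary has not been formalized yet. The sample document is heavily simplified (no metadata or annotation declarations), and the target uses a W3C `TextQuoteSelector` purely for brevity.

```python
import json
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"  # FoLiA XML namespace
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree exposes xml:id
# Hypothetical vocabulary URI: a formal FoLiA vocabulary does not exist yet.
FOLIA_VOCAB = "https://example.org/folia-vocab#"

# Heavily simplified FoLiA fragment, for illustration only.
SAMPLE = """<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="example">
  <text xml:id="example.text">
    <w xml:id="example.w.1"><t>Hello</t><pos class="INTJ" set="https://example.org/pos-set"/></w>
    <w xml:id="example.w.2"><t>world</t><pos class="NOUN" set="https://example.org/pos-set"/></w>
  </text>
</FoLiA>"""


def pos_to_web_annotations(folia_xml, source_uri):
    """Turn each <pos> annotation on a <w> element into a Web Annotation dict."""
    root = ET.fromstring(folia_xml)
    annotations = []
    for word in root.iter(f"{{{FOLIA_NS}}}w"):
        text = word.find(f"{{{FOLIA_NS}}}t")
        pos = word.find(f"{{{FOLIA_NS}}}pos")
        if text is None or pos is None:
            continue  # skip words without text content or without a PoS tag
        annotations.append({
            # The generic part comes from the Web Annotation model ...
            "@context": ["http://www.w3.org/ns/anno.jsonld", {"folia": FOLIA_VOCAB}],
            "id": f"{source_uri}#{word.get(XML_ID)}.pos",
            "type": "Annotation",
            # ... while the body carries the (hypothetical) FoLiA vocabulary.
            "body": {
                "folia:annotationType": "pos",
                "folia:class": pos.get("class"),
                "folia:set": pos.get("set"),
            },
            "target": {
                "source": source_uri,
                "selector": {"type": "TextQuoteSelector", "exact": text.text},
            },
        })
    return annotations


if __name__ == "__main__":
    result = pos_to_web_annotations(SAMPLE, "https://example.org/doc.folia.xml")
    print(json.dumps(result, indent=2))
```

A real converter would need to handle span annotations, nested structure and set definitions, and would probably want a more precise selector than a bare text quote; this only shows where the FoLiA-specific information would end up.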

This issue relates to the generic use case described here: https://github.com/CLARIAH/clariah-plus/blob/main/use-cases/cases/folia-lod.md

A tool needs to be implemented that:

Prior work that may be partially reusable for this is the work on Salt interoperability (#85).

Eventually, the reverse may also be needed: a conversion from Web Annotations to FoLiA. This is much harder to accomplish, however, and will only work if the Web Annotations adhere to the vocabulary defined by FoLiA.

proycon commented 1 year ago

This will probably be handled via the STAM tooling now.