FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl; this repository contains higher-level tools that use the library, as well as the full documentation, validation schemas, and set definitions.
The FoLiA documentation is currently a 157-page LaTeX document that has grown over the years. Though it has been revised to keep up with the latest FoLiA standard, discrepancies may have arisen in certain places with the YAML specification (folia.yml) that acts as the source for the libraries (pynlpl.formats.folia and libfolia). A more integrative revision of the documentation might be desirable. By this I mean that parts of the documentation are generated from the specification, giving the documentation a more formal character and ensuring everything is in sync.
This would also allow the documentation to be published in various forms, rather than only as the PDF it is now.
Parts of the documentation, such as short descriptions of all elements and attributes, are moved to the folia.yml specification.
The full documentation is being redone in reStructuredText (rst), to be processed by Sphinx and hosted on https://readthedocs.io. The rst sources contain foliaspec directives that mark where generated documentation is pulled in from the specification.
The foliaspec tool extracts that content from the specification and updates the rst sources.
LaTeX and a PDF will be produced automatically by Sphinx from the rst sources.
The part of the documentation that is in the specification needs to propagate to the libraries (foliapy and libfolia), as well as to the RelaxNG schema, again via foliaspec.
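To make the idea of regenerating marked parts of the rst sources concrete, here is a minimal sketch in Python. It is not the actual foliaspec implementation. The marker syntax (".. foliaspec:begin description(NAME)" and ".. foliaspec:end"), the layout of the SPEC dictionary, the description text in it, and the function name update_rst are all assumptions made for illustration. In practice the dictionary would be loaded from folia.yml.

```python
import re

# Hypothetical stand-in for the parsed folia.yml specification.
# The real file has its own layout; this structure and the description
# text are illustrative only.
SPEC = {
    "elements": {
        "w": "Illustrative short description of the w element.",
    }
}

# Hypothetical marker pair delimiting a generated block in an rst source.
# Both markers are plain rst comments, so Sphinx ignores them.
BLOCK = re.compile(
    r"^(?P<begin>\.\. foliaspec:begin description\((?P<name>[\w-]+)\)\n)"
    r".*?"
    r"^(?P<end>\.\. foliaspec:end\n)",
    re.DOTALL | re.MULTILINE,
)


def update_rst(source: str, spec: dict) -> str:
    """Replace the body of every marked block with text from the spec."""

    def fill(match: re.Match) -> str:
        name = match.group("name")
        try:
            text = spec["elements"][name]
        except KeyError:
            raise KeyError(f"no description for element {name!r} in spec")
        # Keep both markers so the block can be regenerated on a later run.
        return f"{match.group('begin')}\n{text}\n\n{match.group('end')}"

    return BLOCK.sub(fill, source)


if __name__ == "__main__":
    rst = (
        "Words\n"
        "=====\n"
        "\n"
        ".. foliaspec:begin description(w)\n"
        "stale text that no longer matches the specification\n"
        ".. foliaspec:end\n"
    )
    print(update_rst(rst, SPEC))
```

Because the markers are kept in place, running the update again on its own output leaves the file unchanged, which is the property that lets the specification stay the single source for these descriptions.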