proycon / folia

FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl; this repository contains higher-level tools that use the library, as well as the full documentation, validation schemas, and set definitions.
http://proycon.github.io/folia/
GNU General Public License v3.0

Coreference set is missing #48

Open asharkinasuit opened 6 years ago

asharkinasuit commented 6 years ago

It seems sets were originally hosted at ILK (ilk.uvt.nl), but that server now redirects to this repository: http://ilk.uvt.nl/folia/sets/ --> https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/. However, the coref set (http://ilk.uvt.nl/folia/sets/coref) that is referred to in the official documentation (PDF) is not in the repo here. Is there any chance it can still be retrieved from somewhere? (I tried the Internet Archive but of course they didn't have it, and Google doesn't come up with much either...)
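For anyone checking other set URLs against the redirect described above, here is a minimal sketch in Python. It assumes the redirect is a plain prefix swap with the rest of the path kept unchanged, which is only what the two URLs in this comment suggest. The function names `rewrite_set_url` and `set_definition_exists` are hypothetical helpers, not part of any FoLiA tooling.

```python
import urllib.request

# Prefixes taken from the redirect described in this issue.
OLD_PREFIX = "http://ilk.uvt.nl/folia/sets/"
NEW_PREFIX = "https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/"


def rewrite_set_url(url: str) -> str:
    """Map an old ILK set URL onto the GitHub location (assumed: simple prefix swap)."""
    if url.startswith(OLD_PREFIX):
        return NEW_PREFIX + url[len(OLD_PREFIX):]
    # Anything that is not an old ILK set URL is returned unchanged.
    return url


def set_definition_exists(url: str, timeout: float = 10.0) -> bool:
    """Return True if the (rewritten) set URL answers a HEAD request with HTTP 200."""
    request = urllib.request.Request(rewrite_set_url(url), method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers HTTP errors (e.g. 404), unreachable hosts, and timeouts.
        return False
```

Calling `set_definition_exists("http://ilk.uvt.nl/folia/sets/coref")` would then show whether the coref set resolves at the new location. The check needs network access and only tests that the file is reachable, not that it is a valid set definition.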

proycon commented 6 years ago

Most sets in the official documentation are fictitious (i.e. they don't exist), and they should be clearly marked as such in the documentation too. I don't think anybody has made a proper co-reference set definition yet. You can usually get away with non-existing sets (though it's of course always better to have them) unless you need full validation or tools that rely on the set definitions (like FLAT).

(That documentation PDF you linked is a bit outdated btw, better to check the FoLiA site or the GitHub repo, where it should be up to date)