Closed jgrivolla closed 7 years ago
Hello, I use the Ancora corpus tagset as it is, with no modifications.
R
Thanks.
At least with the older models from http://ixa2.si.ehu.es/ixa-pipes/models/pos-resources.tgz#es-pos-maxent-700-c0-b3.bin, the tags are generated in upper case, whereas standard Ancora has lower-case tags (see the discussion in https://github.com/dkpro/dkpro-core/pull/1071). Could you clarify?
http://clic.ub.edu/corpus/en/ancora-descarregues
The Ancora es dep 2.0.0 tags are in upper case. I use the treebank (what you call "standard") only for parsing. For NER, POS tagging and lemmatization I train with the dep-2.0.0 corpus because it is easier to format. For those three tasks the annotations are equivalent; only the syntax is different.
R
I don't have the treebank here right now so I can't check, but I see that in "AnCora: Multilevel Annotated Corpora for Catalan and Spanish" the examples contain both upper- and lower-case tags. Weird.
Just uppercase everything or lowercase it. In the treebank the tags are lowercased; in the dep version they are uppercased. I do not think it is that important as long as the tagset is the same, which seems to be the case. Or do you have evidence that the tagsets in dep and treebank are different? That could be interesting :)
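For anyone who wants to check this, here is a minimal sketch of the case normalization and tagset comparison suggested above. The function names and the sample tags are illustrative assumptions, not taken from ixa-pipes or from either Ancora release; the real tag lists would have to be extracted from the corpora themselves.

```python
def normalize_tags(tags):
    """Lowercase every tag so that case differences are ignored."""
    return {tag.lower() for tag in tags}


def compare_tagsets(treebank_tags, dep_tags):
    """Return the tags that appear in only one of the two collections,
    after case normalization."""
    treebank = normalize_tags(treebank_tags)
    dep = normalize_tags(dep_tags)
    return {
        "only_in_treebank": sorted(treebank - dep),
        "only_in_dep": sorted(dep - treebank),
    }


# Hypothetical EAGLES-style sample tags, for illustration only.
treebank_sample = ["ncms000", "vmip3s0", "da0ms0"]
dep_sample = ["NCMS000", "VMIP3S0", "DA0MS0", "SPS00"]

diff = compare_tagsets(treebank_sample, dep_sample)
print(diff)
```

If both lists come back empty on the full corpora, the two tagsets differ only in case.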
Hi, for the non-UD models for Spanish, are you using the standard Ancora (EAGLES) tagset, or are there any modifications?