matyaskopp closed this 3 years ago
The referenced image describes only the entity types. Apart from them, the hierarchy also includes container NEs, described in Chapter 2 of the technical report by Ševčíková et al. (2007), which is referenced in the CNEC 1/CNEC 2 description.
The container NEs are:
P for (complex) person names,
T for temporal expressions,
A for addresses,
C for bibliographic items.

I agree the documentation on the web could mention them directly instead of referencing the report.
Leaving the container types out of the description is confusing. So are the "super-types", which differ from the containers only by letter case (e.g., p vs. P).
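Since the only visible difference between a super-type and a container is letter case, a minimal sketch of telling the tag kinds apart by shape may help. It assumes that containers are exactly the single capital letters P, T, A and C, that super-types are single lowercase letters, and that the fine-grained types are two lowercase letters (e.g. `gu`, `gh`). `CONTAINERS` and `classify_tag` are hypothetical names, not part of NameTag.

```python
# Hypothetical helper: classify a CNEC-style tag by its shape.
# Assumption: containers are single capital letters (P, T, A, C),
# super-types are single lowercase letters, and fine-grained
# types are two lowercase letters (e.g. "gu", "gh").

CONTAINERS = {
    "P": "(complex) person names",
    "T": "temporal expressions",
    "A": "addresses",
    "C": "bibliographic items",
}


def classify_tag(tag: str) -> str:
    """Return 'container', 'supertype', 'type' or 'unknown' for a tag."""
    if tag in CONTAINERS:
        return "container"
    if len(tag) == 1 and tag.isalpha() and tag.islower():
        return "supertype"
    if len(tag) == 2 and tag.isalpha() and tag.islower():
        return "type"
    return "unknown"


if __name__ == "__main__":
    for tag in ["P", "p", "gu", "C", "X"]:
        print(tag, classify_tag(tag))
```

With this check, a returned `C` is recognized as the bibliographic-item container rather than treated as an undefined category.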
So, if you could, please rephrase this sentence: "The corpus uses 46 named entity types, which can be nested." (https://ufal.mff.cuni.cz/nametag/2/models). Yes, it says that CNEC contains exactly 46 types that can be nested, and that is true. It does not exclude the possibility of other types. But a careless reader (me) expects the sentence to cover all possible entities.
I agree it would be better to clarify the model description:
Instead of "The corpus uses 46 named entity types, which can be nested.", it could read:
The corpus uses 46 atomic named entity types, which can be embedded, e.g., a river name can be part of a city name as in <gu Ústí nad <gh Labem>>. There are also 4 so-called container NEs: two or more NEs are parts of a container NE (e.g., two NEs, a first name and a surname, together form a person name container NE such as in <P …>). The container NEs are: P for (complex) person names, T for temporal expressions, A for addresses, and C for bibliographic items.
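To make the embedding notation concrete, here is a minimal sketch of a parser for it. It assumes that an entity opens with `<` immediately followed by its tag, that words are separated by whitespace, and that each `>` closes the innermost open entity. `Entity`, `parse` and `entities` are hypothetical names, not part of NameTag or any CNEC tooling.

```python
import re
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple, Union

# Tokens: an opening tag like "<gu", a closing ">", or a plain word.
TOKEN = re.compile(r"<[^\s<>]+|>|[^\s<>]+")


@dataclass
class Entity:
    tag: str
    children: List[Union[str, "Entity"]] = field(default_factory=list)

    def words(self) -> List[str]:
        """All words covered by this entity, including nested ones."""
        out: List[str] = []
        for child in self.children:
            if isinstance(child, Entity):
                out.extend(child.words())
            else:
                out.append(child)
        return out


def parse(annotation: str) -> List[Union[str, Entity]]:
    """Parse bracket notation such as '<gu Ústí nad <gh Labem>>'."""
    root: List[Union[str, Entity]] = []
    stack: List[Entity] = []
    for tok in TOKEN.findall(annotation):
        target = stack[-1].children if stack else root
        if tok.startswith("<"):
            ent = Entity(tok[1:])
            target.append(ent)
            stack.append(ent)  # nested entities attach to this one
        elif tok == ">":
            if not stack:
                raise ValueError("unbalanced '>'")
            stack.pop()
        else:
            target.append(tok)
    if stack:
        raise ValueError("unclosed entity")
    return root


def entities(nodes, depth: int = 0) -> Iterator[Tuple[int, str, str]]:
    """Yield (depth, tag, surface text) for every entity, outermost first."""
    for node in nodes:
        if isinstance(node, Entity):
            yield depth, node.tag, " ".join(node.words())
            yield from entities(node.children, depth + 1)


if __name__ == "__main__":
    for depth, tag, text in entities(parse("<gu Ústí nad <gh Labem>>")):
        print("  " * depth + tag, text)
```

For the river/city example this yields the city `gu` spanning "Ústí nad Labem" at depth 0 and the river `gh` spanning "Labem" at depth 1. A container such as P is parsed the same way; it simply has other entities as its children.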
Thanks for your suggestions, and sorry for the confusion. I have now explained the existence of the NE containers in both the CNEC and the NameTag 2 documentation.
For this sentence:
NameTag returns an unexpected category, C (bibliography container), which is not defined in https://ufal.mff.cuni.cz/~strakova/cnec2.0/ne-type-hierarchy.pdf