Closed ebenaissa closed 7 years ago
First question: no. All these specific but common concepts are already enumerated in Wikipedia, so they will be caught by the entity disambiguation step; that is out of the scope of NER!
Named entity classes correspond more to particular classes of entities that cannot be enumerated exhaustively in advance.
Also for specialist terminology like biomedical stuff, we typically use specialized NER working on different resources and features... like... grobid-bio ;)
Second question: UNKNOWN would be for proper names not covered by the other classes. It was introduced as a bit of a safety net; I can imagine things like the name of a god, of a mythic creature, maybe the name of a conference series, ... But I agree that it is very similar to CONCEPT and we could challenge it.
I was wondering whether the entity « Horizon 2020 » belongs to the UNKNOWN class, since it is a research project. I also think that it sits between EVENT and ORGANISATION.
From Wikipedia: "Horizon 2020 is a funding programme created by the European Union/European Commission to support and foster research in the European Research Area (ERA)."
How can we define a funding program?
I think it is neither an EVENT nor an ORGANISATION; I would be tempted to annotate it as CONCEPT or ARTIFACT.
(I think if we don't find a class, then UNKNOWN would be the most appropriate one.)
What do the others think?
I don't think it fits in any class, but for me it's a mix of EVENT, ORGANISATION/INSTITUTION and LEGAL... definitely not CONCEPT nor ARTIFACT :no_mouth:
Other examples come to mind, like Plan Marshall, or things like research projects (Parthenos, Parsiti, names of ANRs, etc.).
For CONCEPT, we could try to eliminate it by saying that the rule of the final suffix (-ism), as in Communism or Zionism, doesn't apply here.
Does CREATION apply only to names of movies, songs, etc., and only to the artistic domain? Otherwise, could we think of CREATION?
OK, we need to make a decision. Let's annotate it as UNKNOWN, then.
This example would fit the original purpose of UNKNOWN well, I think (indeed like Plan Marshall, Parthenos, ...). It's not CREATION, which is of an artistic nature. It's not CONCEPT, because a funding program is not an idea. It's not ARTIFACT, because that supposes some sort of item, an embedding (even a mental work like software is embedded into an item, e.g. a computer-embedded invention). It's not an EVENT (it includes many events, and it is more than that).
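The elimination argument above could be sketched roughly as follows. This is only an illustration: the `Mention` fields and the `fallback_class` function are hypothetical names invented for the sketch, not part of the annotation guidelines or of any tool, and the yes/no properties are a simplification of judgments an annotator makes by hand.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mention:
    """Hypothetical description of a candidate mention (illustration only)."""
    text: str
    is_proper_name: bool
    is_artistic_work: bool = False   # would point to CREATION
    is_idea: bool = False            # would point to CONCEPT
    is_embodied_item: bool = False   # would point to ARTIFACT
    is_single_event: bool = False    # would point to EVENT


def fallback_class(mention: Mention) -> Optional[str]:
    """Try the classes discussed in the thread and fall back to UNKNOWN."""
    if not mention.is_proper_name:
        # Common expressions are not annotated; they are left to disambiguation.
        return None
    if mention.is_artistic_work:
        return "CREATION"
    if mention.is_idea:
        return "CONCEPT"
    if mention.is_embodied_item:
        return "ARTIFACT"
    if mention.is_single_event:
        return "EVENT"
    # Proper name that fits none of the classes above: the safety net.
    return "UNKNOWN"


horizon = Mention("Horizon 2020", is_proper_name=True)
communism = Mention("Communism", is_proper_name=True, is_idea=True)
```

With these assumptions, a funding programme such as Horizon 2020 falls through every test and ends up in UNKNOWN, while an "-ism" such as Communism stops at CONCEPT.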
We have a few unresolved questions about the following entities:
1) Final Solution, Final Solution to the Jewish Question: the Nazi plan to exterminate the Jews
2) Jewish Question
3) Antisemitism Yellowbadge logo, Yellow badge
4) Aktion T4 euthanasia programme, Aktion T4 (a mass murder programme)
UNKNOWN?
My guess :D
1) CONCEPT, this is an idea
2) CONCEPT, this is an idea
3) UNKNOWN
4) UNKNOWN, it's more than just an idea
I just annotated:
the terms "<ENAMEX type="CONCEPT">war crimes</ENAMEX>" and "<ENAMEX type="CONCEPT">crimes against humanity</ENAMEX>" were indeed correct labels for what happened.
but I have doubts. How does it look to you?
I would say these are common expressions, not named entities, which are hard to enumerate. So I would not annotate war crimes and crimes against humanity. These concepts will be caught well anyway by the disambiguation part of NERD, using Wikipedia common knowledge.
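For reviewing such cases in bulk, a small script can list the inline ENAMEX annotations and unwrap the ones we decide not to keep. This is a minimal sketch that assumes the inline `<ENAMEX type="...">...</ENAMEX>` form shown above; the function names are made up for the example and are not part of grobid-ner or NERD.

```python
import re

# One inline annotation of the form <ENAMEX type="CLASS">mention</ENAMEX>.
ENAMEX_RE = re.compile(r'<ENAMEX type="([A-Z_]+)">(.*?)</ENAMEX>', re.DOTALL)


def normalise(mention):
    """Collapse line breaks and repeated spaces inside a mention."""
    return " ".join(mention.split())


def extract_annotations(text):
    """Return (class, mention) pairs in order of appearance."""
    return [(cls, normalise(m)) for cls, m in ENAMEX_RE.findall(text)]


def unannotate(text, mentions):
    """Drop the ENAMEX markup around the given mentions, keeping their text."""
    def unwrap(match):
        mention = normalise(match.group(2))
        return mention if mention in mentions else match.group(0)
    return ENAMEX_RE.sub(unwrap, text)


sentence = (
    'the terms "<ENAMEX type="CONCEPT">war crimes</ENAMEX>" and '
    '"<ENAMEX type="CONCEPT">\ncrimes against humanity</ENAMEX>" '
    'were indeed correct labels for what happened.'
)

print(extract_annotations(sentence))
print(unannotate(sentence, {"war crimes", "crimes against humanity"}))
```

A regular expression is enough here because the annotations are flat; nested or attribute-rich markup would need a real XML parser.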
Following the sentence:
I was wondering about patient zero, but also diseases and other sequences that do not really belong to any other class but are referenced on Wikipedia (the Wikipedia page for patient zero: https://en.wikipedia.org/wiki/Index_case).
Should those examples be annotated as UNKNOWN? And if we don't annotate those examples at all, what sequences can be labeled as UNKNOWN?