Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition

jellAIfish / jellyfish

This repository is inspired by Quinn Liu's repository Walnut.


Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition #43

Closed: markroxor closed this issue 6 years ago

markroxor commented 6 years ago

dl.acm.org/citation.cfm?id=1119195

markroxor commented 6 years ago

The challenge of this year’s shared task was to incorporate the unannotated data.

markroxor commented 6 years ago

The participants were given access to the corpus after some linguistic preprocessing had been done: for all data, a tokenizer, a part-of-speech tagger and a chunker were applied to the raw data.
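Because every token arrives with its preprocessing output attached, the released files are usually read as one token per line with space-separated columns (word, part-of-speech tag, chunk tag, named entity tag), blank lines between sentences and `-DOCSTART-` lines between documents. The sketch below assumes that four-column English layout; the function name `read_conll2003` and the sample snippet are illustrative, not part of any official tooling.

```python
from typing import List, Tuple

# One token: (word, POS tag, chunk tag, named entity tag)
Token = Tuple[str, str, str, str]


def read_conll2003(text: str) -> List[List[Token]]:
    """Split CoNLL-2003 style text into sentences of 4-column tokens.

    Assumes space-separated columns (word, POS, chunk, NE), blank lines
    between sentences and -DOCSTART- lines marking document boundaries.
    """
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("-DOCSTART-"):
            # A blank line or a document marker ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        word, pos, chunk, ne = line.split()
        current.append((word, pos, chunk, ne))
    if current:  # flush a final sentence with no trailing blank line
        sentences.append(current)
    return sentences


# Illustrative snippet in the assumed layout.
sample = """-DOCSTART- -X- -X- O

EU NNP I-NP I-ORG
rejects VBZ I-VP O
German JJ I-NP I-MISC
call NN I-NP O
. . O O
"""

sentences = read_conll2003(sample)
print(sentences[0][0])  # ('EU', 'NNP', 'I-NP', 'I-ORG')
```

Returning plain tuples keeps the reader independent of any NLP library; the German files would need a different unpacking step if their column count differs.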

markroxor commented 6 years ago

Named entity tagging of the English and German training, development and test data was done by hand at the University of Antwerp.

markroxor commented 6 years ago

The data contains entities of four types: persons (PER), organizations (ORG), locations (LOC) and miscellaneous names (MISC).
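Entities of these four types are marked token by token, so a common first step is turning the tag sequence back into typed spans. The sketch below assumes IOB-style tags in which `I-X` marks a token inside an entity of type X and `B-X` starts a new entity (needed only when two entities of the same type are adjacent); `iob_to_spans` is a hypothetical helper name.

```python
from typing import List, Optional, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def iob_to_spans(tags: List[str]) -> List[Span]:
    """Group IOB tags such as I-PER, B-LOC, O into typed entity spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last entity
        if tag == "O":
            prefix, label = "O", None
        else:
            prefix, label = tag.split("-", 1)
        # Close the open entity unless this tag is an I- of the same type.
        continues = prefix == "I" and label == etype
        if etype is not None and not continues:
            spans.append((etype, start, i))
            start, etype = None, None
        # Open a new entity on B-, or on an I- that does not continue one.
        if prefix == "B" or (prefix == "I" and etype is None):
            start, etype = i, label
    return spans


print(iob_to_spans(["I-ORG", "O", "I-MISC", "O", "O"]))
# [('ORG', 0, 1), ('MISC', 2, 3)]
```

Treating a stray `I-X` as the start of an entity also lets the same function read IOB2 data, where every entity begins with `B-X`.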

markroxor commented 6 years ago

The most frequently applied technique in the CoNLL-2003 shared task is the Maximum Entropy Model. Five systems used this statistical learning method. Three systems used Maximum Entropy Models in isolation.
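A Maximum Entropy Model scores each label y for an input x as p(y | x) = exp(sum_i w_i f_i(x, y)) / Z(x), where the f_i are feature functions and Z(x) normalizes over all labels. The sketch below is a generic single-token classifier with binary (feature, label) indicators, trained by plain stochastic gradient ascent on the conditional log-likelihood; the feature strings, label set and training loop are illustrative assumptions, not the setup of any participating system.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

LABELS = ["PER", "ORG", "LOC", "MISC", "O"]
Weights = Dict[Tuple[str, str], float]


def predict_proba(weights: Weights, feats: List[str]) -> Dict[str, float]:
    """p(y | x) = exp(sum of active (feature, y) weights) / Z(x)."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in feats) for y in LABELS}
    m = max(scores.values())  # subtract the max score for numerical stability
    exps = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


def train(data: List[Tuple[List[str], str]], epochs: int = 50, lr: float = 0.5) -> Weights:
    """Gradient ascent: each weight moves by (observed - expected) feature count."""
    weights: Weights = defaultdict(float)
    for _ in range(epochs):
        for feats, gold in data:
            probs = predict_proba(weights, feats)
            for f in feats:
                for y in LABELS:
                    observed = 1.0 if y == gold else 0.0
                    weights[(f, y)] += lr * (observed - probs[y])
    return weights


# Hypothetical toy data: (active features, gold label)
data = [
    (["w=paris", "cap"], "LOC"),
    (["w=ibm", "cap"], "ORG"),
    (["w=said"], "O"),
    (["w=john", "cap"], "PER"),
]
weights = train(data)
probs = predict_proba(weights, ["w=paris", "cap"])
print(max(probs, key=probs.get))  # LOC
```

The shared `cap` feature is pulled toward several labels at once, while the word features separate them, which is the kind of overlapping evidence a Maximum Entropy Model combines.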

Hidden Markov Models were employed by four of the systems.
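With a Hidden Markov Model, the entity tags are the hidden states and the words are the emissions; tagging a sentence means finding the most probable state sequence, which the Viterbi algorithm does in time linear in sentence length. The sketch below uses a first-order HMM with hand-set, purely illustrative probabilities, and gives unseen words a fixed small emission probability (`unk`) as a simplifying assumption rather than a proper smoothing scheme.

```python
import math
from typing import Dict, List


def viterbi(words: List[str], states: List[str],
            start_p: Dict[str, float],
            trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[str, float]],
            unk: float = 1e-6) -> List[str]:
    """Return the most probable state sequence for `words` under a first-order HMM."""
    if not words:
        return []

    def log(p: float) -> float:
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path ending in state s at position t
    best = [{s: log(start_p[s]) + log(emit_p[s].get(words[0], unk)) for s in states}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(words)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log(emit_p[s].get(words[t], unk))
            back[t][s] = prev
    # Follow the back-pointers from the best final state.
    last = max(best[-1], key=best[-1].get)
    path = [last]
    for t in range(len(words) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


states = ["PER", "O"]
start_p = {"PER": 0.3, "O": 0.7}
trans_p = {"PER": {"PER": 0.4, "O": 0.6}, "O": {"PER": 0.2, "O": 0.8}}
emit_p = {
    "PER": {"John": 0.4, "Smith": 0.4, "said": 0.01},
    "O": {"John": 0.01, "Smith": 0.01, "said": 0.3},
}
print(viterbi(["John", "Smith", "said"], states, start_p, trans_p, emit_p))
# ['PER', 'PER', 'O']
```

Working in log space avoids underflow on long sentences, since the path probability is a product of many small numbers.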

Voted perceptrons were applied to the shared task data.

Transformation-based learning (Florian et al., 2003), Support Vector Machines (Mayfield et al., 2003) and Conditional Random Fields (McCallum and Li, 2003) were applied by one system each.

Five participating groups have applied system combination.
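System combination can take many forms; one of the simplest is per-token majority voting over the tag sequences produced by several taggers. The sketch below shows that scheme only as an illustration, with ties broken in favour of the system listed first; it is not a description of what the five groups actually did, and `majority_vote` is a hypothetical name.

```python
from collections import Counter
from typing import List


def majority_vote(system_outputs: List[List[str]]) -> List[str]:
    """Combine per-token tags from several systems by majority vote.

    Ties are broken in favour of the earliest system in the list.
    """
    n_tokens = len(system_outputs[0])
    if any(len(out) != n_tokens for out in system_outputs):
        raise ValueError("all systems must tag the same tokens")
    combined = []
    for i in range(n_tokens):
        votes = Counter(out[i] for out in system_outputs)
        top = max(votes.values())
        # Pick the first system whose tag reached the top vote count.
        combined.append(next(out[i] for out in system_outputs if votes[out[i]] == top))
    return combined


a = ["I-PER", "O", "I-LOC"]
b = ["I-PER", "O", "I-ORG"]
c = ["O", "O", "I-ORG"]
print(majority_vote([a, b, c]))  # ['I-PER', 'O', 'I-ORG']
```

Voting token by token can yield tag sequences that no single system produced, so stronger combiners often vote on whole entity spans or weight systems by their development-set accuracy.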

markroxor commented 6 years ago

Features -

All participants used lexical features (words) except for Whitelaw and Patrick (2003), who implemented a character-based method. Most of the systems employed part-of-speech tags.
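To make these feature families concrete, the sketch below builds a feature list for one token from the word itself, its part-of-speech tag, a capitalization "shape", short character affixes and the neighbouring words. The feature names and the particular selection are illustrative assumptions, not the feature set of any participant; the affix features merely hint at character-level information and are not a character-based method.

```python
from typing import List


def word_shape(word: str) -> str:
    """Map characters to X/x/d (others kept) and collapse repeats, e.g. 'U.N.' -> 'X.X.'."""
    shape: List[str] = []
    for ch in word:
        if ch.isupper():
            c = "X"
        elif ch.islower():
            c = "x"
        elif ch.isdigit():
            c = "d"
        else:
            c = ch
        if not shape or shape[-1] != c:
            shape.append(c)
    return "".join(shape)


def token_features(words: List[str], pos_tags: List[str], i: int) -> List[str]:
    """Hypothetical feature extractor for the token at position i."""
    w = words[i]
    feats = [
        f"w={w.lower()}",            # lexical feature: the word itself
        f"pos={pos_tags[i]}",        # part-of-speech tag from the preprocessing
        f"shape={word_shape(w)}",    # capitalization / digit pattern
        f"prefix3={w[:3].lower()}",  # character-level affixes
        f"suffix3={w[-3:].lower()}",
    ]
    # Context words, with sentence-boundary placeholders.
    feats.append(f"w-1={words[i - 1].lower()}" if i > 0 else "w-1=<s>")
    feats.append(f"w+1={words[i + 1].lower()}" if i + 1 < len(words) else "w+1=</s>")
    return feats


words = ["EU", "rejects", "German", "call"]
pos_tags = ["NNP", "VBZ", "JJ", "NN"]
print(token_features(words, pos_tags, 0))
# ['w=eu', 'pos=NNP', 'shape=X', 'prefix3=eu', 'suffix3=eu', 'w-1=<s>', 'w+1=rejects']
```

Lists of feature strings like these plug directly into a classifier such as a Maximum Entropy Model, which learns one weight per (feature, label) pair.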