Many links are currently dropped because the redirect map is not followed when one is available. For instance, all links to "China" should be redirected to "People's Republic of China"; since they are not, the generated NER corpus lacks the statistical clues needed to detect "China" as an entity of type "Place".
To resolve this, one needs to introduce a new COGROUP operation with a conditional GENERATE statement (not all links have a redirect).
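
The intended behaviour can be modelled in a few lines of plain Python. This is only a sketch of the semantics, not the Pig script itself: the `(source, target)` and `(from_title, to_title)` tuple layouts, the function names and the sample data are all assumptions made for illustration. It groups links and redirects by the same key, then picks the redirect target when the redirect bag is non-empty and falls back to the original target otherwise.

```python
from collections import defaultdict


def cogroup(links, redirects):
    """Group links by their target and redirects by their source title.

    Returns {key: (link_bag, redirect_bag)}, loosely mirroring the bags
    that a COGROUP of two relations on a shared key would produce.
    """
    groups = defaultdict(lambda: ([], []))
    for source, target in links:
        groups[target][0].append((source, target))
    for from_title, to_title in redirects:
        groups[from_title][1].append((from_title, to_title))
    return groups


def resolve_links(links, redirects):
    """Rewrite each link target through the redirect map when one exists."""
    resolved = []
    for key, (link_bag, redirect_bag) in cogroup(links, redirects).items():
        # Conditional step: not every link target has a redirect, so keep
        # the original key when the redirect bag is empty.
        target = redirect_bag[0][1] if redirect_bag else key
        # Keys that only appear in the redirect map have an empty link bag
        # and therefore emit nothing.
        for source, _ in link_bag:
            resolved.append((source, target))
    return sorted(resolved)


# Hypothetical sample data, for illustration only.
links = [
    ("Beijing", "China"),
    ("Shanghai", "China"),
    ("Eiffel Tower", "Paris"),
]
redirects = [
    ("China", "People's Republic of China"),
    ("PRC", "People's Republic of China"),
]

resolved = resolve_links(links, redirects)
for source, target in resolved:
    print(source, "->", target)
```

Note that this sketch follows a single redirect hop only; chains of redirects would need an extra pass, and whether the real redirect map contains such chains is not something the notes above state.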