janetzki / fact_extraction

Fact Extraction from Text

Parse escaped HTML characters in Wikipedia Dump properly #69

Open janetzki opened 7 years ago

janetzki commented 7 years ago

E.g., "AT&T Mobility" appears as "AT&amp;T Mobility" in the dump and therefore also in the character index, which leads to problems.
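
One possible way to handle this, as a minimal sketch: decode HTML character references with Python's standard `html.unescape` before the text is used to build the character index. The helper name `unescape_dump_text` is hypothetical and not taken from the repository; where exactly the decoding should happen in the pipeline is an assumption.

```python
import html


def unescape_dump_text(raw_text: str) -> str:
    """Decode HTML character references (e.g. &amp;) into plain characters.

    Hypothetical helper: assumes the dump text is decoded once,
    before any character offsets are computed from it.
    """
    return html.unescape(raw_text)


escaped = "AT&amp;T Mobility"
plain = unescape_dump_text(escaped)

print(plain)                      # AT&T Mobility
print(len(escaped) - len(plain))  # 4
```

The length difference shows why the index is affected: every escaped `&amp;` shifts all later character offsets by four positions, so offsets computed on the escaped text will not match the unescaped text.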