sfu-natlang / lensingwikipedia

Lensing Wikipedia is an interface to visually browse through human history as represented in Wikipedia. This is the source code that runs the website:
http://lensingwikipedia.cs.sfu.ca

antioch identified as a verb #124

Open anoopsarkar opened 9 years ago

anoopsarkar commented 9 years ago

The data prep seems to think antioch is a verb!

  634 CE: antioch
  Heraclius, who is in Emesa, flees to Antioch upon hearing news of the battle's outcome

msiahbani commented 9 years ago

This error comes from the SRL (semantic role labeling) step. There have been other errors like this, which I manually remove from the list of events. I don't know of any quick fix for this kind of error other than keeping a list of them and removing them from the final list of events. Any other suggestions?
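A minimal sketch of the keep-a-list approach described above. The event layout (dicts with `year`, `predicate`, and `description` keys) and the names `KNOWN_BAD_PREDICATES` and `drop_known_bad` are assumptions made for illustration, not the pipeline's actual data format or API.

```python
# Hypothetical blocklist of predicates that SRL wrongly tagged as verbs.
# It would be extended by hand whenever a new error is spotted.
KNOWN_BAD_PREDICATES = {"antioch"}


def drop_known_bad(events, blocklist=KNOWN_BAD_PREDICATES):
    """Return only the events whose predicate is not on the blocklist."""
    return [e for e in events if e["predicate"].lower() not in blocklist]


# Illustrative records built from the sentence in this issue;
# the field names are assumed, not taken from the real data prep.
sentence = ("Heraclius, who is in Emesa, flees to Antioch "
            "upon hearing news of the battle's outcome")
events = [
    {"year": 634, "predicate": "antioch", "description": sentence},
    {"year": 634, "predicate": "flees", "description": sentence},
]

cleaned = drop_known_bad(events)
print([e["predicate"] for e in cleaned])
```

This only hides errors that have already been noticed; it does nothing for new ones.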

anoopsarkar commented 9 years ago

Ideally we should add a few difficult cases of predicate identification from Wikipedia into the training data for the predicate identification classifier. It should then do a better job with such cases.

Also, can we add a filter that does not allow the predicate to start with an upper case character?
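Such a filter could look roughly like the sketch below. It assumes each event still carries the predicate token with its original casing (the listing in this issue shows it lowercased, so that may not hold), and the names `starts_uppercase` and `split_by_case` are made up for illustration.

```python
def starts_uppercase(predicate):
    """True if the first character of the predicate is an uppercase letter."""
    return predicate[:1].isupper()


def split_by_case(events):
    """Separate events into (kept, rejected) based on the predicate's first letter."""
    kept, rejected = [], []
    for event in events:
        # "predicate" is an assumed field name holding the original-case token.
        if starts_uppercase(event["predicate"]):
            rejected.append(event)
        else:
            kept.append(event)
    return kept, rejected
```

Returning the rejected events instead of silently dropping them makes it possible to check how many real verbs the filter would throw away.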

msiahbani commented 9 years ago

I am not sure such a filter is a good solution. There are around 1-2k events whose predicate starts with an upper case character (e.g. attacking, became, defeating, changing, decline). Based on my observation, only around 10% of them are not verbs, so the filter would discard far more real verbs than errors. We might use NER to identify the ones that are not verbs.

Or we can create a cheat-sheet (of these difficult cases) and gradually complete it.
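One way to build such a cheat-sheet gradually is to list the capitalized predicates that have not been reviewed yet, most frequent first, so the common cases get checked early. This is only a sketch: `review_candidates` and the lowercase `cheat_sheet` set are hypothetical names, and it assumes the original-case predicate tokens are available.

```python
from collections import Counter


def review_candidates(predicates, cheat_sheet):
    """Count capitalized predicates not yet on the cheat-sheet, most frequent first.

    predicates:  iterable of predicate tokens in their original casing
    cheat_sheet: set of lowercase tokens already reviewed and marked as non-verbs
    """
    counts = Counter(
        p for p in predicates
        if p[:1].isupper() and p.lower() not in cheat_sheet
    )
    return counts.most_common()


# Illustrative input only; these tokens are not taken from the real event list.
tokens = ["Antioch", "Became", "Antioch", "flees", "Emesa"]
print(review_candidates(tokens, cheat_sheet={"emesa"}))
```

Entries confirmed as non-verbs during review would be added to the cheat-sheet; real verbs such as "Became" would simply be left alone.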

anoopsarkar commented 9 years ago

OK, so using case information is a bad idea. But we could annotate some data that covers the difficult cases of predicate identification and NER and add it to the training process, if that is an option.