anoopsarkar closed this issue 11 years ago
This is the event in http://en.wikipedia.org/wiki/1962:
The Mona Lisa by Leonardo da Vinci Mona Lisa was assessed for insurance purposes at US$100 million, before the painting toured the United States for several months. It is the highest insurance value for a painting in history. However, the Louvre chose to spend the money that would have been spent on the insurance premium on security instead.
"Leonardo da Vinci Mona Lisa" is recognized as a person by NER, although the crawler found two different links, one for "Leonardo da Vinci" and one for "Mona Lisa".
I am still not sure how I should treat such issues, but I will work on it.
Seems like missing punctuation. If it is an NER error then let it be. When we scale to the full data these cases should not matter much. Go ahead and close the issue.
(Emailed reply to the comment above by Maryam Siahbani, May 13, 2013 11:48 PM.)
The person names are sometimes incorrect in the JSON file dumped by the crawler.
E.g. "Leonardo daVinci Mona Lisa" is presented as a single person name, even though the event does not have these words in sequence anywhere. There are separate entries for "Leonardo DaVinci" and "Mona Lisa".
This seems to happen very rarely, at least in the current subset of the data that is in data-small.
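One possible way to flag these cases is sketched below. It is not what the crawler or the NER step currently does. It assumes we have, for each event, the person names from NER and the list of link anchor texts found by the crawler; the function name `split_merged_name` and both argument names are hypothetical. A name is flagged only when it is exactly a sequence of two or more link texts, and matching is case-sensitive and token-based, so spelling variants would not be caught.

```python
def split_merged_name(name, link_texts):
    """Return the link texts that make up `name`, or None.

    Hypothetical helper: `name` is a person name produced by NER and
    `link_texts` are the anchor texts the crawler found in the same event.
    A result is returned only if `name` is exactly two or more link texts
    placed one after another.
    """
    tokens = name.split()
    # Store each link text as a tuple of tokens for exact matching.
    links = {tuple(t.split()) for t in link_texts if t.split()}

    def tile(start):
        # Try to cover tokens[start:] with link texts, longest piece first.
        if start == len(tokens):
            return []
        for end in range(len(tokens), start, -1):
            piece = tuple(tokens[start:end])
            if piece in links:
                rest = tile(end)
                if rest is not None:
                    return [" ".join(piece)] + rest
        return None

    parts = tile(0)
    # A single link text covering the whole name is not a merge.
    return parts if parts and len(parts) >= 2 else None


links = ["Leonardo da Vinci", "Mona Lisa", "Louvre"]
print(split_merged_name("Leonardo da Vinci Mona Lisa", links))
print(split_merged_name("Leonardo da Vinci", links))
```

For the 1962 event, the first call returns the two separate link texts and the second returns `None`, so only the merged name would be flagged for review or splitting.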