marcocor / bat-framework

A framework to compare entity linking systems.
GNU General Public License v3.0

D2W as Sa2W Problem Reduction #3

Closed by bernhardschaefer 10 years ago

bernhardschaefer commented 10 years ago

I noticed that the current implementation of the D2W task lets the annotators solve Sa2W and then applies a problem reduction that keeps only the annotations overlapping a mention.
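For illustration, the reduction described above can be sketched as follows. This is a minimal, self-contained sketch and not the framework's actual code: the class names `Span` and `ScoredAnnotation`, their fields, and the `reduce` method are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

public class Sa2WToD2WSketch {
    /** A text span given by its start offset and length (hypothetical type). */
    static class Span {
        final int position;
        final int length;

        Span(int position, int length) {
            this.position = position;
            this.length = length;
        }

        /** True if the two spans share at least one character. */
        boolean overlaps(Span other) {
            return position < other.position + other.length
                    && other.position < position + length;
        }
    }

    /** A span linked to a Wikipedia page id with a confidence score (hypothetical type). */
    static class ScoredAnnotation extends Span {
        final int wikipediaId;
        final float score;

        ScoredAnnotation(int position, int length, int wikipediaId, float score) {
            super(position, length);
            this.wikipediaId = wikipediaId;
            this.score = score;
        }
    }

    /** Keeps only the Sa2W annotations that overlap at least one given mention. */
    static List<ScoredAnnotation> reduce(List<ScoredAnnotation> sa2w, List<Span> mentions) {
        List<ScoredAnnotation> kept = new ArrayList<>();
        for (ScoredAnnotation annotation : sa2w) {
            for (Span mention : mentions) {
                if (annotation.overlaps(mention)) {
                    kept.add(annotation);
                    break; // one overlapping mention is enough
                }
            }
        }
        return kept;
    }
}
```

Note that a mention the annotator never spotted has no overlapping annotation, so it ends up without any answer after the reduction.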

Since this assumes that the annotator was able to spot all mentions, I reimplemented this task for the SpotlightAnnotator. My approach is to generate an XML document containing the text and the mentions and to use Spotlight's disambiguate service (https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Web-service#disambiguate) to retrieve scored annotations. In my experiments, this increased the accuracy of Spotlight by 15-30 percentage points on each dataset.
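A rough sketch of how such an XML document could be built is shown below. It is not the code from the pull request: the `Mention` class and the method names are hypothetical, and the element and attribute names (`annotation`, `text`, `surfaceForm`, `name`, `offset`) are an assumption about the format described on the linked wiki page and should be checked against it. The HTTP call to the service itself is left out.

```java
import java.util.List;

public class SpotlightDisambiguateRequestSketch {
    /** A mention to disambiguate: its surface form and character offset (hypothetical type). */
    static class Mention {
        final String surfaceForm;
        final int offset;

        Mention(String surfaceForm, int offset) {
            this.surfaceForm = surfaceForm;
            this.offset = offset;
        }
    }

    /** Escapes the characters that are not allowed inside an XML attribute value. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Builds the request document; the tag and attribute names are assumed, see above. */
    static String buildAnnotationXml(String text, List<Mention> mentions) {
        StringBuilder xml = new StringBuilder();
        xml.append("<annotation text=\"").append(escape(text)).append("\">");
        for (Mention mention : mentions) {
            xml.append("<surfaceForm name=\"").append(escape(mention.surfaceForm))
               .append("\" offset=\"").append(mention.offset).append("\"/>");
        }
        xml.append("</annotation>");
        return xml.toString();
    }
}
```

Because the mentions are passed in explicitly, the service only has to choose an entity for each one and no longer has to spot them first.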

If you are interested in this topic, let me know; I'll then clean up my code and send a pull request.

marcocor commented 10 years ago

If I got it right: the idea is that all entity annotators that natively solve D2W should implement that in the method solveD2W. For the others, some kind of wrapper is provided, and they should call ProblemReduction.Sa2WToD2W(). It is absolutely ok, for annotators that natively solve D2W, to replace this call with a call to the actual native D2W implementation. I'm looking forward to your patch ;)
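The design described in this comment could look roughly like the sketch below. The interface, its signatures, and the toy annotator are hypothetical and much simpler than the framework's real types: mentions are identified by their offset only, and the fallback matches offsets exactly instead of testing for overlap.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class NativeVsReducedD2WSketch {
    /** Hypothetical annotator interface; maps mention offsets to entity names. */
    interface Annotator {
        /** Sa2W: the annotator spots mentions itself and links them. */
        Map<Integer, String> solveSa2W(String text);

        /** D2W: link the given mentions. The default falls back to reducing Sa2W. */
        default Map<Integer, String> solveD2W(String text, Set<Integer> mentionOffsets) {
            Map<Integer, String> result = new TreeMap<>(solveSa2W(text));
            result.keySet().retainAll(mentionOffsets); // drop annotations outside the mentions
            return result;
        }
    }

    /** Toy annotator with a native D2W: it answers for every mention it is given. */
    static class NativeAnnotator implements Annotator {
        @Override
        public Map<Integer, String> solveSa2W(String text) {
            return Collections.emptyMap(); // spotting is irrelevant for this sketch
        }

        @Override
        public Map<Integer, String> solveD2W(String text, Set<Integer> mentionOffsets) {
            Map<Integer, String> result = new TreeMap<>();
            for (int offset : mentionOffsets) {
                result.put(offset, "entity@" + offset); // placeholder for a real disambiguation call
            }
            return result;
        }
    }
}
```

An annotator without native D2W support implements only `solveSa2W` and inherits the reduction, while a native one overrides `solveD2W` and never depends on its own spotter.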

bernhardschaefer commented 10 years ago

I didn't know how to attach a pull request to an existing issue, so I created a new one.

Basically, I implemented native D2W for Spotlight. However, in order to test it, I had to implement various D2W-related things such as D2WCache. This is why the pull request got quite big. Feel free to skip all the changes you consider unnecessary.

Also, I added the possibility to switch between Spotlight's disambiguation algorithms. I've been using this feature for quite a while and, frankly, I was too lazy to separate the two features, so I just added it to the commit. Next time I'll try to keep each commit focused on a single feature. ;)

bernhardschaefer commented 10 years ago

OT: Did you already think about introducing a category such as Scored Disambiguate to Wikipedia (Sd2W)? This is what I implemented in my first prototype, since native D2W solvers with scored annotations benefit a lot from this category in comparison to traditional D2W.

marcocor commented 10 years ago

Yep, there are a lot of combinations of problems. Sd2W totally makes sense, though it has not been implemented since no annotators (so far) attack this problem. For the measures, I'd follow an approach similar to that of Sa2W.