inspirehep / beard

Bibliographic Entity Automatic Recognition and Disambiguation

metrics: paired metrics are slow #10

Closed glouppe closed 9 years ago

glouppe commented 9 years ago

Paired scoring metrics are quite slow to compute. I think there is room for improvement.

CC: @etzemis

etzemis commented 9 years ago

Ok, I will take care of those.

On Jan 15, 2015, at 1:22 PM, Gilles Louppe notifications@github.com wrote:

Paired scoring metrics are quite slow to compute. I think there is room for improvement.

CC: @etzemis

glouppe commented 9 years ago

It seems there is a linear-time algorithm to compute these metrics. See the last slide of http://infolab.stanford.edu/~euijong/vldb10_measure.pdf

glouppe commented 9 years ago

http://eggiweg.ofb.net/thesis.pdf Section 6.6.2

glouppe commented 9 years ago

And here is an implementation in R https://github.com/hussaibi/libclustER/blob/master/trunk/src/measures.R#L277

glouppe commented 9 years ago

http://eggiweg.ofb.net/thesis.pdf Section 6.6.2

This looks very pythonic. It shouldn't be too difficult to actually implement.

glouppe commented 9 years ago

Fixed in #14