inspirehep / beard

Bibliographic Entity Automatic Recognition and Disambiguation

utils: better name normalization (closes #20) #58

Closed MSusik closed 9 years ago

MSusik commented 9 years ago

TO DO:

Signed-off-by: Mateusz Susik <mateusz.susik@cern.ch>

MSusik commented 9 years ago

First result

528fd1ec491d55cfb9f0f63f4dfc83072fba4cfc

Number of blocks = 13114
True number of clusters 15575
Number of computed clusters 15755
B^3 F-score (overall) = 0.9814024143924055
B^3 F-score (train) = 0.9879790622231259
B^3 F-score (test) = 0.9810051562488993

Second result (with new name normalization) in progress.

natsheh commented 9 years ago

:+1:

MSusik commented 9 years ago

Results after cherry picking msusik/beard@15a0778

I didn't resample the pairs.

Number of blocks = 13114
True number of clusters 15575
Number of computed clusters 15630
B^3 F-score (overall) = 0.9813912291609609
B^3 F-score (train) = 0.9878633340847559
B^3 F-score (test) = 0.981002451130001

MSusik commented 9 years ago

After resampling:

Number of blocks = 13114
True number of clusters 15575
Number of computed clusters 15649
B^3 F-score (overall) = 0.9815744332694658
B^3 F-score (train) = 0.9879289752654714
B^3 F-score (test) = 0.9811822815653505

So, it should be ready to merge!

natsheh commented 9 years ago

:+1:

glouppe commented 9 years ago

The old and new strategies give exactly the same number of blocks, which is unexpected. Can you double-check the results?
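To illustrate why an identical block count looks suspicious, here is a minimal sketch of how the number of blocks can depend on name normalization. It is not beard's blocking code: `normalize_simple`, `normalize_strict`, `block_by_surname`, and the sample names are all hypothetical, and it assumes blocks are keyed on the normalized surname.

```python
import unicodedata
from collections import defaultdict


def normalize_simple(name):
    """Hypothetical 'old' strategy: trim whitespace and lowercase."""
    return name.strip().lower()


def normalize_strict(name):
    """Hypothetical 'new' strategy: also strip accents and drop punctuation."""
    decomposed = unicodedata.normalize("NFKD", name)
    # Remove combining marks, so that "ü" becomes "u".
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Keep only letters, digits and whitespace.
    kept = "".join(c for c in no_accents if c.isalnum() or c.isspace())
    # Lowercase and collapse runs of whitespace.
    return " ".join(kept.lower().split())


def block_by_surname(names, normalize):
    """Group 'Surname, Given' strings by their normalized surname."""
    blocks = defaultdict(list)
    for name in names:
        surname = name.split(",")[0]
        blocks[normalize(surname)].append(name)
    return dict(blocks)


# Made-up sample data, for illustration only.
names = ["Müller, J.", "Muller, Jan", "O'Brien, K.", "OBrien, Kate", "Smith, A."]

old_blocks = block_by_surname(names, normalize_simple)
new_blocks = block_by_surname(names, normalize_strict)
print(len(old_blocks), len(new_blocks))
```

In this toy data the stricter normalization merges the accented and punctuated variants, so the two strategies produce different block counts. A normalization change that leaves the count untouched on a large dataset is therefore worth rechecking.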

MSusik commented 9 years ago

OK, I rechecked it. Correct results:

('Number of blocks =', 13112)
('True number of clusters', 15575)
('Number of computed clusters', 16682)
('B^3 F-score (overall) =', 0.9832201238291286)
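For readers unfamiliar with the metric reported above, the following is a minimal sketch of the standard B^3 precision, recall and F-score for two flat clusterings. It is not the implementation used to produce the numbers in this thread. The function name `b3_scores` and the label lists are assumptions, and the code expects Python 3.

```python
from collections import defaultdict


def b3_scores(labels_true, labels_pred):
    """Return (precision, recall, f_score) under the standard B^3 definition."""
    if len(labels_true) != len(labels_pred) or not labels_true:
        raise ValueError("label lists must be non-empty and of equal length")

    # Map each cluster label to the set of item indices it contains.
    true_clusters = defaultdict(set)
    pred_clusters = defaultdict(set)
    for idx, (t, p) in enumerate(zip(labels_true, labels_pred)):
        true_clusters[t].add(idx)
        pred_clusters[p].add(idx)

    n = len(labels_true)
    precision = 0.0
    recall = 0.0
    for idx in range(n):
        true_set = true_clusters[labels_true[idx]]
        pred_set = pred_clusters[labels_pred[idx]]
        overlap = len(true_set & pred_set)
        # Per-item precision and recall, averaged over all items below.
        precision += overlap / len(pred_set)
        recall += overlap / len(true_set)

    precision /= n
    recall /= n
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy example: one item from the second true cluster is misplaced.
p, r, f = b3_scores([0, 0, 1, 1], [0, 0, 0, 1])
print(p, r, f)
```

Because the score is averaged per item, computing more clusters than the true number (as in the results above) mostly costs recall, while merging distinct authors mostly costs precision.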