inspirehep / beard

Bibliographic Entity Automatic Recognition and Disambiguation

similarity: transformers for paired data #19

Closed: glouppe closed this 9 years ago

glouppe commented 9 years ago

This PR implements transformers for paired data.
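To make the idea concrete, here is a minimal, stdlib-only sketch of what a transformer for paired data can look like. It assumes each sample is a pair of strings and that one element-wise feature extractor is applied to both sides of the pair. The names PairTransformer, AbsoluteDifference and name_features are illustrative only and should not be read as beard's actual API.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types and names for illustration; not beard's actual API.
Feature = List[float]
Pair = Tuple[str, str]


def name_features(s: str) -> Feature:
    """Toy element-wise extractor: string length and number of spaces."""
    return [float(len(s)), float(s.count(" "))]


class PairTransformer:
    """Apply one element-wise transformer to both sides of each pair.

    The output row for a pair (a, b) is the concatenation f(a) + f(b),
    so a later step can compare the two halves.
    """

    def __init__(self, element_transformer: Callable[[str], Feature]):
        self.element_transformer = element_transformer

    def fit(self, pairs: Sequence[Pair]) -> "PairTransformer":
        # Stateless sketch: nothing to learn, kept for fit/transform symmetry.
        return self

    def transform(self, pairs: Sequence[Pair]) -> List[Feature]:
        return [self.element_transformer(a) + self.element_transformer(b)
                for a, b in pairs]


class AbsoluteDifference:
    """Map rows f(a) + f(b) to |f(a) - f(b)|, a symmetric comparison."""

    def transform(self, rows: Sequence[Feature]) -> List[Feature]:
        result = []
        for row in rows:
            half = len(row) // 2
            result.append([abs(x - y) for x, y in zip(row[:half], row[half:])])
        return result


if __name__ == "__main__":
    pairs = [("John Smith", "J. Smith")]
    rows = PairTransformer(name_features).fit(pairs).transform(pairs)
    print(rows)                                  # [[10.0, 1.0, 8.0, 1.0]]
    print(AbsoluteDifference().transform(rows))  # [[2.0, 0.0]]
```

Keeping the concatenation and the comparison as separate steps means the element-wise features can be reused with different pairwise comparisons.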

glouppe commented 9 years ago

@etzemis You may want to have a look at this, to get inspiration for implementing "soft" similarities.
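For contrast, a "hard" similarity only says whether two values are identical, while a "soft" one scores near matches between 0 and 1. The sketch below uses Python's standard difflib as a stand-in scorer; the function names are hypothetical, and this is not the similarity beard implements.

```python
from difflib import SequenceMatcher


def exact_similarity(a: str, b: str) -> float:
    """'Hard' similarity: 1.0 only when the strings are identical."""
    return 1.0 if a == b else 0.0


def soft_similarity(a: str, b: str) -> float:
    """'Soft' similarity in [0, 1], case-insensitive.

    SequenceMatcher.ratio() returns 2 * M / T, where M is the number of
    matched characters and T the total length of both strings.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


if __name__ == "__main__":
    print(exact_similarity("Smith", "Smyth"))  # 0.0
    print(soft_similarity("Smith", "Smyth"))   # 0.8
```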

glouppe commented 9 years ago

This is ready for review! Basically, these classes have been extracted from the prototype in the notebook.

The next step will then be to reuse all of this to illustrate an advanced use case of author disambiguation, i.e., combining distance learning on transformed paired data with block clustering.

CC: @MSusik @etzemis @natsheh
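The block-clustering half of that plan can be sketched as follows: records are first grouped into blocks by a cheap key, and clustering then runs only inside each block. Everything here is an assumption for illustration: the "Last, First" input format, the last-name-plus-first-initial blocking rule, single-linkage merging under a fixed threshold, and the function names. In the plan described above, `distance` would be a model learned on transformed paired data; here it is any callable.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def block_key(name: str) -> str:
    """Hypothetical blocking rule for "Last, First" names: last name + first initial."""
    last, _, first = name.partition(",")
    first = first.strip()
    return last.strip().lower() + " " + (first[:1].lower() if first else "")


def block_clustering(names: Sequence[str],
                     distance: Callable[[str, str], float],
                     threshold: float) -> List[int]:
    """Single-linkage clustering run independently inside each block.

    Returns one label per input name. Names in different blocks never share
    a label, which keeps the number of pairwise comparisons small.
    """
    blocks: Dict[str, List[int]] = defaultdict(list)
    for i, name in enumerate(names):
        blocks[block_key(name)].append(i)

    parent = list(range(len(names)))  # union-find forest over indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for members in blocks.values():
        # Only pairs inside the same block are ever compared.
        for x in range(len(members)):
            for y in range(x + 1, len(members)):
                i, j = members[x], members[y]
                if distance(names[i], names[j]) <= threshold:
                    parent[find(i)] = find(j)

    # Relabel roots as consecutive integers in order of first appearance.
    labels: Dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(names))]


def first_name_distance(a: str, b: str) -> float:
    """Toy distance: 0.0 if one first name is a prefix of the other, else 1.0."""
    fa = a.partition(",")[2].strip().rstrip(".").lower()
    fb = b.partition(",")[2].strip().rstrip(".").lower()
    return 0.0 if fa.startswith(fb) or fb.startswith(fa) else 1.0


if __name__ == "__main__":
    names = ["Smith, John", "Smith, J.", "Smith, Jane", "Doe, John"]
    print(block_clustering(names, first_name_distance, 0.5))  # [0, 0, 0, 1]
```

In the usage example, single linkage chains "Smith, John" and "Smith, Jane" through the ambiguous "Smith, J."; that is a property of this simple linkage choice, not necessarily of the approach planned in this PR.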

glouppe commented 9 years ago

This PR now also includes utils.normalize_personal_name. This function might need more work to make it more robust (as might asciify), but that can be done later in a separate PR.
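As a rough illustration of what these two helpers are for, here is a hypothetical sketch. It is not the code from this PR. It assumes "Last, First Middle" input and normalizes to a lowercase "last, initials" key. It also shows one reason an NFKD-based asciify needs more work: characters with no ASCII decomposition, such as "ø", are silently dropped.

```python
import re
import unicodedata


def asciify(text: str) -> str:
    """Strip diacritics by NFKD-decomposing and discarding non-ASCII code points.

    Characters without an ASCII decomposition (e.g. "ø") are dropped entirely.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def normalize_personal_name(name: str) -> str:
    """Hypothetical normalization of "Last, First Middle" to "last, f m".

    Names without a comma are only asciified, lowercased and whitespace-collapsed.
    """
    name = asciify(name).lower()
    name = re.sub(r"[^a-z,\s-]", " ", name)  # keep letters, comma, spaces, hyphens
    if "," not in name:
        return " ".join(name.split())
    last, first = name.split(",", 1)
    last = " ".join(last.split())
    # One initial per given-name part; hyphenated given names give two initials.
    initials = " ".join(part[0] for part in re.split(r"[\s-]+", first) if part)
    return f"{last}, {initials}" if initials else last


if __name__ == "__main__":
    print(normalize_personal_name("Müller, Hans-Peter"))  # muller, h p
    print(asciify("Strøm"))                               # Strm
```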

MSusik commented 9 years ago

:+1: from me.

glouppe commented 9 years ago

Thanks for the review, @MSusik! Is it fine for you as well, @etzemis?

kaplun commented 9 years ago

Note that, as you can imagine, a lot of work has gone into normalizing names within Invenio/INSPIRE over the years. Have you also considered the existing algorithms as a source of inspiration? I am privately sharing a Google Doc with the analysis done so far, in case it is useful.

MSusik commented 9 years ago

@kaplun Thanks! It's definitely worth investigating.

This is the file that contains the most relevant work: https://github.com/inspirehep/invenio/blob/prod/modules/bibauthorid/lib/bibauthorid_name_utils.py

glouppe commented 9 years ago

Thanks, this might indeed be helpful! Let's continue this discussion in #20, however.

etzemis commented 9 years ago

:+1: from me too.