Closed glouppe closed 9 years ago
@etzemis You may want to have a look at this, to get inspiration for implementing "soft" similarities.
This is ready for reviews! Basically these classes have been extracted from the prototype in the notebook.
The next step will then be to reuse all of this to illustrate an advanced use case of author disambiguation, i.e., plugging in distance learning from transformed paired data together with block clustering.
CC: @MSusik @etzemis @natsheh
This PR now also includes utils.normalize_personal_name. This latter function might need some more work to make it more robust (along with asciify), but this can be done later in a separate PR.
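For readers who have not opened the diff, here is a minimal sketch of what name normalization of this kind could look like. It is not the code from this PR: only the two function names come from the comment above, while the bodies, the "surname, given names" output form, and the rule that the last token is the surname when no comma is present are all assumptions made for illustration.

```python
import re
import unicodedata


def asciify(text):
    """Strip accents by decomposing characters and dropping non-ASCII ones."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def normalize_personal_name(name):
    """Return a lowercase 'surname, given names' form (illustrative only)."""
    name = asciify(name).lower()
    # Keep letters, commas and spaces; anything else becomes a separator.
    name = re.sub(r"[^a-z, ]+", " ", name)
    if "," in name:
        surname, given = name.split(",", 1)
    else:
        # Assumption: without a comma, the last token is the surname.
        tokens = name.split()
        surname = tokens[-1] if tokens else ""
        given = " ".join(tokens[:-1])
    # Collapse repeated whitespace and drop any stray commas in given names.
    surname = " ".join(surname.split())
    given = " ".join(given.replace(",", " ").split())
    return f"{surname}, {given}" if given else surname
```

A sketch like this already shows why more robustness work may be needed: multi-word surnames, particles such as "van der", and non-Latin scripts are not handled.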
:+1: from me.
Thanks for the review, @MSusik! Was it fine for you as well, @etzemis?
Note that, as you can imagine, a lot of work has gone into normalizing names within Invenio/INSPIRE over the years. Have you also considered the existing algorithms as a source of inspiration? I am privately sharing a Google doc with the analysis done so far, in case it can be useful.
@kaplun Thanks! It's definitely worth investigating.
This is the file which contains most relevant work: https://github.com/inspirehep/invenio/blob/prod/modules/bibauthorid/lib/bibauthorid_name_utils.py
Thanks, this might be helpful indeed! Let us continue this discussion on #20 however.
:+1: from me too.
This PR implements transformers for paired data.
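As a rough illustration of the idea of a transformer for paired data, the sketch below applies a single element-wise transformer to both members of each pair and then reduces each transformed pair to a distance. It is written in plain Python with a scikit-learn-like fit/transform interface. All class names here (PairTransformer, NameLength, AbsoluteDifference) are hypothetical and are not claimed to match the classes in this PR.

```python
class PairTransformer:
    """Apply one element-wise transformer to both members of each pair."""

    def __init__(self, element_transformer):
        self.element_transformer = element_transformer

    def fit(self, pairs, y=None):
        # Fit on every element, whichever side of a pair it appears on.
        elements = [element for pair in pairs for element in pair]
        self.element_transformer.fit(elements)
        return self

    def transform(self, pairs):
        left = self.element_transformer.transform([a for a, _ in pairs])
        right = self.element_transformer.transform([b for _, b in pairs])
        return list(zip(left, right))


class NameLength:
    """Toy element transformer: maps a string to its length."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [len(x) for x in X]


class AbsoluteDifference:
    """Reduce each transformed pair (a, b) to the distance |a - b|."""

    def fit(self, pairs, y=None):
        return self

    def transform(self, pairs):
        return [abs(a - b) for a, b in pairs]


pairs = [("louppe, g", "louppe, gilles"), ("doe, j", "doe, j")]
transformed = PairTransformer(NameLength()).fit(pairs).transform(pairs)
distances = AbsoluteDifference().fit(transformed).transform(transformed)
```

Keeping the element transformer separate from the pair-level reduction means the same feature extraction can be reused with different distance or similarity functions.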