medialab / halexp

medialab's expert search engine PoC
GNU General Public License v3.0

Explore the possibility of deduplicating authors by relying only on hal_authIdPerson_i #31

Open boogheta opened 5 months ago

boogheta commented 5 months ago

And use first / last names only in cases where hal_authIdPerson_i is absent.

jimenaRL commented 4 months ago

To prevent the same author from appearing twice in the search results under a misspelled or incomplete version of their name (e.g. Juila Cagé vs Julia Cage, or Pedro Ramaciotti Morales vs Pedro Ramaciotti), we compare authors by their numeric halId. If the halId is 0, we instead compare a normalized version of the name, given by unidecode('-'.join([authorFirstName, authorLastName])).
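A minimal Python sketch of the comparison rule described above, assuming each author is a dict with the hypothetical keys halId, firstName and lastName (the actual field names in halexp may differ). It uses the unidecode package mentioned in this thread when it is installed, and otherwise falls back to a cruder accent-stripping stand-in built on the standard library's unicodedata module.

```python
import unicodedata
from typing import Union

try:
    # The package referenced in this issue (installed via the Dockerfile).
    from unidecode import unidecode
except ImportError:
    def unidecode(text: str) -> str:
        # Fallback stand-in (assumption, not the project's code): decompose
        # accented characters with NFKD and drop the non-ASCII combining marks.
        return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")

AuthorKey = Union[int, str]


def author_key(hal_id: int, first_name: str, last_name: str) -> AuthorKey:
    """Identity key for an author: the numeric halId when set, else the normalized name."""
    if hal_id != 0:
        return hal_id
    # halId of 0 means "no HAL person id": fall back to the transliterated name.
    return unidecode("-".join([first_name, last_name]))


def dedupe_authors(authors: list[dict]) -> list[dict]:
    """Keep the first author seen for each identity key, preserving order."""
    seen: set = set()
    unique = []
    for author in authors:
        # Hypothetical record keys, for illustration only.
        key = author_key(author["halId"], author["firstName"], author["lastName"])
        if key not in seen:
            seen.add(key)
            unique.append(author)
    return unique
```

Note that the name fallback only absorbs accent and transliteration differences: with a halId of 0, "Pedro Ramaciotti Morales" and "Pedro Ramaciotti" still yield different keys, so truncated or misspelled names are merged only when a non-zero halId is available.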

jimenaRL commented 4 months ago

Installation of the Python unidecode package was added to the Dockerfile.

jimenaRL commented 4 months ago

The branch with the changes was merged into main. I'm not sure whether the automatic GitLab deployment is working.