inspirehep / beard

Bibliographic Entity Automatic Recognition and Disambiguation

Add nysiis python 3 support #103

Closed michamos closed 5 years ago

michamos commented 5 years ago

Several algorithms can be used for phonetic blocking. The fuzzy library provides all of them, but version 1.2 has a bug (see https://github.com/yougov/fuzzy/issues/14) that prevents soundex from working correctly. Unfortunately, that is the only version compatible with Python 3. Previously, version 1.1 was used on Python 2 and an alternative implementation of double metaphone was used on Python 3, so soundex and NYSIIS were not available on Python 3. Now we install version 1.1 on Python 2 and version 1.2 on Python 3, so NYSIIS is always available (this happens to be the algorithm giving the best results).
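As a rough illustration of the version split described above, the choice of pin can be sketched as a small function. The helper name `fuzzy_requirement` and the `==` pin format are assumptions made for this example; the actual packaging code in this PR may express the same thing differently (for instance with environment markers).

```python
import sys


def fuzzy_requirement(version_info=sys.version_info):
    """Return a pinned requirement string for the fuzzy library.

    Hypothetical helper: the version numbers come from the description
    above, the pin syntax is an assumption.
    """
    if version_info[0] < 3:
        # Python 2: version 1.1, where soundex works correctly.
        return "fuzzy==1.1"
    # Python 3: version 1.2 is the only compatible release.
    return "fuzzy==1.2"


if __name__ == "__main__":
    print(fuzzy_requirement())
```

Passing an explicit tuple such as `(2, 7, 15)` makes the function easy to check without switching interpreters.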

To summarize:

Before:

| Algorithm | Python 2 | Python 3 |
| --- | --- | --- |
| soundex | :heavy_check_mark: | :negative_squared_cross_mark: |
| NYSIIS | :heavy_check_mark: | :negative_squared_cross_mark: |
| double metaphone | :heavy_check_mark: | :heavy_check_mark: |

After:

| Algorithm | Python 2 | Python 3 |
| --- | --- | --- |
| soundex | :heavy_check_mark: | :negative_squared_cross_mark: |
| NYSIIS | :heavy_check_mark: | :heavy_check_mark: |
| double metaphone | :heavy_check_mark: | :heavy_check_mark: |
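
For callers that need to guard against the remaining gap (soundex on Python 3), the "After" table can be transcribed into a small lookup. This is only a sketch: the names `AVAILABLE` and `is_available` and the algorithm keys are hypothetical and not part of beard's API.

```python
import sys

# Availability after this change, transcribed from the "After" table.
# Keys are Python major versions; the algorithm names are illustrative.
AVAILABLE = {
    2: ("soundex", "nysiis", "double_metaphone"),
    3: ("nysiis", "double_metaphone"),
}


def is_available(algorithm, major=sys.version_info[0]):
    """Return True if the phonetic algorithm is usable on this Python major version."""
    return algorithm in AVAILABLE.get(major, ())
```

A caller could check `is_available("soundex")` before selecting that algorithm and fall back to NYSIIS otherwise.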