m8sec / CrossLinked

LinkedIn enumeration tool to extract valid employee names from an organization through search engine scraping
GNU General Public License v3.0

Remove accents/diacritics #3

Closed: nuno-carvalho closed this issue 4 years ago

nuno-carvalho commented 4 years ago

Very nice app... works like a charm.

For some languages (like Portuguese), it should remove accents/diacritics, because they are not used in e-mail addresses.

I got it done with this function:

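import unicodedata

def strip_accents(text):
    # Python 2: decode byte strings to unicode.
    # Python 3: `unicode` does not exist (NameError) and str is already Unicode.
    try:
        text = unicode(text, 'utf-8')
    except (NameError, TypeError):  # TypeError: text is already unicode on Python 2
        pass

    # NFD splits each accented character into a base letter plus combining marks;
    # encoding to ASCII with 'ignore' then drops the combining marks.
    text = unicodedata.normalize('NFD', text)\
           .encode('ascii', 'ignore')\
           .decode('ascii')

    return str(text)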

...called before saving to file!
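For anyone trying this out, here is a minimal Python 3 sketch of the "strip, then save" step described above. The `save_names` helper and the sample names are hypothetical, made up for illustration; they are not taken from CrossLinked. An in-memory buffer stands in for the output file.

```python
import io
import unicodedata


def strip_accents(text):
    """Drop combining marks after NFD decomposition (Python 3 only)."""
    decomposed = unicodedata.normalize("NFD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def save_names(names, handle):
    """Hypothetical save step: write one accent-free name per line."""
    for name in names:
        handle.write(strip_accents(name) + "\n")


# Sample data invented for this example, not real tool output.
names = ["João Conceição", "André Gonçalves"]

buffer = io.StringIO()  # stands in for an open output file
save_names(names, buffer)
print(buffer.getvalue(), end="")
```

One caveat with this approach: letters that have no canonical decomposition, such as "ø" or "ß", are not split into a base letter plus a mark, so the ASCII step drops them entirely ("Søren" becomes "Sren"). That may or may not matter for the target languages.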

m8sec commented 4 years ago

Hey @nuno-carvalho,

Thanks for the feedback :)

I actually ran into this issue myself recently, so this is great! I'll do some QA on the code and hope to make a new commit soon. I will leave the issue open until then.

- m8r0wn

m8sec commented 4 years ago

Just made a commit that should correct this issue by using the unidecode library.

Thanks, -m8r0wn
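For reference, a rough sketch of what transliterating with unidecode can look like. This is an assumption about typical usage of the library, not the code from the commit mentioned above. The `to_ascii` wrapper is a hypothetical name, and the standard-library fallback is only there so the snippet still runs when the third-party Unidecode package is not installed.

```python
import unicodedata

try:
    # Third-party package: pip install Unidecode
    from unidecode import unidecode
except ImportError:
    unidecode = None


def to_ascii(text):
    """Hypothetical wrapper: transliterate text to plain ASCII."""
    if unidecode is not None:
        # unidecode maps each character to an ASCII approximation.
        return unidecode(text)
    # Fallback (assumption): NFD decomposition, then drop non-ASCII marks.
    decomposed = unicodedata.normalize("NFD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("João Conceição"))
```

The two paths agree on accented Latin letters like the ones above. They can differ on characters without a canonical decomposition: unidecode substitutes an ASCII approximation, while the fallback simply drops them.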