jgstew / pre-commit-jgstew

custom pre-commit hooks
MIT License

Add hook to automatically convert Unicode characters to their nearest ASCII equivalent. #3

Open jgstew opened 1 year ago

jgstew commented 1 year ago

Use the unidecode Python module to replace non-ASCII characters in UTF-8 files with their nearest ASCII equivalents.

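            # Assumes `from unidecode import unidecode` at the top of the module.
            # Read as UTF-8 explicitly rather than relying on the platform default.
            with open(this_path, encoding="utf-8") as f:
                file_contents = f.read()

            # str.isascii() requires Python 3.7 or newer.
            if not file_contents.isascii():
                print(
                    f"Invalid: {dirpath} - {filename} contained non-ascii chars found by Python"
                )
                # Rewrite the file with each character replaced by its ASCII approximation.
                with open(this_path, "wt", encoding="utf-8") as this_file:
                    this_file.write(unidecode(file_contents))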

jgstew commented 1 year ago

It seems that anyascii has a more permissive license than unidecode: https://github.com/anyascii/anyascii

See the licensing issue here: https://github.com/avian2/unidecode/issues/88#issuecomment-1629254113
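
A minimal sketch of how the hook entry point might look, written so the transliteration function can be swapped. The names `nfkd_ascii`, `fix_file`, and `main` are hypothetical and not taken from this repo. The default `nfkd_ascii` is only a standard-library stand-in that strips accents and drops anything it cannot decompose; it is not equivalent to either library. In practice `unidecode.unidecode` or `anyascii.anyascii` could be passed as `to_ascii` instead.

```python
import unicodedata
from typing import Callable, Sequence


def nfkd_ascii(text: str) -> str:
    """Stdlib stand-in: decompose characters, then drop anything still non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def fix_file(path: str, to_ascii: Callable[[str], str] = nfkd_ascii) -> bool:
    """Rewrite the file as ASCII if needed; return True when it was changed."""
    # newline="" keeps the file's existing line endings untouched.
    with open(path, encoding="utf-8", newline="") as f:
        contents = f.read()
    if contents.isascii():
        return False
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(to_ascii(contents))
    return True


def main(argv: Sequence[str]) -> int:
    """Process the given file paths; return 1 if any file was modified, else 0."""
    changed = [path for path in argv if fix_file(path)]
    for path in changed:
        print(f"Fixed non-ASCII characters in {path}")
    return 1 if changed else 0
```

pre-commit passes the staged file paths as command-line arguments and treats a nonzero exit code as a failed hook, so a script entry point would call `sys.exit(main(sys.argv[1:]))`. Returning 1 after a rewrite lets the user review and re-stage the changed files.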