macisamuele / language-formatters-pre-commit-hooks

Collection of custom pre-commit hooks.
Apache License 2.0

pretty-format-ini with cp1250 codepage #172

Open novaklu opened 1 year ago

novaklu commented 1 year ago

Hello, I use the pretty-format-ini hook from language-formatters-pre-commit-hooks to check the INI files of an older program. The program is for Windows and uses the cp1250 encoding. When the check runs, the following error appears:

Traceback (most recent call last):
  File "C:\Program Files\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Program Files\Python\Python39.cache\pre-commit\reponx58ayq8\py_env-python3.9\Scripts\pretty-format-ini.EXE__main__.py", line 7, in <module>
  File "C:\Program Files\Python\Python39.cache\pre-commit\reponx58ayq8\py_env-python3.9\lib\site-packages\language_formatters_pre_commit_hooks\pretty_format_ini.py", line 26, in pretty_format_ini
    string_content = input_file.read()
  File "C:\Program Files\Python\Python39\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 16: invalid continuation byte

Apparently there is a problem with the code page of the file. When I convert the file to UTF-8, the check runs without any problems. Is there any way to define the code page for pretty-format-ini?
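The failure can be reproduced without the hook. The following is a minimal sketch using a made-up INI fragment (the sample text is an assumption, not the content of the attached files): the byte 0xED is the letter "í" in cp1250, but in UTF-8 it opens a multi-byte sequence and must be followed by continuation bytes.

```python
# Hypothetical sample: an INI fragment containing "í", stored as cp1250 bytes.
raw = "title=Zelí\nsize=1\n".encode("cp1250")

# Decoding as UTF-8 fails, because 0xED is followed by a plain newline byte
# instead of the continuation bytes that UTF-8 requires after it.
try:
    raw.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Decoding with the encoding the file was actually written in works.
decoded = raw.decode("cp1250")

print(error)
print(decoded)
```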

I am attaching two files, one in UTF-8 and one in cp1250. Is it possible to possibly work around this?

Thank you in advance for your answer.

test-cp1250.txt test-utf8.txt

Delgan commented 11 months ago

Hi @novaklu.

I randomly came across this issue. Since I'm the author of the config_formatter library used internally to format INI files, I was intrigued. However, it turns out the problem lies within language-formatters-pre-commit-hooks because the UTF-8 encoding is hardcoded here: https://github.com/macisamuele/language-formatters-pre-commit-hooks/blob/71b7fe22689d67b535b1728302f1ea306a47f3aa/language_formatters_pre_commit_hooks/pretty_format_ini.py#L25

> Is there any way to define the code page for pretty-format-ini?

As of today, I don't think this is possible. Achieving this would entail either introducing a new parameter or implementing a mechanism to detect the encoding of the input file.

> Is it possible to possibly work around this?

You could create your own pre-commit hook in the form of a basic script (it's very simple), and either call config_formatter yourself with the desired encoding, or convert your file to utf8 before it's processed by language-formatters-pre-commit-hooks.
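A minimal sketch of such a script, under stated assumptions: `reformat_file` is a hypothetical helper name, and the formatter is passed in as a plain callable so that the sketch does not depend on any particular library API. The file is read and written as bytes and decoded explicitly, so the original line endings are left untouched.

```python
from pathlib import Path
from typing import Callable


def reformat_file(path: Path, encoding: str, prettify: Callable[[str], str]) -> bool:
    """Reformat `path` in place and return True if its content changed.

    `prettify` is any function mapping the file text to the formatted text.
    """
    # Decode the raw bytes explicitly instead of relying on a default codec.
    original = path.read_bytes().decode(encoding)
    formatted = prettify(original)
    if formatted == original:
        return False
    # Write the result back in the same encoding the file was read with.
    path.write_bytes(formatted.encode(encoding))
    return True
```

In a real hook, `prettify` would presumably be the formatting method of config_formatter (I believe `ConfigFormatter().prettify`, but that should be checked against the library's documentation), and the script would call `reformat_file(Path(name), "cp1250", prettify)` for each file name that pre-commit passes on the command line, exiting with a non-zero status if any file changed.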

macisamuele commented 10 months ago

We can make the encoding configurable via a CLI argument on the tool; no objections to that. @novaklu feel free to open a PR and I will be happy to review/merge it.
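A rough sketch of what such an argument could look like, not the project's actual code: the `--encoding` flag name and the `parse_args` and `read_text` helpers are hypothetical. The default stays at UTF-8, so existing configurations would behave as they do today.

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Sketch of an encoding-aware INI hook")
    parser.add_argument(
        "--encoding",
        default="utf-8",  # same behaviour as today when the flag is omitted
        help="Encoding used to read the files (hypothetical flag)",
    )
    parser.add_argument("filenames", nargs="*", help="Files to check")
    return parser.parse_args(argv)


def read_text(filename: str, encoding: str) -> str:
    # newline="" disables newline translation, so CRLF line endings survive.
    with open(filename, encoding=encoding, newline="") as input_file:
        return input_file.read()
```

With a flag like this, a user could add something like `args: [--encoding=cp1250]` to the hook entry in their pre-commit configuration, assuming the flag were actually implemented under that name.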

As a general suggestion, I wonder whether it would be worth changing the file encoding to UTF-8, since that would make the config behave consistently across different platforms and individual users' locales.

novaklu commented 10 months ago

Hello Samuel, thank you for pointing out where the source of this behavior is hidden. Unfortunately, in my case it is not possible to change the file encoding to UTF-8. This is an older application. It runs under, let's say, very specific conditions on industrial computers.

I will try to modify the encoding settings in pretty_format_ini.py.