hodea / hodea-review-minder

Scripts to assist code reviews
MIT License

ISO-8859 encoded file crashes review minder #15

Open franzhollerer opened 6 years ago

franzhollerer commented 6 years ago

I have an ISO-8859 encoded file that crashes the review minder. Here is the output:

$ python3 ~/work/hodea-review-minder/reviewminder.py 
top-dir:  .
*****DEBUG:cfg name
['insert project name here']
*****
*****DEBUG:cfg type
['.c', '.h', '.cpp']
*****
*****DEBUG:cfg exclude
['./review_minder/']
*****
read config:     OK
read database:   OK
*****DEBUG:
{'minder_items': []}
*****
./b5start/src/b5wrpmpc.c
Traceback (most recent call last):
  File "/home/hof/work/hodea-review-minder/reviewminder.py", line 357, in <module>
    main()
  File "/home/hof/work/hodea-review-minder/reviewminder.py", line 349, in main
    minder.rm_search()
  File "/home/hof/work/hodea-review-minder/reviewminder.py", line 277, in rm_search
    for line in flog:           #add write new file here + add hash before writing new file
  File "/usr/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 498: invalid continuation byte
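The failure can be reproduced without the review minder. The byte 0xe4 is "ä" in ISO-8859-1, but in UTF-8 it introduces a three-byte sequence, so a following plain ASCII byte is rejected as an "invalid continuation byte". The sample word below is an assumption chosen for illustration; the actual contents of b5wrpmpc.c are not shown in the report.

```python
# "Märzen" encoded as ISO-8859-1 gives b'M\xe4rzen'.
raw = "M\u00e4rzen".encode("iso-8859-1")

try:
    # UTF-8 sees 0xe4 as the start of a 3-byte sequence; 'r' is not a
    # continuation byte, so decoding fails at index 1.
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    reason, position = exc.reason, exc.start
    print(reason, "at position", position)  # invalid continuation byte at position 1

# Decoding with the file's real encoding works.
print(raw.decode("iso-8859-1"))  # Märzen
```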

The file command shows the following information for the affected file:

$ file ./b5start/src/b5wrpmpc.c
./b5start/src/b5wrpmpc.c: C source, ISO-8859 text
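One possible workaround, sketched under assumptions: try UTF-8 first and fall back to ISO-8859-1 when decoding fails. The helper name read_source_lines and the fallback order are hypothetical and not part of reviewminder.py. Because ISO-8859-1 maps every byte to a character, the fallback always succeeds, although a Latin-1 file whose bytes happen to form valid UTF-8 would be read as UTF-8.

```python
import os
import tempfile


def read_source_lines(path, encodings=("utf-8", "iso-8859-1")):
    """Hypothetical helper: return (lines, encoding) for the first encoding
    in `encodings` that decodes the whole file without error.

    ISO-8859-1 assigns a character to every byte value, so with the default
    list this never raises UnicodeDecodeError.
    """
    if not encodings:
        raise ValueError("at least one encoding is required")
    last_error = None
    for enc in encodings:
        try:
            with open(path, encoding=enc) as f:
                return f.readlines(), enc
        except UnicodeDecodeError as exc:
            last_error = exc  # remember the failure and try the next candidate
    raise last_error


# Usage: the same comment saved once as ISO-8859-1 and once as UTF-8.
with tempfile.TemporaryDirectory() as tmp:
    latin_path = os.path.join(tmp, "latin.c")
    utf8_path = os.path.join(tmp, "utf8.c")
    with open(latin_path, "wb") as f:
        f.write("/* Gr\u00fc\u00dfe */\n".encode("iso-8859-1"))
    with open(utf8_path, "wb") as f:
        f.write("/* Gr\u00fc\u00dfe */\n".encode("utf-8"))
    latin_result = read_source_lines(latin_path)
    utf8_result = read_source_lines(utf8_path)

print(latin_result)  # (['/* Grüße */\n'], 'iso-8859-1')
print(utf8_result)   # (['/* Grüße */\n'], 'utf-8')
```

Reading through open() rather than splitting a decoded byte string keeps the same universal-newline line splitting that a plain `for line in flog:` loop gets.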
franzhollerer commented 6 years ago

This happened under Linux. A colleague noticed similar problems under Windows when parsing UTF-8 encoded files, which work fine under Linux.

It seems that Python's open() falls back to the host's default (locale) encoding when no encoding is given, and that default differs between Windows and Linux.
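A minimal sketch of the platform dependence and one way around it, assuming the script opens files with a bare open(path). locale.getpreferredencoding(False) reports the encoding open() uses when encoding= is omitted. The helper open_source is hypothetical; it pins UTF-8 so every host decodes the same way, and uses errors="replace" so undecodable bytes become U+FFFD instead of raising. This is lossy for ISO-8859 files, unlike the fallback approach.

```python
import locale
import os
import tempfile

# Without encoding=, open() decodes with this value: commonly "UTF-8" on
# Linux and a code page such as "cp1252" on many Windows installations.
host_default = locale.getpreferredencoding(False)
print("host default encoding:", host_default)


def open_source(path):
    """Hypothetical replacement for the script's open() call.

    Pinning encoding= makes decoding identical on every host; errors="replace"
    substitutes U+FFFD for undecodable bytes instead of raising.
    """
    return open(path, encoding="utf-8", errors="replace")


# Usage: an ISO-8859-1 file no longer crashes the line loop.
fd, path = tempfile.mkstemp(suffix=".c")
with os.fdopen(fd, "wb") as f:
    f.write("/* M\u00e4rz */\n".encode("iso-8859-1"))
with open_source(path) as flog:
    lines = list(flog)
os.remove(path)
# The 0xe4 byte has been replaced by U+FFFD; the rest of the line is intact.
print(lines[0].strip())
```

On Python 3.7 and later, running with -X utf8 or PYTHONUTF8=1 also makes UTF-8 the default for open(), but without errors= the ISO-8859 file would still fail to decode.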