metrixplusplus / metrixplusplus

Metrix++ is an extendable tool for code metrics collection and analysis.
https://metrixplusplus.github.io
MIT License

UnicodeDecodeError: 'ascii' codec can't decode byte 0x8f in position 1382: ordinal not in range(128) #43

Closed abhay2703 closed 7 months ago

abhay2703 commented 3 years ago

Hello,

I am using python2.7 on CentOS8.2

I am getting the below error:

    Traceback (most recent call last):
      File "/usr/bin/metrix++", line 11, in <module>
        load_entry_point('metrixpp==1.6.2', 'console_scripts', 'metrix++')()
      File "/usr/lib/python2.7/site-packages/metrixpp-1.6.2-py2.7.egg/metrixpp/metrixpp.py", line 38, in start
        exit_code = main()
      File "/usr/lib/python2.7/site-packages/metrixpp-1.6.2-py2.7.egg/metrixpp/metrixpp.py", line 30, in main
        exit_code = loader.run(args)
      File "/usr/lib/python2.7/site-packages/metrixpp-1.6.2-py2.7.egg/metrixpp/mpp/internal/loader.py", line 206, in run
        exit_code += item.run(args)
      File "/usr/lib/python2.7/site-packages/metrixpp-1.6.2-py2.7.egg/metrixpp/ext/std/tools/collect.py", line 84, in run
        return self.reader.run(self, "./")
      File "/usr/lib/python2.7/site-packages/metrixpp-1.6.2-py2.7.egg/metrixpp/ext/std/tools/collect.py", line 208, in run
        total_errors = run_recursively(plugin, directory)
      File "/usr/lib/python2.7/site-packages/metrixpp-1.6.2-py2.7.egg/metrixpp/ext/std/tools/collect.py", line 199, in run_recursively
        exit_code += run_per_file(plugin, fname, full_path)
      File "/usr/lib/python2.7/site-packages/metrixpp-1.6.2-py2.7.egg/metrixpp/ext/std/tools/collect.py", line 147, in run_per_file
        exit_code += run_recursively(plugin, full_path)
      File "/usr/lib/python2.7/site-packages/metrixpp-1.6.2-py2.7.egg/metrixpp/ext/std/tools/collect.py", line 199, in run_recursively
        exit_code += run_per_file(plugin, fname, full_path)
      File "/usr/lib/python2.7/site-packages/metrixpp-1.6.2-py2.7.egg/metrixpp/ext/std/tools/collect.py", line 169, in run_per_file
        checksum = binascii.crc32(text.encode('utf8')) & 0xffffffff # to match python 3
    UnicodeDecodeError: 'ascii' codec can't decode byte 0x8f in position 1382: ordinal not in range(128)

The file that triggers the error:

    $ file ./test/test_cc.cpp
    ./test/test_cc.cpp: C source, Non-ISO extended-ASCII text
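A note on the likely root cause, as a sketch rather than a diagnosis of the tool: in Python 2 the text read from the file is a byte string, and calling `.encode('utf8')` on a byte string first decodes it implicitly with the default `ascii` codec. That hidden step is what fails on byte 0x8f. The snippet reproduces the same failure explicitly in Python 3; the sample bytes and the helper name `implicit_ascii_decode` are made up for illustration.

```python
# Hypothetical sample: ASCII text followed by one non-ASCII byte (0x8f),
# standing in for the contents of the offending source file.
raw = b"int main() { return 0; } // \x8f"


def implicit_ascii_decode(data):
    """Mimic the hidden decode Python 2 performs before str.encode()."""
    return data.decode("ascii")


try:
    implicit_ascii_decode(raw)
    error = None
except UnicodeDecodeError as exc:
    # Same exception type and codec name as in the traceback above.
    error = exc

print(error)
```

The exception object carries the offending offset in `error.start`, which corresponds to the "position" reported in the message.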

avkonst commented 3 years ago

I am afraid not much can be done about it. Does converting the file to UTF-8 help?

abhay2703 commented 3 years ago

Converting the file to 'utf-8' is changing some characters in the file, so I was trying to avoid file conversion.
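Characters typically change during conversion when the converter assumes the wrong source encoding. A minimal sketch of a conversion that names the source encoding explicitly and refuses to write anything if the round trip is not lossless; the function name `convert_to_utf8`, the file names, and the choice of Latin-1 in the demo are assumptions, not something taken from this thread.

```python
import os
import tempfile


def convert_to_utf8(src_path, dst_path, source_encoding):
    """Re-encode src_path as UTF-8, refusing to write if data would be lost."""
    with open(src_path, "rb") as f:
        raw = f.read()
    # Strict decoding: raises UnicodeDecodeError if source_encoding cannot
    # represent some byte, instead of silently substituting characters.
    text = raw.decode(source_encoding)
    # Round-trip check: encoding back must reproduce the original bytes.
    if text.encode(source_encoding) != raw:
        raise ValueError("round trip through %s is not lossless" % source_encoding)
    with open(dst_path, "wb") as f:
        f.write(text.encode("utf-8"))
    return text


# Demo on a throwaway file; 0xE9 is 'é' in Latin-1.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "legacy.cpp")
    dst = os.path.join(tmp, "utf8.cpp")
    with open(src, "wb") as f:
        f.write(b"// caf\xe9\n")
    converted = convert_to_utf8(src, dst, "latin-1")
    with open(dst, "rb") as f:
        utf8_bytes = f.read()
```

The source encoding still has to be determined by other means (an editor, or knowledge of where the file came from); the sketch only guards against a wrong guess going unnoticed.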

avkonst commented 3 years ago

Then the only option is to exclude the file from the analysis...


abhay2703 commented 3 years ago

I have another, off-topic question: is the tool tested with Python 3.x?

EDIT: I ask because this issue is not observed when using Python 3 on Windows.

avkonst commented 3 years ago

Yes, this tool is tested with Python 3 too. You should be able to see the GitHub Actions runs executed on each commit and release; they cover a number of platforms and Python versions.


abhay2703 commented 3 years ago

Now, I am facing this issue with metrix++ installation on python3.6:

    Traceback (most recent call last):
      File "/usr/local/bin/metrix++", line 11, in <module>
        load_entry_point('metrixpp==1.6.2', 'console_scripts', 'metrix++')()
      File "/usr/local/lib/python3.6/site-packages/metrixpp-1.6.2-py3.6.egg/metrixpp/metrixpp.py", line 38, in start
        exit_code = main()
      File "/usr/local/lib/python3.6/site-packages/metrixpp-1.6.2-py3.6.egg/metrixpp/metrixpp.py", line 30, in main
        exit_code = loader.run(args)
      File "/usr/local/lib/python3.6/site-packages/metrixpp-1.6.2-py3.6.egg/metrixpp/mpp/internal/loader.py", line 206, in run
        exit_code += item.run(args)
      File "/usr/local/lib/python3.6/site-packages/metrixpp-1.6.2-py3.6.egg/metrixpp/ext/std/tools/collect.py", line 84, in run
        return self.reader.run(self, "./")
      File "/usr/local/lib/python3.6/site-packages/metrixpp-1.6.2-py3.6.egg/metrixpp/ext/std/tools/collect.py", line 210, in run
        total_errors = run_recursively(plugin, directory)
      File "/usr/local/lib/python3.6/site-packages/metrixpp-1.6.2-py3.6.egg/metrixpp/ext/std/tools/collect.py", line 201, in run_recursively
        exit_code += run_per_file(plugin, fname, full_path)
      File "/usr/local/lib/python3.6/site-packages/metrixpp-1.6.2-py3.6.egg/metrixpp/ext/std/tools/collect.py", line 147, in run_per_file
        exit_code += run_recursively(plugin, full_path)
      File "/usr/local/lib/python3.6/site-packages/metrixpp-1.6.2-py3.6.egg/metrixpp/ext/std/tools/collect.py", line 201, in run_recursively
        exit_code += run_per_file(plugin, fname, full_path)
      File "/usr/local/lib/python3.6/site-packages/metrixpp-1.6.2-py3.6.egg/metrixpp/ext/std/tools/collect.py", line 147, in run_per_file
        exit_code += run_recursively(plugin, full_path)
      File "/usr/local/lib/python3.6/site-packages/metrixpp-1.6.2-py3.6.egg/metrixpp/ext/std/tools/collect.py", line 201, in run_recursively
        exit_code += run_per_file(plugin, fname, full_path)
      File "/usr/local/lib/python3.6/site-packages/metrixpp-1.6.2-py3.6.egg/metrixpp/ext/std/tools/collect.py", line 156, in run_per_file
        text = f.read();
      File "/usr/lib64/python3.6/codecs.py", line 321, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8e in position 1973: invalid start byte

avkonst commented 3 years ago

Looks like the file is not UTF-8... Not sure what we can do about it.
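One possible direction, sketched under assumptions and not a description of what Metrix++ actually does: read the file as bytes, compute the checksum over the raw bytes, and decode with `errors="replace"` so that an undecodable byte becomes U+FFFD instead of aborting the run. The helper name `read_source_lenient` and the sample file are hypothetical, and a checksum computed this way would not match one computed over re-encoded text.

```python
import binascii
import os
import tempfile


def read_source_lenient(path):
    """Return (text, checksum) without raising on undecodable bytes."""
    with open(path, "rb") as f:
        raw = f.read()
    # Checksum over the raw bytes, so it does not depend on any decoding.
    checksum = binascii.crc32(raw) & 0xFFFFFFFF
    # Undecodable bytes become U+FFFD; the ASCII code around them is untouched.
    text = raw.decode("utf-8", errors="replace")
    return text, checksum


# Demo: a throwaway file containing the byte 0x8e from the traceback.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.c")
    with open(path, "wb") as f:
        f.write(b"int x; /* \x8e */\n")
    text, checksum = read_source_lenient(path)
```

The trade-off is that non-ASCII characters in comments or string literals are lost in the decoded text, which matters little for most code metrics but would matter for anything that reports the text back.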


huornlmj commented 2 years ago

I have this issue too and I wanted to exclude the file causing the issue, but I cannot identify the file name to exclude from the error. Any suggestions as to how?
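Since the traceback does not name the offending file, one way to locate it is to walk the source tree and try to decode every file as UTF-8. This is a standalone sketch, independent of Metrix++; the function name `find_non_utf8_files` and the demo file names are made up.

```python
import os
import tempfile


def find_non_utf8_files(root):
    """Yield (path, byte_offset) for each file under root that is not valid UTF-8."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    f.read().decode("utf-8")
            except UnicodeDecodeError as exc:
                # exc.start is the offset of the first undecodable byte.
                yield path, exc.start
            except OSError:
                # Unreadable file (permissions, broken link): skip it.
                continue


# Demo on a throwaway tree with one valid and one invalid file.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "good.c"), "wb") as f:
        f.write("// ok \u00a9\n".encode("utf-8"))
    with open(os.path.join(tmp, "bad.c"), "wb") as f:
        f.write(b"// bad \xa9\n")
    offenders = [(os.path.basename(p), off) for p, off in find_non_utf8_files(tmp)]

print(offenders)
```

Run against a real checkout with `find_non_utf8_files(".")`. Binary files will also be reported, so the result may need filtering by extension.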

huornlmj commented 2 years ago

The issue in my case seems to be the © character in the head of a C file. If I remove the © metrix++ works. But this can't be a final solution. For my testing I'm using a file within the popular nmap repository (./libz/contrib/dotzlib/DotZLib/CircularBuffer.cs)
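Why a single © can break decoding, assuming the file stores it in a single-byte encoding such as Latin-1 or Windows-1252 (the thread does not confirm the file's encoding): there the character is the lone byte 0xA9, which is not a valid start byte in UTF-8, whereas UTF-8 encodes it as the two bytes 0xC2 0xA9. The helper `is_valid_utf8` is illustrative only.

```python
copyright_sign = "\u00a9"  # the © character

latin1_bytes = copyright_sign.encode("latin-1")  # single byte 0xA9
utf8_bytes = copyright_sign.encode("utf-8")      # two bytes 0xC2 0xA9


def is_valid_utf8(data):
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(latin1_bytes, is_valid_utf8(latin1_bytes))
print(utf8_bytes, is_valid_utf8(utf8_bytes))
```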

brodym commented 2 years ago

I had a similar problem. I started by deleting comments and unnecessary code from my files, which resolved some of these errors, but I finally got annoyed enough to dig deeper. The problem seems to be that the code in collect.py is unsuccessful at obtaining the correct file encoding. Below is the modified code that I used to successfully avoid all the Unicode-related errors I was originally getting when trying to process my source files, which were of "Western European (Windows)" encoding:

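    # Patched section of run_per_file() in collect.py. `f` is the open source
    # file and `text` is what f.read() returned.
    try:
        text = text.encode(f.encoding)
    except Exception:
        pass  # best effort: leave text unchanged if this fails
    try:
        # Originally 'utf-8'; changed to match my files' actual encoding.
        text = text.decode('windows-1252')
    except Exception:
        pass  # best effort: leave text unchanged if this fails
    f.close()
    checksum = binascii.crc32(text.encode('utf8')) & 0xffffffff # to match python 3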

To be clear, I changed the decoding attempt, which was originally UTF-8, to windows-1252. It's clear to me this is the code that is causing people fits (and a comment even seems to acknowledge it was causing fits to get unit tests working). I still very much consider my fix a hack. I suspect the following link has resources that can lead someone to a "real fix": https://stackoverflow.com/questions/11544541/python-ascii-and-unicode-decode-error. In the meantime, perhaps my hack above will help others (note: you should verify your actual file encoding, which I did via EmEditor).
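A slightly more general variant of the same idea, offered as a sketch and not as the fix that was eventually merged: decode the raw bytes by trying a list of candidate encodings in order, ending with Latin-1, which accepts every byte. The function name `decode_with_fallbacks` and the default candidate list are assumptions.

```python
def decode_with_fallbacks(raw, encodings=("utf-8", "windows-1252")):
    """Return (text, encoding_used), trying each candidate encoding in order."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte to a code point, so this last step cannot fail.
    return raw.decode("latin-1"), "latin-1"


print(decode_with_fallbacks(b"caf\xc3\xa9"))  # valid UTF-8
print(decode_with_fallbacks(b"caf\xe9"))      # not UTF-8, valid Windows-1252
print(decode_with_fallbacks(b"\x8f"))         # undefined in Python's cp1252 codec
```

Order matters: UTF-8 goes first because it rejects most text written in other encodings, while single-byte codecs accept almost anything and would mask a correct UTF-8 reading.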

avkonst commented 2 years ago

Ohh, this is getting nowhere, taking into account how many encodings exist... Isn't it easier to convert the sources to UTF-8?

avkonst commented 2 years ago

Maybe I need to release the latest metrix++ to pip..

On Mon, May 9, 2022 at 10:14 PM Jason Culligan wrote:

BTW - this only happens to me when using the version installed with pip. It works fine if I just clone the metrix++ repo and use the command from there. Same version of Python (Python 3.6.9). Sadly metrix++ does not have a version switch so I cannot tell you which version comes from pip.
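Without a version switch, the version installed by pip can still be read from the package metadata. A minimal sketch, assuming the distribution name `metrixpp` (as it appears in the tracebacks earlier in this thread) and Python 3.8 or newer for `importlib.metadata`; the helper name `installed_version` is made up.

```python
def installed_version(distribution="metrixpp"):
    """Return the installed version string, or None if it cannot be determined."""
    try:
        # importlib.metadata is in the standard library from Python 3.8 on.
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        return None
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


print(installed_version())
```

This reports only what pip installed into the current environment; a cloned checkout run from its own directory is not covered. On older interpreters such as the Python 3.6.9 mentioned here, `pip show metrixpp` gives the same information from the command line.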


thom-sch commented 1 year ago

Hi there,

you can find the solution (pull request #56) on https://github.com/thom-sch/metrixplusplus/tree/Issue43_UnicodeDecodeError

Regards from Meerbusch,
Thomas

prozessorkern commented 7 months ago

Just released a new version on pypi - this includes the mentioned fixes - hope this fixes your issues.