sonatype-nexus-community / jake

Check your Python environments for vulnerable Open Source packages with OSS Index or Sonatype Nexus Lifecycle.
https://jake.readthedocs.io/
Apache License 2.0

[BUG] -f option uses wrong encoding (cp1252) on Windows for UTF-8 files #130

Open sanzoghenzo opened 1 year ago

sanzoghenzo commented 1 year ago

Describe the bug

Running jake ddt -f poetry.lock -t POETRY on a Windows machine (still on Python 3.7, unfortunately) can raise a UnicodeDecodeError.

In my case the error is thrown because of a non-ASCII Unicode character in the description of the mergedeep package: the file is UTF-8, but it is read with the Windows default encoding (cp1252).

  File "jake\app.py", line 122, in main
    JakeCmd(args).execute()
  File "ake\app.py", line 96, in execute
    exit_code: int = command.execute(arguments=self._arguments)
  File "jake\command\__init__.py", line 43, in execute
    return self.handle_args()
  File "jake\command\oss.py", line 83, in handle_args
    self.arguments.sbom_input_type, self.arguments.sbom_input_source
  File "jake\command\parser_selector.py", line 26, in get_parser
    input_data = input_data_fh.read()
  File "lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 78405: character maps to <undefined>
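
The failure in the traceback can be reproduced in isolation. The sketch below is illustrative only: the character "Ð" (U+00D0) is a stand-in chosen because its UTF-8 encoding contains the byte 0x90, and it is not necessarily the character in the mergedeep description. The helper try_decode is hypothetical and not part of jake.

```python
# Stand-in character (an assumption, not the one from poetry.lock):
# U+00D0 encodes to b"\xc3\x90" in UTF-8, so its second byte is 0x90.
utf8_bytes = "\u00d0".encode("utf-8")


def try_decode(data: bytes, encoding: str) -> str:
    """Return the decoded text, or an error message if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"error: {exc}"


# cp1252 has no mapping for byte 0x90, so this reports a 'charmap' error.
print(try_decode(utf8_bytes, "cp1252"))

# Decoding the same bytes as UTF-8 recovers the original character.
print(try_decode(utf8_bytes, "utf-8") == "\u00d0")
```

This matches the reported behavior: the bytes are valid UTF-8, and only the choice of codec makes the read fail.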

To Reproduce

Steps to reproduce the behavior:

poetry new jake-test
cd jake-test
poetry add mergedeep
jake ddt -f poetry.lock -t POETRY

Expected behavior

poetry.lock should be parsed correctly.

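Additional context

This can be solved by adding encoding="utf-8" to the FileType constructor of the -f parameter. Without an explicit encoding, the file is opened with the platform's default encoding, which is cp1252 on this Windows setup.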
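
A minimal sketch of the suggestion above, assuming the -f option is declared with argparse.FileType. The parser, the dest name input_file, and the temporary file are hypothetical stand-ins for illustration; jake's actual argument definition may differ.

```python
import argparse
import os
import tempfile


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="demo")
    # Hypothetical option definition: passing encoding="utf-8" makes the
    # file open as UTF-8 regardless of the platform's default encoding.
    parser.add_argument(
        "-f",
        dest="input_file",
        type=argparse.FileType("r", encoding="utf-8"),
    )
    return parser


def read_with_parser(path: str) -> str:
    """Parse '-f <path>' and return the file's decoded contents."""
    args = build_parser().parse_args(["-f", path])
    with args.input_file as fh:
        return fh.read()


# Write a small UTF-8 file containing non-ASCII characters, then read it
# back through the parser.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".lock", delete=False
) as tmp:
    tmp.write('description = "caf\u00e9 \u00d0"\n')
    tmp_path = tmp.name

try:
    content = read_with_parser(tmp_path)
finally:
    os.remove(tmp_path)
```

The encoding keyword of argparse.FileType has been available since Python 3.4, so this should also work on Python 3.7.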

I don't know whether this would cause any unwanted side effects; the Python 2.7 days are gone, but I could be missing something.