pyexcel / pyexcel-io

One interface to read and write the data in various excel formats, import the data into and export the data from databases
http://io.pyexcel.org

CSV inside ZIP which is not UTF-8 encoded causes UnicodeDecodeError #74

Closed by craiga 4 years ago

craiga commented 4 years ago

pyexcel-io assumes that every file inside a CSVZ archive is UTF-8 encoded, and raises a UnicodeDecodeError when one is not.

To demonstrate the issue, this zip file contains one CSV file which is UTF-32 encoded.

Passing it through pyexcel yields the following error:

  …
  File "…/views/upload_spreadsheets.py", line 67, in save_files
    yield (file.name, dict(self.save_book(file.get_book(), share_with_org)))
  File "…/site-packages/pyexcel_webio/__init__.py", line 203, in get_book
    return pe.get_book(**params)
  File "…/site-packages/pyexcel/core.py", line 47, in get_book
    book_stream = sources.get_book_stream(**keywords)
  File "…/site-packages/pyexcel/internal/core.py", line 39, in get_book_stream
    sheets = a_source.get_data()
  File "…/site-packages/pyexcel/plugins/sources/memory_input.py", line 40, in get_data
    sheets = self.__parser.parse_file_content(
  File "…/site-packages/pyexcel/plugins/parsers/excel.py", line 27, in parse_file_content
    return self._parse_any(
  File "…/site-packages/pyexcel/plugins/parsers/excel.py", line 40, in _parse_any
    sheets = get_data(anything, file_type=file_type, **keywords)
  File "…/site-packages/pyexcel_io/io.py", line 72, in get_data
    data, _ = _get_data(
  File "…/site-packages/pyexcel_io/io.py", line 91, in _get_data
    return load_data(**keywords)
  File "…/site-packages/pyexcel_io/io.py", line 216, in load_data
    result = reader.read_all()
  File "…/site-packages/pyexcel_io/book.py", line 157, in read_all
    result[sheet.name] = self.read_sheet(sheet)
  File "…/site-packages/pyexcel_io/readers/csvz.py", line 46, in read_sheet
    sheet = StringIO(content.decode("utf-8"))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
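
The failing call in the traceback is `content.decode("utf-8")`, and byte 0xff at position 0 is what a UTF-32 or UTF-16 little-endian byte order mark starts with. One possible workaround, sketched here and not taken from pyexcel-io itself, is to sniff the BOM before decoding and fall back to UTF-8 only when none is present. The names `decode_csv_bytes`, `make_csvz`, `read_csvz`, and the member name `sheet.csv` are hypothetical and exist only for this illustration; only the standard library is used.

```python
import codecs
import csv
import io
import zipfile

# Hypothetical lookup table, not part of pyexcel-io.
# UTF-32 must be checked before UTF-16: the UTF-32-LE BOM (FF FE 00 00)
# begins with the UTF-16-LE BOM (FF FE).
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def decode_csv_bytes(content, default="utf-8"):
    """Decode raw CSV bytes, using a leading BOM to pick the codec."""
    for bom, encoding in _BOMS:
        if content.startswith(bom):
            # These codecs consume the BOM while decoding.
            return content.decode(encoding)
    # No BOM found: assume the default encoding (an assumption, as before).
    return content.decode(default)


def make_csvz(text, encoding):
    """Build an in-memory zip holding one CSV member in the given encoding."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("sheet.csv", text.encode(encoding))
    return buffer.getvalue()


def read_csvz(data):
    """Read the single CSV member back as a list of rows."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        content = archive.read("sheet.csv")
    return list(csv.reader(io.StringIO(decode_csv_bytes(content))))


if __name__ == "__main__":
    archive_bytes = make_csvz("a,b\n1,2\n", "utf-32")
    print(read_csvz(archive_bytes))
```

This only helps for files that carry a BOM. Content in a BOM-less non-UTF-8 encoding would still need the caller to say which encoding to use.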