pyexcel-webwares / django-excel

A Django middleware to read, manipulate and write data in different excel formats: csv, ods, xls, xlsx and xlsm.
http://django.pyexcel.org

Unable to support XLS and XLSX uploads in Python 3.9 #66

Open craiga opened 3 years ago

craiga commented 3 years ago

On Python 3.9 with pyexcel-xls and pyexcel-xlsx installed, I'm not able to upload .xlsx files.

AttributeError: 'ElementTree' object has no attribute 'getiterator'

```
Saving workbook from spreadsheet.xlsx
Internal Server Error: /portfolios/my-portfolio/upload
Traceback (most recent call last):
  File "/app/.heroku/python/lib/python3.9/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/app/.heroku/python/lib/python3.9/site-packages/django/core/handlers/base.py", line 179, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/app/.heroku/python/lib/python3.9/site-packages/sentry_sdk/integrations/django/views.py", line 67, in sentry_wrapped_callback
    return callback(request, *args, **kwargs)
  File "/app/.heroku/python/lib/python3.9/site-packages/django/views/generic/base.py", line 70, in view
    return self.dispatch(request, *args, **kwargs)
  File "/app/.heroku/python/lib/python3.9/site-packages/django/contrib/auth/mixins.py", line 85, in dispatch
    return super().dispatch(request, *args, **kwargs)
  File "/app/.heroku/python/lib/python3.9/site-packages/django/contrib/auth/mixins.py", line 52, in dispatch
    return super().dispatch(request, *args, **kwargs)
  File "/app/.heroku/python/lib/python3.9/site-packages/django/views/generic/base.py", line 98, in dispatch
    return handler(request, *args, **kwargs)
  File "/app/.heroku/python/lib/python3.9/site-packages/django/views/generic/edit.py", line 142, in post
    return self.form_valid(form)
  File "/app/portfolios/views/upload_spreadsheets.py", line 40, in form_valid
    result = dict(self.save_files(self.request.FILES.getlist("file")))
  File "/app/portfolios/views/upload_spreadsheets.py", line 55, in save_files
    yield (file.name, dict(self.save_book(file.get_book())))
  File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel_webio/__init__.py", line 203, in get_book
    return pe.get_book(**params)
  File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel/core.py", line 47, in get_book
    book_stream = sources.get_book_stream(**keywords)
  File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel/internal/core.py", line 39, in get_book_stream
    sheets = a_source.get_data()
  File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel/plugins/sources/memory_input.py", line 40, in get_data
    sheets = self.__parser.parse_file_content(
  File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel/plugins/parsers/excel.py", line 27, in parse_file_content
    return self._parse_any(
  File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel/plugins/parsers/excel.py", line 40, in _parse_any
    sheets = get_data(anything, file_type=file_type, **keywords)
  File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel_io/io.py", line 86, in get_data
    data, _ = _get_data(
  File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel_io/io.py", line 105, in _get_data
    return load_data(**keywords)
  File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel_io/io.py", line 193, in load_data
    reader.open_content(file_content, **keywords)
  File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel_io/reader.py", line 58, in open_content
    self.reader = self.reader_class(
  File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel_xls/xlsr.py", line 186, in __init__
    super().__init__(file_type, file_contents=file_content, **keywords)
  File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel_xls/xlsr.py", line 146, in __init__
    self.xls_book = self.get_xls_book(**xlrd_params)
  File "/app/.heroku/python/lib/python3.9/site-packages/pyexcel_xls/xlsr.py", line 167, in get_xls_book
    xls_book = xlrd.open_workbook(**xlrd_params)
  File "/app/.heroku/python/lib/python3.9/site-packages/xlrd/__init__.py", line 130, in open_workbook
    bk = xlsx.open_workbook_2007_xml(
  File "/app/.heroku/python/lib/python3.9/site-packages/xlrd/xlsx.py", line 812, in open_workbook_2007_xml
    x12book.process_stream(zflo, 'Workbook')
  File "/app/.heroku/python/lib/python3.9/site-packages/xlrd/xlsx.py", line 266, in process_stream
    for elem in self.tree.iter() if Element_has_iter else self.tree.getiterator():
AttributeError: 'ElementTree' object has no attribute 'getiterator'
```

pyexcel appears to be preferring pyexcel-xls over pyexcel-xlsx for parsing xlsx files.

pyexcel-xls works fine for reading xls files, but the XML parsing in the underlying (and unmaintained) xlrd library seems to rely on a method which has been removed from ElementTree. I haven't looked too far into this, but I did see this in what's new in Python 3.9:

Methods getchildren() and getiterator() of classes ElementTree and Element in the ElementTree module have been removed. They were deprecated in Python 3.2. Use iter(x) or list(x) instead of x.getchildren() and x.iter() or list(x.iter()) instead of x.getiterator(). (Contributed by Serhiy Storchaka in bpo-36543.)
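
To illustrate the replacement described in that changelog entry, here is a minimal sketch using only the standard library. The helper name `iter_elements` and the sample XML are made up for illustration; this is not xlrd's code, just the general pattern of preferring `iter()` and only falling back to `getiterator()` where `iter()` does not exist.

```python
import xml.etree.ElementTree as ET


def iter_elements(tree):
    """Return an iterator over every element of the tree, in document order.

    Hypothetical helper: prefers iter(), which is the supported API, and only
    falls back to getiterator() on interpreters old enough to lack iter().
    """
    if hasattr(tree, "iter"):
        return tree.iter()
    return tree.getiterator()  # not reachable on Python 3.9+, where it was removed


# Made-up sample document, loosely shaped like a workbook part.
root = ET.fromstring(
    "<workbook><sheets><sheet name='a'/><sheet name='b'/></sheets></workbook>"
)
tree = ET.ElementTree(root)

tags = [elem.tag for elem in iter_elements(tree)]
print(tags)
```

On Python 3.9 and later only the `iter()` branch can run, so a check of this kind has to test for `iter` itself rather than rely on a flag computed elsewhere.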

I tried solving this issue at https://github.com/pyexcel/pyexcel-io/pull/99 with no luck.

craiga commented 3 years ago

It looks like XLRD has dropped support for XLSX files completely.
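
Since xlsx content can no longer go through xlrd, one way to keep the two formats apart is to look at the uploaded bytes rather than at which plugin happens to be picked. The sketch below is an assumption about how that could be done, not something pyexcel or django-excel provides: `sniff_container` is a hypothetical helper that relies on xlsx files being zip archives and legacy xls files being OLE2 compound documents.

```python
import io
import zipfile

# Signature at the start of OLE2 compound files, the container used by legacy .xls.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def sniff_container(data: bytes) -> str:
    """Classify raw upload bytes as 'ole2', 'zip' or 'unknown'.

    Hypothetical helper. 'zip' covers xlsx, but also xlsm and ods, which are
    zip containers too, so this only narrows the choice of reader.
    """
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    if zipfile.is_zipfile(io.BytesIO(data)):
        return "zip"
    return "unknown"


# Build a tiny in-memory zip to stand in for an xlsx upload (contents are made up).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("xl/workbook.xml", "<workbook/>")

print(sniff_container(buffer.getvalue()))
print(sniff_container(OLE2_MAGIC + b"\x00" * 16))
print(sniff_container(b"a,b,c\n1,2,3\n"))
```

A view could use the result to decide which reader to request for the upload; how that choice is passed to pyexcel is left out here because it depends on the installed plugin versions.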

chfw commented 3 years ago

Yep, please update to latest pyexcel-xls

craiga commented 3 years ago

Apologies for taking so long to get back to this.

Updating to the latest pyexcel-xls doesn't solve this problem. In fact, it's only on the latest version of pyexcel-xls that we see the above error message. If I roll back to the previous version, I instead get xlrd.biffh.XLRDError: Excel xlsx file; not supported, because xlrd is no longer pinned.

As far as I can tell, there are two possible solutions to this issue: