pyexcel / pyexcel-io

One interface to read and write the data in various excel formats, import the data into and export the data from databases
http://io.pyexcel.org

Prefer XLSX plugin when reading XLSX files. #99

Closed: craiga closed this 3 years ago

craiga commented 3 years ago

Reading xlsx files with xlrd appears to be broken in Python 3.9.

    def process_stream(self, stream, heading=None):
        if self.verbosity >= 2 and heading is not None:
            fprintf(self.logfile, "\n=== %s ===\n", heading)
        self.tree = ET.parse(stream)
        getmethod = self.tag2meth.get
>       for elem in self.tree.iter() if Element_has_iter else self.tree.getiterator():
E       AttributeError: 'ElementTree' object has no attribute 'getiterator'

Reading xls files still works fine, but xlrd's XML parsing seems to rely on a method that has been removed from ElementTree. I haven't looked too far into this, but I did see this in What's New in Python 3.9:

Methods getchildren() and getiterator() of classes ElementTree and Element in the ElementTree module have been removed. They were deprecated in Python 3.2. Use iter(x) or list(x) instead of x.getchildren() and x.iter() or list(x.iter()) instead of x.getiterator(). (Contributed by Serhiy Storchaka in bpo-36543.)
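A minimal sketch of the migration that note describes, using only the standard library. The XML below is a made-up stand-in loosely shaped like an XLSX workbook part; the snippet does not patch xlrd, it only shows the replacement calls for the removed methods.

```python
import io
import xml.etree.ElementTree as ET

# Illustrative XML only, loosely shaped like an XLSX workbook part.
XML = b"<workbook><sheets><sheet name='Sheet1'/><sheet name='Sheet2'/></sheets></workbook>"

tree = ET.parse(io.BytesIO(XML))
root = tree.getroot()

# Removed in 3.9: tree.getiterator()  ->  use tree.iter()
all_tags = [elem.tag for elem in tree.iter()]

# Removed in 3.9: elem.getchildren()  ->  use list(elem) (or iterate elem directly)
children = [child.tag for child in list(root)]

# iter() also accepts a tag filter, as getiterator() did.
sheet_names = [s.get("name") for s in tree.iter("sheet")]

print(all_tags)     # ['workbook', 'sheets', 'sheet', 'sheet']
print(children)     # ['sheets']
print(sheet_names)  # ['Sheet1', 'Sheet2']
```

`iter()` has been available since Python 3.2, so code written this way runs on both older and newer interpreters without a version check.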

This means that I can't parse both xlsx and xls files under Python 3.9, because pyexcel-xls is preferred over pyexcel-xlsx, so xlsx files still go through xlrd.

This PR should resolve that, but I don't know what its knock-on effects might be.

What do you think?

codecov-io commented 3 years ago

Codecov Report

Merging #99 (099c977) into dev (7adcec9) will increase coverage by 0.07%. The diff coverage is n/a.


    @@            Coverage Diff             @@
    ##              dev      #99      +/-   ##
    ==========================================
    + Coverage   97.83%   97.91%   +0.07%
    ==========================================
      Files          52       52
      Lines        3332     3360      +28
    ==========================================
    + Hits         3260     3290      +30
    + Misses         72       70       -2
    Impacted Files                                 Coverage Δ
    pyexcel_io/utils.py                            100.00% <ø> (ø)
    pyexcel_io/database/common.py                  100.00% <0.00%> (ø)
    pyexcel_io/database/importers/django.py        100.00% <0.00%> (ø)
    tests/test_django_book.py                      99.67% <0.00%> (+0.01%) ↑
    pyexcel_io/plugins.py                          96.39% <0.00%> (+0.90%) ↑
    pyexcel_io/database/importers/sqlalchemy.py    98.27% <0.00%> (+1.97%) ↑


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 7adcec9...099c977.

chfw commented 3 years ago

Makes sense.

Please update changelog.yml and tick the new BSD license box.

Thanks for adding 3.9-dev. If you want the change to stay, the right (if strange) place for it is .moban.d/custom_travis.yml.jj2; our tool then renders it as .travis.yml.

chfw commented 3 years ago

Just a note: this change has no material effect; it only puts xlsx before xls in the helper message. Inside pyexcel, there is no preference mechanism between plugins.

chfw commented 3 years ago

@craiga, what's your plan for this PR?

If you want to force pyexcel-io to use pyexcel-xlsx, you can pass a parameter: "library=..".
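A toy model of the situation described here, assuming (for illustration only) that when several plugins handle the same file type, the first one registered is used unless `library` names another. `register_reader`, `get_reader` and the registry are hypothetical and are not pyexcel-io's API or its actual selection logic; in real code you would pass `library=..` to pyexcel-io as suggested above.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical registry: file type -> (library name, reader) pairs in registration order.
Reader = Callable[[str], str]
_REGISTRY: Dict[str, List[Tuple[str, Reader]]] = {}


def register_reader(file_type: str, library: str, reader: Reader) -> None:
    """Record a reader for a file type, preserving registration order."""
    _REGISTRY.setdefault(file_type, []).append((library, reader))


def get_reader(file_type: str, library: Optional[str] = None) -> Reader:
    """Return the named library's reader, or the first registered one if none is named."""
    candidates = _REGISTRY.get(file_type, [])
    if not candidates:
        raise LookupError(f"no reader registered for {file_type!r}")
    if library is None:
        # No preference mechanism: whichever plugin registered first wins.
        return candidates[0][1]
    for name, reader in candidates:
        if name == library:
            return reader
    raise LookupError(f"{library!r} cannot read {file_type!r}")


# Both plugins claim xlsx; in this toy model the xlrd-based one happens to load first.
register_reader("xlsx", "pyexcel-xls", lambda path: f"xlrd reads {path}")
register_reader("xlsx", "pyexcel-xlsx", lambda path: f"openpyxl reads {path}")

print(get_reader("xlsx")("book.xlsx"))                          # xlrd reads book.xlsx
print(get_reader("xlsx", library="pyexcel-xlsx")("book.xlsx"))  # openpyxl reads book.xlsx
```

Reordering the helper message changes nothing in such a model; only naming the library explicitly changes which reader is picked.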

craiga commented 3 years ago

@chfw As you mentioned, I've just verified that this change doesn't fix the issue I'm seeing in my app. I've been trying to replicate the problem in a test inside pyexcel-io without luck. I'm going to close this PR and do some more investigation.

chfw commented 3 years ago

Please use the 'library' option to force it to use pyexcel-xlsx.

craiga commented 3 years ago

@chfw That approach works, but unfortunately I can't see a good opportunity to specify a library when the file is uploaded using django_excel. I've logged this as an issue on that project https://github.com/pyexcel-webwares/django-excel/issues/66, and would love to hear your thoughts.