Closed ashkulz closed 7 years ago
@@ master #25 diff @@
==========================================
Files 31 31
Lines 2517 2518 +1
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 2492 2493 +1
Misses 25 25
Partials 0 0
Powered by Codecov. Last update 7bcfb4e...f95361b
Your pull request is appreciated. However, I failed to see the logical difference although the file were changed.
At the moment, pyexcel-xlsx use max_column to get the number of columns. Are you suggesting a different function to be used?
In order to support columns computed on the fly, here is an example in ods where _iterate_columns was overriden.
I think the interfaces in Sheet.py could be revised further to make it clearer. Do you have any suggestions?
When pyexcel-xlsx
calls max_column
, it triggers a dynamic computation which can take a lot of time ... in case you have 10000 rows, it gets called 10000 times instead of 1, which is quite wasteful. All the change does is to pre-compute it so that it doesn't have to be done every time.
In case _iterate_columns
is re-implemented in a derived class, then the ncols
parameter can be ignored.
As a comparison, in 0.2.x the commit advarisk/pyexcel-io@eb3bfbc87aee20ac72b39f3971650c4d34608914 feels very clear and logical (which is what I'm actually using in production). I don't think that the API with _iterate_columns
is perfect; but don't really have an alternate suggestion :frowning_face:
I see what you meant now. max_column is computed always hence your change will cache the value which could improve performance.
Let me do a bit research around it.
In case you're open to releasing 0.2.4.1
with the above change, I'll submit a PR for that too ... but there is no 0.2.x
branch, which is what I've done in the fork.
have create branch 0.2.x and will release it while I review all plugins and make necessary changes in 0.3.0 and its plugins.
please evaluate latest pyexcel-xlsx for performance improvement.
This can be costly for certain implementations e.g. pyexcel-xlsx (which internally uses openpyxl) where it is computed on-the-fly.
It isn't possible to make the fix in pyexcel-xlsx as there is the possibility of it changing due to modifications via the API. One does have to make the (rather reasonable) assumption that the number of columns doesn't change during execution of
to_array
.