vinci1it2000 / formulas

Excel formulas interpreter in Python.
https://formulas.readthedocs.io/
European Union Public License 1.1
342 stars 75 forks source link

Range Assembly inefficient #98

Closed owinebar closed 8 months ago

owinebar commented 2 years ago

Is your feature request related to a problem? Please describe. Current approach to range assembly is extremely inefficient on sheets with many lookups or array formulas. The current construction also has to be performed after loading saved formulas from the json format.

Describe the solution you'd like The one that comes to mind is constructing cells in a database, then expressing range assemblies and array formulas as queries. Implementing in SQLite would have the added benefit of providing a "compiled" sheet for later evaluation that wouldn't require repeating the formula compiler and range assembly calculations as overhead every time the model is loaded. The potential downside 8s in the question of how this would interact with or replicate the parallelism of the schedula implementation.

Describe alternatives you've considered None

Additional context

vinci1it2000 commented 2 years ago

Can you specify the version of the formula that you are using?

owinebar commented 2 years ago

1.2.2, the latest at the time of raising the issue

vinci1it2000 commented 2 years ago

Can you share/send the file? Just to understand where is the problem

cetedus commented 1 year ago

Hi,

I have encountered the same issue .

All it takes is for an excel file to have formula with range encompassing entire column e.g. =SUM(B:B)

this will cause, as far as I can tell in formulas/tokens/operand.py these values to be used:

maxcol = 16384
maxrow = 1048576

(in this case maxrow) which will produce an unnecessarily large object in memory (for one column around 500MB , for 2 columns around 1GB etc. ..)

For now I am using a workaround by checking max cols/rows in all sheets of the file and overwriting maxcol/maxrows with those values, but maybe you have a better idea how to properly fix it in your library ?

vinci1it2000 commented 8 months ago

Now the RangeAssembler skip all empty values so its compilation is faster.

Moreover I check the memory consumption when one column is full of empty cells and its memory consumption is only 8MB.

I also updated the handling of EMPTY cells so now the code is faster in the evaluation.