vinci1it2000 / formulas

Excel formulas interpreter in Python.
https://formulas.readthedocs.io/
European Union Public License 1.1
358 stars 75 forks source link

Slow Response in Excel Calculate Function with Entire Column Formulas #121

Closed salitaba closed 10 months ago

salitaba commented 1 year ago

When utilizing entire columns like A:A and B:B in a formula, such as in the example you provided, the calculation function experiences slow response times.

Example:

excel_file_name = "formula.xlsx"
xl_model = formulas.ExcelModel().loads(excel_file_name).finish()
result = xl_model.calculate(outputs=["'[formula.xlsx]SHEET'!ZZ1"])

In this case, the ZZ1 formula is =sumproduct(A:A,B:B).

Expected behavior: To enhance the calculation speed, you can modify the formula to operate within a specific range. For instance, changing the formula to =sumproduct(A1:A2,B1:B2) will significantly improve the calculation speed:

Additional context: Upon investigation, it was found that when A:A is used in formulas, the function's arguments contain a list of one million data points, despite having only one row of data. A potential solution involves loading only the necessary amount of data, rather than the entire one million data points.

This optimization can help in situations where using entire column references leads to unnecessary data loading and subsequently slower calculations. By specifying a smaller range that encompasses only the required data, you can significantly enhance the performance of your calculations.

vinci1it2000 commented 10 months ago

I changed the handling of empty values and the sumproduct function. This has improved the overall performance of about 659%.