Open nopria opened 2 months ago
i dont know if this could be related to any of those: https://github.com/vinci1it2000/formulas/issues/119 https://github.com/vinci1it2000/formulas/issues/131
The script in #119 seems to go in the right direction, although, as already noted, does not work with ranges. In my case, it correctly returns
>>> get_one_line_formula(xl_model, "'[test.xlsx]WS'!M15")
'SUM((0.22 / 0.6):N14)'
With few changes and an helper function
import re
def expand_range(range_str,xl_model):
'''
Convert a string representing an Excel range like "'[test.xlsx]WS'!B3:D8" into a list of strings representing all single Excel cells of the range, filtering out cells unused in Excel document model `xl_model`.
'''
match = re.match(r"'([^']+)'\!([A-Z]+)(\d+):([A-Z]+)(\d+)", range_str)
if match:
sheet_name, col_start, row_start, col_end, row_end = match.groups()
col_start_num = ord(col_start) - 65
col_end_num = ord(col_end) - 65
num_rows = int(row_end) - int(row_start) + 1
cell_list = ["'{}'!{}{}".format(sheet_name, chr(65 + col), row)
for row in range(int(row_start), int(row_start) + num_rows)
for col in range(col_start_num, col_end_num + 1)]
# cells really used in Excel document are only those in xl_model.dsp.dmap.pred dictionary keys
return [c for c in cell_list if c in xl_model.dsp.dmap.pred]
else:
return None
import functools
import schedula as sh
@functools.lru_cache()
def pure_input_cells(xl_model, node):
'''
Knowing the model of an Excel document, find recursively the formula expressed in terms of "pure input" cells of a given formula.
'''
rl = expand_range(node,xl_model) # list of cells of a range, or None if argument is not a range
if rl is not None:
# even if in Excel parameters are separated by ';', in `formulas` model the separation character is ','
return ','.join(pure_input_cells(xl_model, rc) for rc in rl if rc in xl_model.dsp.dmap.pred)
try:
formula = next(iter(xl_model.dsp.dmap.pred[node]))
except StopIteration:
return node
for k in sorted(xl_model.dsp.dmap.pred[formula], key=lambda x: (len(x), x), reverse=True):
if not isinstance(k, sh.Token):
formula = formula.replace(k, pure_input_cells(xl_model, k))
return formula[1:]
I was able to make it returns
>>> pure_input_cells(xl_model, "'[test.xlsx]WS'!M15")
"SUM(('[test.xlsx]WS'!K10 / '[test.xlsx]WS'!H10),'[test.xlsx]WS'!M11,('[test.xlsx]WS'!H12 * '[test.xlsx]WS'!K12))"
which express the formula of the input cell in terms of "pure input" cells expanding the range.
I hope a similar functionality will be added to the library core.
I'm trying to disclose the computing structure of an excel file in order to reproduce and perform those computation from code. Basically I'd like to have a function able to find all the dependencies of a cell:
Once you know all the "pure input" cells, computing the value of a formula cell is easy with
calculate
method:The initilization is easy,
formulas
does a great job at creating a model and all information seem to be there:When the
inputs
method of a cell returnsNone
(as far as I understand) the cell does NOT contain a formula (like "pure input" cells), but as soon as I look at a cell containing a formulaor
I could not understand if and how I can use
formulas
methods to:Thanks in advance for your attention.