microsoft / python-in-excel

Python in Microsoft Excel
MIT License
490 stars 31 forks source link

Choosing object or values shouldn't make a difference for calculations #22

Open fzumstein opened 10 months ago

fzumstein commented 10 months ago

Switching between the object and value representation should be purely a different view without having an impact on the results of further calculation steps. Right now, switching from object to values breaks this sheet:

Formulas:

Screenshot 2023-09-08 at 8 30 02 AM

Object:

Screenshot 2023-09-08 at 8 30 24 AM

Values:

Screenshot 2023-09-08 at 8 30 52 AM
jflam commented 10 months ago

I don't think I understand why you expect that changing from by val to by ref would not break the sheet? In one case we are storing an object reference in a single cell; in the other case we are spilling the contents of a DataFrame across many cells.

What would you have it do instead?

ncalm commented 10 months ago

@fzumstein At the point the data are displayed, the content of D1 is no longer a data frame, it's a value: "a".

If you put this in D1 instead, you can have both:

df = xl("A1:B3", headers=True)

This can display as values (spill to the sheet) and retain the ability for other cells to refer to df.

fzumstein commented 10 months ago

@jflam It's still a spreadsheet, isn't it? So if the formula doesn't change (and if it's a deterministic function), I don't expect it to have an influence on the calculations.

The (outdated?) docs https://support.microsoft.com/en-us/office/py-function-31d1c523-fb13-46ab-96a4-d1a90a9e512f seem to suggest that PY requires a return_type (0 - Excel value, 1 - Python object.): =PY(python_code,return_type), which would improve the situation, but that's not how the formula works in my version of Excel (it doesn't have the return_type).

Objects in cells have been around for decades in about every single bank on this planet, and they always work the same: you can only return an object OR values to a cell. Consequently, if you want to view the values of the object, you'd do something like =A1.values, assuming that A1 holds your object reference (that's how Excel data types like stocks work, anyway).

b-gonzalez commented 10 months ago

This seems like a trivial point. You could easily write a formula similar to the one you wrote on cell G1 like xl("D1") in Python Object mode in a different cell. And from there just use the Excel Value option on that instead without breaking the references in G1 and H1.

fzumstein commented 10 months ago

@b-gonzalez That's close to my suggestion, only that is has to happen via formula (=D1.values), not via dropdown in the formula bar that can be changed by anybody without leaving a trace in the formula view.

jflam commented 10 months ago

I understand the confusion now. In reality the formula in the sheet is =PY(code, {0,1}). The UI in the formula editor toggles that last parameter, so it's also consistent with what our documentation says.

I think you perceive this as changing a formatting setting as opposed to changing the formula?

fzumstein commented 10 months ago

Yes, that's correct, that parameter is hidden in the formula, which is very confusing.

keyur32 commented 10 months ago

Agree we should improve the documentation here. Tagging as a documentation issue.

fzumstein commented 10 months ago

I've just discovered that you can do =A1.arrayPreview, assuming that A1 has a DataFrame. That's basically what I'd want, but in the generic case, i.e., covering scalars as well. This makes it so much cleaner to understand that you're outside of the PY environment again, facing normal Excel cell values.

fzumstein commented 9 months ago

So having worked with PiE for a while, my conclusion is that referencing PY cells from other PY cells -- while possible and sort of a natural excel workflow -- should best be avoided. Instead, as @ncalm mentioned, just work with variables like you'd do in real Jupyter notebooks, too.