None object becomes 'None' string when written as Excel Values, loses 'na' status in subsequent operations

ncalm commented 9 months ago

If I have a DataFrame df with np.nan in numeric column(s), when df is written as Excel values, the np.nan value is written as #NUM!

This is OK, though I wonder whether #N/A would be better.

However, when I have an na value in an object (i.e. str) column, the None object that represents the missing value in those columns is written to the spreadsheet as the string 'None'.

If the spilled output of that cell is then referenced from another Python cell, the na status of those cells is lost, and the missing value has been by default filled with 'None'.

Can I request that the 'None' object in a Pandas DataFrame or Series is written as #N/A when using 'Excel values'?

Example:

In cell B2:

data = {'Column1': [1, 2, np.nan, 4, np.nan],
        'Column2': [None, None, None, None, None],
        'Column3': [None, None, '1 to 49 acres', None, '50 to 99 acres']}

df = pd.DataFrame(data)

In cell B10, where both the np.nan and the None are recognized as na: df.isna()

However, if we try this in some other cell, e.g. H10:

df_2 = xl("B2#", True)
df_2.isna()

The output treats the None cells as strings.

keyur32 commented 9 months ago

Tagging @macky360 as we're discussing right behavior internally.

fzumstein commented 7 months ago

I've also had some thought on this and while it would definitely make sense to use #NUM! in custom functions, PiE isn't a custom function, so might need to follow different rules.

I would argue that an Excel range with empty cells that gets read into a DataFrame and directly returned back to Excel via Values mode should probably not look different than the original, so should at least have the option to turn pd.isnull() values into empty cells again.

Right now, df.where(df.notnull(), "") seems to be a good workaround for preserving the data presentation across the Excel - pandas - Excel roundtrip, see screenshot.

microsoft / python-in-excel

None object becomes 'None' string when written as Excel Values, loses 'na' status in subsequent operations #38