pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
43.73k stars 17.94k forks source link

Using to_json/read_json with orient='table' on a DataFrame with a single level MultiIndex does not work #29928

Open larrymouse opened 4 years ago

larrymouse commented 4 years ago

Code Sample, a copy-pastable example if possible

import pandas

#Scenario1
df=pandas.DataFrame(data=[[1,1],[2,2],[3,3],[4,4]],columns=['A','B'], index=pandas.MultiIndex.from_tuples([(1,),(2,),(3,),(4,)]))
js=df.to_json()
pandas.read_json(js) #ok
js=df.to_json(orient='table')
new_df=pandas.read_json(js, orient='table') # this runs but creates NaN indexes

#Scenario2
df=pandas.DataFrame(data=[[1,1],[2,2],[3,3],[4,4]],columns=['A','B'], index=pandas.MultiIndex.from_tuples([(1,),(2,),(3,),(4,)], names=['ind1']))
js=df.to_json()
pandas.read_json(js) #ok
js=df.to_json(orient='table')
try:
    new_df==pandas.read_json(js, orient='table')  # this fails with KeyError: 'None of [None] are in the columns'
except KeyError as e:
    pass

#Scenario3
df=pandas.DataFrame(data=[[1,1],[2,2],[3,3],[4,4]],columns=['A','B'], index=pandas.MultiIndex.from_tuples([(1,1),(1,2),(2,1),(2,2)], names=['ind1', 'ind2']))
js=df.to_json()
pandas.read_json(js) #ok
js=df.to_json(orient='table')
new_df=pandas.read_json(js, orient='table')  # this works as is is multi level multiindex

#Scenario4
df=pandas.DataFrame(data=[[1,1],[2,2],[3,3],[4,4]],columns=['A','B'], index=pandas.MultiIndex.from_tuples([(1,),(2,),(3,),(4,)]))
df.index=df.index.to_flat_index() # a workaround?
js=df.to_json()
pandas.read_json(js) #ok
js=df.to_json(orient='table')
new_df=pandas.read_json(js, orient='table') # This is a workaround that does produce sensible results

Problem description

I have a DataFrame that has a MultiIndex, even though that index only has one level and could easily be a normal index, I am receiving this table from ipyaggrid, so am not in control of that.

Saving this table to json with df.to_json(orient='table) and then loading from pandas.read_json(js, orient='table) leads to either an exception or the index being changed to NaN values.

Neither of these outcomes seems desirable or correct.

I have a workaround to intercept DataFrames with this feature and convert their index so a flat Index, but it would be good if the table serialisation just worked.

I was drawn to orient='Table' by the comments on this similar issue, my main requirement here is to save/load the DataFrame preserving its row and column order. You don't get this with the standard to_json/from_json modes https://github.com/pandas-dev/pandas/issues/4889

Expected Output

new_df for scenario1 and scenario2 to should have a MultiIndex, in the first scenario with a level that has no name and in the second scenario with level called 'ind1'

Output of pd.show_versions()

pandas.show_versions() INSTALLED VERSIONS ------------------ commit : None python : 3.6.7.final.0 python-bits : 64 OS : Windows OS-release : 7 machine : AMD64 processor : Intel64 Family 6 Model 62 Stepping 4, GenuineIntel byteorder : little LC_ALL : None LANG : None LOCALE : None.None pandas : 0.25.0 numpy : 1.17.0 pytz : 2019.2 dateutil : 2.8.0 pip : 10.0.1 setuptools : 39.1.0 Cython : None pytest : None hypothesis : None sphinx : None blosc : None feather : None xlsxwriter : None lxml.etree : None html5lib : None pymysql : None psycopg2 : None jinja2 : 2.10.3 IPython : 7.9.0 pandas_datareader: None bs4 : None bottleneck : None fastparquet : None gcsfs : None lxml.etree : None matplotlib : 3.1.1 numexpr : None odfpy : None openpyxl : None pandas_gbq : None pyarrow : None pytables : None s3fs : None scipy : 1.3.1 sqlalchemy : None tables : None xarray : None xlrd : 1.2.0 xlwt : None xlsxwriter : None
janheindejong commented 4 years ago

I guess I have the same problem.

Creating two DFs, one with multiindex, and one without:

import pandas as pd
import numpy as np
from datetime import datetime 

index = [datetime(2000, 1, 1, 0, 0), datetime(2000, 1, 1, 0, 1), datetime(2000, 1, 1, 0, 2)]
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]

tuples = list(zip(*arrays))
columns_mi = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
columns = ["A", "B", "C", "D", "E", "F", "G", "H"]

df = pd.DataFrame(data=np.random.randn(3, 8), index=index, columns=columns)
df_mi = pd.DataFrame(data=np.random.randn(3, 8), index=index, columns=columns_mi)

Converting the 'normal' DataFrame to json and back works fine (both in split and table orientation):

s = df.to_json(orient="split")
pd.read_json(s, orient="split")
s = df.to_json(orient="table")
pd.read_json(s, orient="table")

But doing the same with the multiindex one doesn't.

s = df_mi.to_json(orient="split")
pd.read_json(s, orient="split")

Raises: TypeError: <class 'tuple'> is not convertible to datetime

s = df_mi.to_json(orient="table")
pd.read_json(s, orient="table")

Raises: NotImplementedError: orient='table' is not supported for MultiIndex.

I'm trying to send multiindexed dataframes over HTTP; any suggestions on how I can do that?

LucasG0 commented 4 years ago

I submitted a PR for the author issue.

About janheindejong issues,

Raises: TypeError: <class 'tuple'> is not convertible to datetime

This may be a dupe of #4889, MultiIndex Dataframes and Series do not seem to be handled by read_json(orient='split').

NotImplementedError: orient='table' is not supported for MultiIndex

orient='table' does not support MultiIndex columns on Dataframes.

Interesting6 commented 11 months ago

Same question, I'm trying to send multiindexed dataframes over HTTP.

When i serialise multi-index dataframe to an json by split or table orient, tuple inside the index would be converted to list. But when i try to deserialise it, it raise TypeError: unhashable type: 'list' ;

Any suggestions on how I can do that?

developer992 commented 8 months ago

it's year 2024 and there is still no generic support to dump dataframe to json and get it back ?

NotImplementedError: orient='table' is not supported for MultiIndex columns