metinsenturk / flat_table

An extension to json_normalize() in pandas
https://pypi.org/project/flat-table/
MIT License
27 stars, 9 forks

If data contain empty list #16

Open svaduka opened 7 months ago

svaduka commented 7 months ago

Hi Metinsenturk, flat-table is wonderful, thanks for your effort.

When I try to use it on JSON data that contains an empty list, it does not work. For example:

"classes": [ { "iopvSymbol": null, "dividendFreq": null, "prices": [] } ]

svaduka commented 7 months ago

Here is the error:

Traceback (most recent call last):
  File "/Users/Sainagaraju_Vaduka/IdeaProjects/TestTeam/venv/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'index'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/Sainagaraju_Vaduka/IdeaProjects/proj/src/python/json_to_df_latest_flat_table.py", line 54, in <module>
    output = flat_table.normalize(df)
  File "/Users/Sainagaraju_Vaduka/IdeaProjects/TestTeam/venv/lib/python3.9/site-packages/flat_table/_norm.py", line 168, in normalize
    mp = mapper(df)
  File "/Users/Sainagaraju_Vaduka/IdeaProjects/TestTeam/venv/lib/python3.9/site-packages/flat_table/_norm.py", line 134, in mapper
    _child = to_rows(child)
  File "/Users/Sainagaraju_Vaduka/IdeaProjects/TestTeam/venv/lib/python3.9/site-packages/flat_table/_norm.py", line 71, in to_rows
    ds = get_obj_from_iterable(ds)
  File "/Users/Sainagaraju_Vaduka/IdeaProjects/TestTeam/venv/lib/python3.9/site-packages/flat_table/_norm.py", line 29, in get_obj_from_iterable
    return set_index(df).iloc[:, 0]
  File "/Users/Sainagaraju_Vaduka/IdeaProjects/TestTeam/venv/lib/python3.9/site-packages/flat_table/_norm.py", line 42, in set_index
    temp.index = temp['index'].values
  File "/Users/Sainagaraju_Vaduka/IdeaProjects/TestTeam/venv/lib/python3.9/site-packages/pandas/core/frame.py", line 3458, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/Users/Sainagaraju_Vaduka/IdeaProjects/TestTeam/venv/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: 'index'

Masame commented 2 months ago

As a workaround, you can use:

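import pandas as pd
import flat_table

# Columns of `records` (the DataFrame to flatten) that hold at least one list.
# Scans every row; random sampling could miss a column whose lists are rare.
list_cols = [col for col in records.columns
             if records[col].map(lambda v: isinstance(v, list)).any()]
for col in list_cols:
    # Swap empty lists for pd.NA; leave non-list values (e.g. NaN) untouched.
    records[col] = records[col].apply(
        lambda y: pd.NA if isinstance(y, list) and len(y) == 0 else y)
records = flat_table.normalize(records)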
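
The workaround above only looks at top-level column values. In the `classes` example from the report, the empty list sits inside a dict inside another list, so a possible variation is to blank out empty lists in the parsed JSON before building the DataFrame. This is only a sketch: `blank_empty_lists` is a hypothetical helper, not part of flat_table or pandas, and I have not verified how `flat_table.normalize` treats the resulting `None` values.

```python
import json


def blank_empty_lists(value, fill=None):
    """Return a copy of `value` with every empty list replaced by `fill`."""
    if isinstance(value, list):
        if not value:
            return fill
        # Recurse into non-empty lists so nested empty lists are caught too.
        return [blank_empty_lists(item, fill) for item in value]
    if isinstance(value, dict):
        return {key: blank_empty_lists(item, fill) for key, item in value.items()}
    # Scalars (str, int, float, bool, None) pass through unchanged.
    return value


# Sample payload modelled on the snippet in the report.
raw = '{"classes": [{"iopvSymbol": null, "dividendFreq": null, "prices": []}]}'
cleaned = blank_empty_lists(json.loads(raw))
print(cleaned)
# {'classes': [{'iopvSymbol': None, 'dividendFreq': None, 'prices': None}]}
```

The cleaned object can then be loaded into a DataFrame as before and passed to `flat_table.normalize`. The helper returns a new structure instead of mutating the input, so the original parsed JSON stays available for comparison.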