pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
43.9k stars 18.03k forks

ENH: Add support to read and write Amazon ION files #55725

Open anna-geller opened 1 year ago

anna-geller commented 1 year ago

Feature Type

Problem Description

We rely heavily on the Amazon ION file format. Currently, reading ION files into pandas DataFrames requires workarounds.

Feature Description

It would be great to add support for ION in pandas using read_ion and write_ion methods.

Alternative Solutions

Here is a reproducer of a workaround we use for now:

import amazon.ion.simpleion as ion
from amazon.ion.simple_types import IonPyNull
import pandas as pd
import requests

def convert_ion_nulls(value):
    # Map Ion nulls to None so pandas treats them as missing values.
    return None if isinstance(value, IonPyNull) else value

url = "https://huggingface.co/datasets/kestra/datasets/resolve/main/ion/employees.ion"
response = requests.get(url)
response.raise_for_status()

# single_value=False returns all top-level Ion values instead of expecting exactly one.
ion_data = ion.loads(response.content, single_value=False)

# Convert each record to a plain dict, replacing Ion nulls along the way.
list_of_dicts = [
    {k: convert_ion_nulls(v) for k, v in dict(record).items()} for record in ion_data
]
df = pd.DataFrame(list_of_dicts)

For writing files:

import amazon.ion.simpleion as ion

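# df is the DataFrame built in the reading example above.
list_of_values = df.to_dict("records")

def save_as_ion(dict_or_list, file_name):
    # The file is opened in binary mode because ion.dump writes bytes.
    with open(file_name, "wb") as f:
        ion.dump(dict_or_list, f)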

save_as_ion(list_of_values, "mydata.ion")

mroeschke commented 1 year ago

The code snippet seems small enough that it doesn't need to be maintained directly in pandas, so I would be -1 on this proposal. If you or someone else developed a third-party library to wrap that code snippet, we'd happily include it in our ecosystem docs.
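For illustration, a third-party wrapper along those lines might look roughly like the sketch below. It is a minimal sketch and not an existing package: the function names `read_ion`, `write_ion`, and `records_to_frame` and the `is_null` parameter are hypothetical. It assumes the `amazon.ion` package from the workaround above is installed, and it assumes that `ion.dump` accepts `sequence_as_stream=True` to write each record as its own top-level value (so that `single_value=False` can read the records back); that flag should be checked against the installed `amazon.ion` version.

```python
from typing import Any, Callable, Iterable, Mapping

import pandas as pd


def _is_ion_null(value: Any) -> bool:
    """Return True for Ion nulls. amazon.ion is imported lazily on first use."""
    from amazon.ion.simple_types import IonPyNull

    return isinstance(value, IonPyNull)


def records_to_frame(
    records: Iterable[Mapping[str, Any]],
    is_null: Callable[[Any], bool] = _is_ion_null,
) -> pd.DataFrame:
    """Build a DataFrame from mapping-like records, replacing nulls with None."""
    cleaned = [
        {key: (None if is_null(value) else value) for key, value in dict(record).items()}
        for record in records
    ]
    return pd.DataFrame(cleaned)


def read_ion(path: str) -> pd.DataFrame:
    """Hypothetical reader: load every top-level Ion value from a file."""
    import amazon.ion.simpleion as ion

    with open(path, "rb") as f:
        values = ion.loads(f.read(), single_value=False)
    return records_to_frame(values)


def write_ion(df: pd.DataFrame, path: str) -> None:
    """Hypothetical writer: dump one Ion value per DataFrame row."""
    import amazon.ion.simpleion as ion

    with open(path, "wb") as f:
        # Assumption: sequence_as_stream=True emits each list element as a
        # separate top-level value rather than a single Ion list.
        ion.dump(df.to_dict("records"), f, sequence_as_stream=True)
```

With such a module, usage would reduce to `df = read_ion("employees.ion")` and `write_ion(df, "mydata.ion")`. Keeping the null check injectable through `is_null` lets the record conversion be exercised without Ion data at hand.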