noirello / pyorc

Python module for Apache ORC file format
Apache License 2.0

Saving a Pandas Dataframe to orc #17

Closed: EliYk closed this issue 3 years ago

EliYk commented 4 years ago

Hi,

First of all, cheers, a much needed library!

Could you perhaps add an example on how to save a Pandas dataframe to an ORC file using pyorc?

I'm not sure how to go about it.

Thanks, Eli

noirello commented 4 years ago

Hi, I think the simplest (or most naive, as I'm not that familiar with pandas) solution is to convert the DataFrame to a list of dictionaries with the records orientation, and to use a Writer with the dict struct representation.


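import pandas as pd
import pyorc

df = pd.DataFrame({
    'num': [1, 2, 3, 4],
    'bool': [True, False, False, True],
    'text': ['apple', 'pear', 'orange', 'grape'],
})

# The schema must name every DataFrame column with a matching ORC type.
schema = "struct<num:int,bool:boolean,text:string>"

# The context managers close the writer and the file, even if writing fails.
with open('test.orc', 'wb') as output:
    with pyorc.Writer(output, schema, struct_repr=pyorc.StructRepr.DICT) as writer:
        # orient="records" yields one dict per row, keyed by column name.
        writer.writerows(df.to_dict(orient="records"))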
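
The schema string in the example is written by hand, so it has to be kept in sync with the DataFrame columns. As a rough sketch only, it could be generated from the column dtypes instead. The helper name `orc_schema_from_dtypes` and the `PANDAS_TO_ORC` table are made up for illustration and are not part of pyorc or pandas; the mapping covers just a few common dtypes and assumes `object` columns contain only strings.

```python
# Hypothetical helper, not part of pyorc or pandas.
# Maps pandas dtype names to ORC type names; extend as needed.
PANDAS_TO_ORC = {
    "bool": "boolean",
    "int8": "tinyint",
    "int16": "smallint",
    "int32": "int",
    "int64": "bigint",
    "float32": "float",
    "float64": "double",
    "object": "string",  # assumption: object columns hold only str values
}


def orc_schema_from_dtypes(dtypes):
    """Build an ORC struct schema string from {column name: dtype name}."""
    fields = []
    for name, dtype in dtypes.items():
        if dtype not in PANDAS_TO_ORC:
            raise ValueError(f"no ORC type known for column {name!r} ({dtype})")
        fields.append(f"{name}:{PANDAS_TO_ORC[dtype]}")
    return "struct<" + ",".join(fields) + ">"


schema = orc_schema_from_dtypes({"num": "int64", "bool": "bool", "text": "object"})
print(schema)  # struct<num:bigint,bool:boolean,text:string>
```

With a real DataFrame, the input mapping could be built as `{col: str(dtype) for col, dtype in df.dtypes.items()}`. Note that pandas integers default to 64 bits, so this sketch produces `bigint` where the handwritten schema uses `int`.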