noirello / pyorc

Python module for Apache ORC file format
Apache License 2.0

save orc contents to csv #30

Closed: blueRegen closed this issue 3 years ago

blueRegen commented 3 years ago

Hi

Is there a way to save ORC file contents to CSV? I want to write the rows from a_reader to a CSV file.

import pyorc

with open("./2.orc", "rb") as data:
    a_reader = pyorc.Reader(data)
    print(type(a_reader))
    # Print the type of the second column for the first 9 rows only
    for i, row in enumerate(a_reader, start=1):
        if i >= 10:
            break
        payload = row[1]
        print(type(payload))
        # print(row[1])
        print("")

thanks.

fehtemam commented 3 years ago

I would do something like this:

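import pandas as pd
import pyorc

with open("./2.orc", "rb") as data:
    reader = pyorc.Reader(data)
    # the reader yields one tuple per row; columns are numbered 0, 1, ...
    df = pd.DataFrame(reader)

# picking the first i rows (i = 9 matches the loop in the question)
i = 9
df = df.iloc[:i, :]
df.to_csv('path_to_file.csv', index=False)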

pandas makes it easy to write a DataFrame to many formats, including CSV via to_csv.
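If you would rather not depend on pandas, or the ORC file is too large to hold in memory as a DataFrame, the standard-library csv module can stream rows straight from the reader instead. This is a minimal sketch, assuming pyorc's default behaviour of yielding each row as a tuple. rows_to_csv is a hypothetical helper, not part of pyorc or pandas, and the header shown in the comment (reader.schema.fields) assumes the file's top-level type is a struct.

```python
import csv
from itertools import islice
from typing import Iterable, Optional, Sequence


def rows_to_csv(
    rows: Iterable[Sequence],
    path: str,
    header: Optional[Sequence[str]] = None,
    limit: Optional[int] = None,
) -> int:
    """Write rows to a CSV file one at a time and return how many were written.

    `rows` can be any iterable of tuples, e.g. a pyorc.Reader (hypothetical
    usage, see below). If `limit` is given, only the first `limit` rows are
    written, so the rest of the input is never read.
    """
    if limit is not None:
        rows = islice(rows, limit)
    count = 0
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        if header is not None:
            writer.writerow(header)
        for row in rows:
            writer.writerow(row)  # None values are written as empty fields
            count += 1
    return count


if __name__ == "__main__":
    # Stand-in rows; with a real ORC file you would pass the reader instead
    # (assumption: the top-level schema is a struct):
    #   with open("./2.orc", "rb") as data:
    #       reader = pyorc.Reader(data)
    #       rows_to_csv(reader, "path_to_file.csv",
    #                   header=list(reader.schema.fields), limit=9)
    sample = [(1, "a"), (2, "b"), (3, "c")]
    print(rows_to_csv(sample, "sample.csv", header=["id", "payload"], limit=2))  # prints 2
```

Because rows are written as they are read, memory use stays flat regardless of file size. One caveat: columns holding nested ORC types (struct, list, map) arrive as nested Python objects, and csv.writer writes them using str(), so you may want to flatten or format them before writing.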

blueRegen commented 3 years ago

@fehtemam works like a charm.

thank you so much for the help.