rmetzger / stratosphere-sql

My private playground to develop SQL support on Stratosphere
Apache License 2.0

Add support for reading from Parquet (or ORC) #4

Open rmetzger opened 10 years ago

rmetzger commented 10 years ago

Make sure that only the required columns are fetched.

http://parquet.io/

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.0.2/ds_Hive/orcfile.html
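The requirement above is usually called column projection (or projection pushdown). Parquet and ORC store each column separately, so a reader can skip the columns a query does not need. Below is a minimal sketch of the idea using a toy in-memory table. `ColumnarTable`, `read_column` and `scan` are hypothetical names and are not part of Parquet, ORC, or Stratosphere. A real reader would skip the unneeded column chunks on disk instead of in memory.

```python
from typing import Dict, List, Sequence, Tuple


class ColumnarTable:
    """Toy in-memory stand-in for a columnar file (hypothetical, not a Parquet/ORC API).

    Each column is stored as its own list, loosely mirroring how Parquet and
    ORC lay out column data on disk.
    """

    def __init__(self, rows: Sequence[Dict[str, object]]) -> None:
        names = list(rows[0].keys()) if rows else []
        # Transpose row records into one list per column.
        self._columns: Dict[str, List[object]] = {
            name: [row[name] for row in rows] for name in names
        }
        # Records which columns were actually read, to make the pruning visible.
        self.fetched: List[str] = []

    def read_column(self, name: str) -> List[object]:
        column = self._columns[name]  # raises KeyError for unknown columns
        self.fetched.append(name)
        return column

    def scan(self, required: Sequence[str]) -> List[Tuple[object, ...]]:
        """Build tuples only from the required columns; other columns are never read."""
        return list(zip(*(self.read_column(name) for name in required)))
```

For example, scanning `["id", "price"]` over a table with the columns `id`, `name` and `price` never touches `name`, and `fetched` lists only `id` and `price`.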

zerolevel commented 10 years ago (11.03.2014, 19:30)

Hi Robert. I have studied how Parquet stores data in a columnar layout. Do we only want to read from Parquet, or should it become the standard format for storing all data (provided it compresses nested data well)?

rmetzger commented 10 years ago

Hi,

Thanks for looking into it. We only want to use it for reading.


rmetzger commented 10 years ago

Have a look at our mailing list. Artem has started implementing (or may already have implemented) a test case with Parquet for Stratosphere. He promised to open a pull request soon.