telefonicaid / fiware-cygnus

A connector in charge of persisting context data sources into other third-party databases and storage systems, creating a historical view of the context
https://fiware-cygnus.rtfd.io/
GNU Affero General Public License v3.0

[cygnus-ngsi][OrionHDFSSink] Add support for Parquet format in OrionHDFSSink #681

Open frbattid opened 8 years ago

frbattid commented 8 years ago

Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language. (http://parquet.incubator.apache.org/)

Parquet is a format sometimes requested by FIWARE users as a useful way of persisting historical Orion context data in HDFS.

Adding this new format will mean extending the accepted values of the file_format configuration parameter in NGSIHDFSSink (currently json-row, json-column, csv-row and csv-column). We will add parquet-row and parquet-column if such a distinction is possible, or simply parquet otherwise.
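
As a rough illustration of the proposed change, the sketch below validates a file_format value against the four values listed above plus the three candidate Parquet values. It is not Cygnus code: the function name `check_file_format` and the `allow_proposed` switch are hypothetical, and the Parquet values are only the names suggested in this issue, not something the sink accepts today.

```python
# Values this issue lists as currently accepted by file_format.
CURRENT_FORMATS = ("json-row", "json-column", "csv-row", "csv-column")

# Candidate values proposed in this issue (assumption: none is implemented yet).
PROPOSED_FORMATS = ("parquet-row", "parquet-column", "parquet")


def check_file_format(value, allow_proposed=False):
    """Return the normalized file_format value, or raise ValueError.

    Hypothetical helper for illustration only; it does not mirror the
    actual sink implementation.
    """
    normalized = value.strip().lower()
    accepted = CURRENT_FORMATS + (PROPOSED_FORMATS if allow_proposed else ())
    if normalized not in accepted:
        raise ValueError(
            f"unsupported file_format {value!r}; "
            f"expected one of: {', '.join(accepted)}"
        )
    return normalized
```

With `allow_proposed=False` a value such as `parquet` is rejected, which matches the behaviour described in this issue before the extension. Passing `allow_proposed=True` accepts the three candidate names.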

frbattid commented 8 years ago

Some interesting references: