.. image:: https://travis-ci.org/jcrobak/parquet-python.svg?branch=master
    :target: https://travis-ci.org/jcrobak/parquet-python

parquet-python is a pure-python implementation (currently with only
read-support) of the `parquet format <https://github.com/apache/parquet-format>`_.
It comes with a script for reading parquet files and outputting the data
to stdout as JSON or TSV (without the overhead of JVM startup).
Performance has not yet been optimized, but it's useful for debugging and
quick viewing of data in files.

Not all parts of the parquet-format have been implemented or tested yet
(nested data, for example); see Todos below for a full list. That said,
parquet-python is capable of reading all the data files from the
`parquet-compatibility <https://github.com/Parquet/parquet-compatibility>`_
project.

parquet-python has been tested on Python 2.7, 3.6, and 3.7. It depends
on ``pythrift2`` and optionally on ``python-snappy`` (for snappy
compressed files, please also install ``parquet-python[snappy]``).

parquet-python is available via PyPI and can be installed using
``pip install parquet``. The package includes the ``parquet`` command
for reading parquet files, e.g. ``parquet test.parquet``. See
``parquet --help`` for full usage.

parquet-python currently has two programmatic interfaces with similar
functionality to Python's csv reader. First, it supports a ``DictReader``
which returns a dictionary per row. Second, it has a ``reader`` which
returns a list of values for each row. Both functions require a file-like
object and support an optional ``columns`` argument to read only the
specified columns.


.. code:: python

    import parquet
    import json

    ## assuming parquet file with two rows and three columns:
    ## foo bar baz
    ## 1   2   3
    ## 4   5   6

    with open("test.parquet", "rb") as fo:
        # prints:
        # {"foo": 1, "bar": 2}
        # {"foo": 4, "bar": 5}
        for row in parquet.DictReader(fo, columns=['foo', 'bar']):
            print(json.dumps(row))

    with open("test.parquet", "rb") as fo:
        # prints:
        # 1,2
        # 4,5
        for row in parquet.reader(fo, columns=['foo', 'bar']):
            print(",".join([str(r) for r in row]))

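
Because ``DictReader`` yields one dictionary per row, its output can be
passed to the standard library's ``csv.DictWriter``. The following is a
minimal sketch, not part of parquet-python: ``write_rows_as_csv`` and
``parquet_to_csv`` are hypothetical helper names, and the demo uses
in-memory rows shaped like the ``DictReader`` output described above, so
it runs without a parquet file.

```python
import csv
import io


def write_rows_as_csv(rows, fieldnames, out):
    """Write an iterable of dict rows to `out` as CSV with a header line.

    Returns the number of data rows written.
    """
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


def parquet_to_csv(path, columns, out):
    """Hypothetical wrapper: stream the given columns of a parquet file as CSV."""
    # Imported lazily so write_rows_as_csv stays usable without parquet installed.
    import parquet

    with open(path, "rb") as fo:
        rows = parquet.DictReader(fo, columns=columns)
        return write_rows_as_csv(rows, columns, out)


# Demo with in-memory rows standing in for DictReader output.
buf = io.StringIO()
write_rows_as_csv([{"foo": 1, "bar": 2}, {"foo": 4, "bar": 5}], ["foo", "bar"], buf)
print(buf.getvalue(), end="")
```

With a real file, ``parquet_to_csv("test.parquet", ["foo", "bar"], sys.stdout)``
would be the equivalent call (after ``import sys``), assuming the requested
columns exist in the file.
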
Contributions are made via pull requests. Please include tests with your
changes and follow `pep8 <http://www.python.org/dev/peps/pep-0008/>`_.

To run the tests for all supported versions, install and execute ``tox``
(``pip install tox``). If you want to run them just for your current
version, execute ``pip install -r requirements-development.txt`` and then
``nosetests``.