jcrobak / parquet-python

python implementation of the parquet columnar file format.
Apache License 2.0

parquet.ParquetFormatException: Unsupported encoding: RLE_DICTIONARY #86

Open d33tah opened 7 months ago

d33tah commented 7 months ago
> echo -e 'hi' | parquet-fromcsv --input-file /dev/stdin --schema <( echo 'message schema { OPTIONAL BYTE_ARRAY key (STRING); }' ) --output-file test.pq
> base64 -w 0 < test.pq 
UEFSMRUEFRoVGkwVBBUAEgAAAwAAAGtleQIAAABoaRUAFRIVEiwVBBUQFQYVBhxYA2tleRgCaGkAAAACAAAABAEBAwIVDBk1AAYQGRgDa2V5FQAWBBaAARaAASY2JgAcWANrZXkYAmhpAAAZEQIZGAJoaRkYA2tleRUAGRYAABkcFj4VShYAAAAVAhksSAxhcnJvd19zY2hlbWEVAgAVDCUCGANrZXklAEwcAAAAFgQZHBkcJogBHBUMGTUABhAZGANrZXkVABYEFoABFoABJj4mCBxYA2tleRgCaGkAABb+ARUUFtYBFSgAFoABFgQmCBaAARQAABkcGAxBUlJPVzpzY2hlbWEYnAEvLy8vLzJ3QUFBQVFBQUFBQUFBS0FBd0FDZ0FKQUFRQUNnQUFBQkFBQUFBQUFRUUFDQUFJQUFBQUJBQUlBQUFBQkFBQUFBRUFBQUFVQUFBQUVBQVVBQkFBRGdBUEFBUUFBQUFJQUJBQUFBQVlBQUFBREFBQUFBQUFBUVVRQUFBQUFBQUFBQVFBQkFBRUFBQUFBd0FBQUd0bGVRQT0AGBlwYXJxdWV0LXJzIHZlcnNpb24gNDYuMC4wADoBAABQQVIx
> python3 -m parquet test.pq
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/tmp/pq2/parquet-python/parquet/__main__.py", line 63, in <module>
    main()
  File "/tmp/pq2/parquet-python/parquet/__main__.py", line 59, in main
    parquet.dump(args.file, args)
  File "/tmp/pq2/parquet-python/parquet/__init__.py", line 526, in dump
    return _dump(file_obj, options=options, out=out)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/pq2/parquet-python/parquet/__init__.py", line 506, in _dump
    for row in DictReader(file_obj, options.col):
  File "/tmp/pq2/parquet-python/parquet/__init__.py", line 415, in DictReader
    for row in reader(file_obj, columns):
  File "/tmp/pq2/parquet-python/parquet/__init__.py", line 464, in reader
    values = read_data_page(file_obj, schema_helper, page_header, cmd,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/pq2/parquet-python/parquet/__init__.py", line 376, in read_data_page
    raise ParquetFormatException("Unsupported encoding: {}".format(
parquet.ParquetFormatException: Unsupported encoding: RLE_DICTIONARY
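For context on what the traceback is rejecting: in the Parquet format spec, an RLE_DICTIONARY data page (encoding id 8) stores its values as one byte giving the bit width of the dictionary indices, followed by those indices in the RLE/bit-packing hybrid encoding. Unlike definition and repetition levels, this section has no 4-byte length prefix. The actual values live in the column chunk's dictionary page. The sketch below illustrates that decoding on its own and is not parquet-python's code. The function names are hypothetical, and it assumes the definition levels have already been consumed from the page and the dictionary page has already been decoded into a Python list.

```python
"""Standalone sketch of decoding an RLE_DICTIONARY data page's value section.

Hypothetical helper names; this is not parquet-python's API. Assumes the
definition/repetition levels were already stripped from the page and the
dictionary page was already decoded into a list.
"""
from typing import List, Sequence, Tuple


def read_uleb128(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 varint at `pos`; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def decode_rle_bitpacked_hybrid(buf: bytes, bit_width: int, count: int,
                                pos: int = 0) -> List[int]:
    """Decode `count` integers from RLE/bit-packing hybrid runs (no length prefix)."""
    values: List[int] = []
    byte_width = (bit_width + 7) // 8  # RLE run values are padded to whole bytes
    mask = (1 << bit_width) - 1
    while len(values) < count and pos < len(buf):
        header, pos = read_uleb128(buf, pos)
        if header & 1:
            # Bit-packed run: (header >> 1) groups of 8 values, packed LSB-first.
            n = (header >> 1) * 8
            nbytes = n * bit_width // 8
            chunk = int.from_bytes(buf[pos:pos + nbytes], "little")
            pos += nbytes
            for i in range(n):
                values.append((chunk >> (i * bit_width)) & mask)
        else:
            # RLE run: one little-endian value repeated (header >> 1) times.
            n = header >> 1
            value = int.from_bytes(buf[pos:pos + byte_width], "little")
            pos += byte_width
            values.extend([value] * n)
    return values[:count]  # drop padding from the last bit-packed group


def decode_rle_dictionary_values(data: bytes, dictionary: Sequence,
                                 num_values: int) -> list:
    """First byte is the index bit width; the rest are hybrid-encoded indices."""
    bit_width = data[0]
    indices = decode_rle_bitpacked_hybrid(data, bit_width, num_values, pos=1)
    return [dictionary[i] for i in indices]


if __name__ == "__main__":
    # Bit width 1, one bit-packed group (header 3), indices 0,1,1 in byte 0b110.
    print(decode_rle_dictionary_values(bytes([1, 3, 6]), ["hi", "yo"], 3))
```

Per the spec, this data-page layout is shared with the older, deprecated PLAIN_DICTIONARY encoding. That suggests a reader which already handles PLAIN_DICTIONARY pages might plausibly reuse the same path for RLE_DICTIONARY, but this is an inference from the spec, not a confirmed fix for this library.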