ironSource / parquetjs

fully asynchronous, pure JavaScript implementation of the Parquet file format
MIT License
345 stars, 173 forks

invalid encoding: PLAIN_DICTIONARY #108

Open ekschro opened 4 years ago

ekschro commented 4 years ago

The Issue

When trying to read a test Parquet file fetched from an S3 bucket, I get an "invalid encoding: PLAIN_DICTIONARY" error. This comes after getting an "invalid parquet version" error multiple times due to corrupt files, so I take it as a sign that the file is now recognized as a Parquet file but is not being read correctly. Is there anything I am doing incorrectly?

The Code

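const parquet = require('parquetjs');

(async () => {
  try {
    // Open the Parquet file that was downloaded from S3
    let reader = await parquet.ParquetReader.openFile('./fetched3.parquet');

    // A cursor with no arguments iterates over all columns
    let cursor = reader.getCursor();

    // cursor.next() resolves to null once every row has been read
    let record = null;
    while (record = await cursor.next()) {
      console.log(record);
    }

    await reader.close();
  }
  catch(err) {
    console.error(err);
  }
})();
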
ekschro commented 4 years ago

I just realized that PLAIN and PLAIN_DICTIONARY are two different forms of encoding.

Are there any plans to support PLAIN_DICTIONARY encoding in the future?
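
To make the difference between the two encodings concrete, here is a minimal conceptual sketch. It is not parquetjs code and does not reproduce the on-disk byte layout; the function names (encodePlain, encodeDictionary, decode) and the sample column are made up for illustration. The idea it shows: PLAIN stores every value in order, while a dictionary encoding stores each distinct value once and then refers to it by index, so a reader has to implement a separate decoding path for each.

```javascript
// Conceptual illustration only: hypothetical helpers, not parquetjs internals.

// PLAIN: every value is stored in order, repeats included.
function encodePlain(values) {
  return { encoding: 'PLAIN', values: [...values] };
}

// Dictionary: each distinct value is stored once; the data becomes a list of indices.
function encodeDictionary(values) {
  const dictionary = [];
  const indexOf = new Map();
  const indices = values.map((v) => {
    if (!indexOf.has(v)) {
      indexOf.set(v, dictionary.length);
      dictionary.push(v);
    }
    return indexOf.get(v);
  });
  return { encoding: 'PLAIN_DICTIONARY', dictionary, indices };
}

// A reader needs one decoding branch per encoding it wants to support.
function decode(page) {
  switch (page.encoding) {
    case 'PLAIN':
      return [...page.values];
    case 'PLAIN_DICTIONARY':
      return page.indices.map((i) => page.dictionary[i]);
    default:
      throw new Error(`invalid encoding: ${page.encoding}`);
  }
}

const column = ['us-east-1', 'us-east-1', 'eu-west-1', 'us-east-1'];
const dictPage = encodeDictionary(column);
console.log(dictPage.dictionary);
console.log(dictPage.indices);
console.log(decode(dictPage));
```

A reader that only has the PLAIN branch would fall through to the default case for a dictionary-encoded page, which matches the shape of the error reported above.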

zeitiger commented 3 years ago

I would be interested in this too

ekschro commented 2 years ago

Hey @zeitiger - Did you ever find a workaround for this?

mattfysh commented 1 year ago

I'm also getting this error when trying to read a Parquet file created by AWS Data Wrangler (aka AWS SDK for pandas). No solution yet.

hackermondev commented 1 year ago

Any updates on this?

valdo404 commented 2 months ago

For your information, the lib does not support RLE_DICTIONARY either. The workaround was to re-encode the file with PLAIN encoding.

valdo404 commented 2 months ago

Also, it does not support float8 data types.