ironSource / parquetjs

fully asynchronous, pure JavaScript implementation of the Parquet file format
MIT License

All values in stream _transform row are either null or undefined #83

ochart2 closed this issue 5 years ago

ochart2 commented 5 years ago

I create the stream by returning new parquet.ParquetTransformer(schema) and then write to it as normal. However, when I inspect the row inside ParquetTransformer's _transform method, every value in the document is either null or undefined.

What is the correct way to use streams with parquetjs?

ochart2 commented 5 years ago

Got it working. Thanks for the lib!

jdhankins commented 4 years ago

@ochart2, could you upload your solution? Thanks for your help.

ochart2 commented 4 years ago

Looks something like this:

import { Transform } from 'stream';
import { ParquetSchema, ParquetTransformer } from 'parquetjs';

// Reusable column definitions: compression codec, Parquet type, and optional encoding settings.
const types: any = {
    string: { compression: 'GZIP', type: 'UTF8' },
    int: { compression: 'GZIP', type: 'INT32', encoding: 'RLE', bitWidth: 7 },
    date: { compression: 'GZIP', type: 'TIMESTAMP_MILLIS', encoding: 'RLE', bitWidth: 7 },
};

const schema = new ParquetSchema({
    a: types.string,
    b: types.int,
    c: types.date,
});

// Returns a fresh transform stream for the schema above on every call.
export const newParquetStream = function (): Transform {
    return new ParquetTransformer(schema);
};

I was having issues reading the parquet file from Redshift... Good luck, let me know how you fare.
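
For anyone landing here with the same question, below is a minimal sketch of how row objects flow through an object-mode Transform. It uses only Node's stream module so that it runs without parquetjs installed. RowEncoder, Row, and encodeRows are hypothetical names introduced for illustration; RowEncoder is a stand-in for ParquetTransformer and emits JSON lines rather than Parquet bytes. It does not reproduce the cause of the original null/undefined problem, which was never stated in this thread.

```typescript
import { Readable, Transform } from 'stream';

// Row shape matching the schema fields a, b, c from the snippet above.
interface Row {
    a: string;
    b: number;
    c: Date;
}

// Hypothetical stand-in for ParquetTransformer: receives one row object per
// _transform call and pushes one encoded chunk downstream.
class RowEncoder extends Transform {
    constructor() {
        // writableObjectMode lets write() accept plain objects rather than
        // only strings and Buffers.
        super({ writableObjectMode: true });
    }

    _transform(
        row: Row,
        _encoding: BufferEncoding,
        callback: (error?: Error | null, data?: unknown) => void,
    ): void {
        // JSON lines stand in for the real Parquet encoding.
        callback(null, JSON.stringify(row) + '\n');
    }
}

// Pipes the rows through the transform and collects the encoded output.
function encodeRows(rows: Row[]): Promise<string> {
    return new Promise((resolve, reject) => {
        const chunks: string[] = [];
        Readable.from(rows)
            .pipe(new RowEncoder())
            .on('data', (chunk: Buffer) => chunks.push(chunk.toString()))
            .on('end', () => resolve(chunks.join('')))
            .on('error', reject);
    });
}
```

With parquetjs, the same pipeline would presumably swap RowEncoder for newParquetStream() and pipe the result into a writable such as fs.createWriteStream, with each written row keyed by the schema's field names (a, b, c).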