Closed deitch closed 6 years ago
Yes we need to update the README! The class is just called 'ParquetWriter' now, you can see an example using it here: https://github.com/ironSource/parquetjs/blob/master/examples/writer.js#L16
To write directly to a stream with the current version, you have to first construct a ParquetEnvelopeWriter for your stream (see https://github.com/ironSource/parquetjs/pull/19/files#diff-a3be12e47bf4eaa0075c39ab068184b8R155), and then create a ParquetWriter for the ParquetEnvelopeWriter, similar to the way the openFile method does it https://github.com/ironSource/parquetjs/blob/master/lib/writer.js#L42.
Still since this is already the second time the issue came up, I will also add a convenience method to ParquetWriter to open a new writer for a stream.
I will also add a convenience method to ParquetWriter to open a new writer for a stream
Thanks; then the file writer is just a special convenience case. I look forward to it.
While I am at it, not an issue per se, but the README says pure JavaScript implementation
, but it does look like it has a dependency on native compiled libs:
$ npm i parquetjs
> ws@0.4.32 install /private/tmp/node_modules/ws
> (node-gyp rebuild 2> builderror.log) || (exit 0)
CXX(target) Release/obj.target/bufferutil/src/bufferutil.o
> lzo@0.4.1 install /private/tmp/node_modules/lzo
> node-gyp rebuild
I added the ParquetWriter::openStream
method to master (https://github.com/ironSource/parquetjs/blob/master/lib/writer.js#L52). We'll release another version to NPM later after some testing.
Regarding the "pure JavaScript" thing I suppose it's a matter of definition. What we tried to communicate with that statement is that the parquetjs library does not somehow wrap the Apache Java or C++ implementation of parquet, but is a from-scratch implementation in JavaScript.
The lzo dependency specifically is problematic for some reasons but is entirely optional; hopefully one day LZO will be added to the node.js core api (like zlib). Until then we could add a more explicit option to run without lzo support for folks that want use parquetjs without node.js/FFI (but it should also work now as long as you don't explicitly request LZO compression).
Passing by with a suggestion: "pure Node.js implementation".
Sounds good to me!
I added the ParquetWriter::openStream method to master (https://github.com/ironSource/parquetjs/blob/master/lib/writer.js#L52). We'll release another version to NPM later after some testing.
Thanks. I also saw you updated the README
to do ParquetWriter.openFile()
.
README
to include ParquetWriter.openStream()
?I suppose it's a matter of definition
Yes, and whole parquetjs is pure JS, it has a dependency that is not. Fortunately, it is reasonable.
Still and all, I like this. Thanks again.
According to the docs, you start with:
Yet, no such
parquet.ParquetFileWrite
exists:As it is, I came across it looking for a way to get it to write directly to a stream, rather than to a file.