ironSource / parquetjs

fully asynchronous, pure JavaScript implementation of the Parquet file format
MIT License
345 stars, 173 forks

How to upload the parquet file to s3? #130

Open aijazkhan81 opened 2 years ago

aijazkhan81 commented 2 years ago
var writer = await parquet.ParquetWriter.openFile(schema, 'fruits.parquet');
await writer.appendRow({name: 'apples', quantity: 10, price: 2.5, date: new Date(), in_stock: true});
await writer.appendRow({name: 'oranges', quantity: 10, price: 2.5, date: new Date(), in_stock: true});
await writer.close();

I have done this part, and the file gets saved locally. How can I get the file contents into a variable? If I can hold the file in a variable, it will be easier to upload it.
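One possible way to get the saved file into a variable is to read it back from disk as a Buffer. The sketch below uses only Node's built-in `fs` module; `readFileIntoBuffer` and `demoReadBack` are hypothetical helper names, and the temporary file with placeholder bytes stands in for `fruits.parquet`.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Read a file that already exists on disk into a Buffer.
// Note: this loads the whole file into memory at once.
async function readFileIntoBuffer(filePath) {
  return fs.promises.readFile(filePath);
}

// Demo with a stand-in file; in the real case the path would be the
// file written by the Parquet writer (e.g. 'fruits.parquet').
async function demoReadBack() {
  const filePath = path.join(os.tmpdir(), `readback-demo-${process.pid}.bin`);
  await fs.promises.writeFile(filePath, Buffer.from('placeholder bytes'));
  try {
    return await readFileIntoBuffer(filePath);
  } finally {
    await fs.promises.unlink(filePath); // clean up the temporary file
  }
}
```

The resulting Buffer could then be handed to an uploader as the request body, at the cost of keeping the entire file in RAM.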

sambonbonne commented 2 years ago

If it helps, I managed to do it using only streams, but I don't know whether appendRow is compatible with stream mode:

  1. I receive my stream from a request; here I will call it sourceStream (I guess you will need to create your own Readable stream)
  2. I create a ParquetTransformer, here called parquetStream, and pipe sourceStream into it
  3. I create an AWS S3 upload request with the piped stream as Body, using the official AWS SDK
// assume we already have `sourceStream` and `parquetStream`
s3Bucket.upload({
  Bucket: 'bucketName',
  Key: 'path/of/the/file',
  Body: sourceStream.pipe(parquetStream)
}); // I also call .promise() on the result, but that is specific to my usage

Using streams has the advantage of saving RAM, since the whole file never has to be held in memory at once.
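To make the three steps above more concrete, here is a minimal sketch that uses only Node's built-in `stream` module. It is not the real setup: `makeStandInTransformer` is a hypothetical stand-in for the ParquetTransformer (it emits JSON lines, not Parquet), and `consume` is a hypothetical stand-in for the S3 upload, which would read the piped stream as its Body.

```javascript
const { Readable, Transform } = require('stream');

// Step 1: a Readable source that emits one row object at a time.
function rowsToStream(rows) {
  return Readable.from(rows); // object mode by default
}

// Step 2: stand-in for the Parquet transform: row objects in, bytes out.
// ASSUMPTION: JSON lines are used here only to keep the sketch runnable.
function makeStandInTransformer() {
  return new Transform({
    writableObjectMode: true, // accepts objects, emits Buffers
    transform(row, _encoding, callback) {
      callback(null, Buffer.from(JSON.stringify(row) + '\n'));
    },
  });
}

// Step 3: stand-in for the uploader: drains the stream chunk by chunk.
async function consume(stream) {
  const chunks = [];
  for await (const chunk of stream) {
    chunks.push(chunk);
  }
  return Buffer.concat(chunks);
}

async function demoPipeline() {
  const rows = [
    { name: 'apples', quantity: 10 },
    { name: 'oranges', quantity: 10 },
  ];
  const body = rowsToStream(rows).pipe(makeStandInTransformer());
  return consume(body);
}
```

In the real case nothing is collected into one Buffer; the uploader pulls chunks from the piped stream as they are produced, which is where the memory saving comes from.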

eliasrosa commented 2 years ago

@sambonbonne, my friend!

Could you post a more complete example, please?

I don't understand anything about streams.

Thank you very much!

sambonbonne commented 2 years ago

@eliasrosa I'm sorry, I don't know how to make a more complete example. I can add some variables or something but I'm not sure it will help:

const parquetStream = new ParquetTransformer({ /* your parquet and transform parameters */ });

// assuming you already have a Readable source stream named sourceStream
const conversionStream = sourceStream.pipe(parquetStream);

s3Bucket.upload({
  Bucket: 'bucketName',
  Key: 'path/of/the/file',
  Body: conversionStream
});

I don't want to discourage you, but I think you should not try to use streams without understanding them. Streams are important in Node.js and have multiple advantages, so learning more about them would probably be useful if you work with Node.js.

(I hope you won't take this answer as an attack, I just don't know how I can help better)