ironSource / parquetjs

fully asynchronous, pure JavaScript implementation of the Parquet file format
MIT License
345 stars, 173 forks

Can write file on AWS S3 #95

Open rkbsoftsolutions opened 4 years ago

rkbsoftsolutions commented 4 years ago

I am using parquetjs in Meteor.js and want to create a Parquet data file.

When I call ParquetWriter.openFile(schema, filePath), I get the error below:

W20191226-23:56:11.534(5.5)? (STDERR) (node:6898) UnhandledPromiseRejectionWarning: missing required field: assets
W20191226-23:56:11.534(5.5)? (STDERR) (node:6898) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 14).

It seems to be related to a path or permission issue. But instead of creating a local file, is it possible to upload the Parquet file to AWS S3?
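The message "missing required field: assets" appears to be the error parquetjs throws while encoding a row that has no value for a field the schema treats as required (neither optional nor repeated), which points at the data rather than the file path. Because nothing catches it, Node reports it as an UnhandledPromiseRejectionWarning. A minimal sketch of a pre-flight check follows; findMissingRequiredFields is a hypothetical helper, not a parquetjs API, and it assumes the plain schema definition object you pass to new parquet.ParquetSchema(...) uses optional: true / repeated: true / fields as in the parquetjs README.

```javascript
'use strict';

// Hypothetical helper (not part of parquetjs): checks a row against the plain
// schema definition object you would pass to `new parquet.ParquetSchema(def)`.
// Assumption: a field counts as required unless it has `optional: true` or
// `repeated: true`; null, undefined and [] all count as "no value".
function findMissingRequiredFields(schemaDef, row, prefix = '') {
  const missing = [];
  for (const [name, field] of Object.entries(schemaDef)) {
    const path = prefix + name;
    const value = row[name];
    const absent = value === undefined || value === null ||
      (Array.isArray(value) && value.length === 0);
    if (absent) {
      if (!field.optional && !field.repeated) missing.push(path);
      continue;
    }
    // Nested groups declare their children under `fields`; check each
    // present child object (a repeated group holds an array of them).
    if (field.fields) {
      const children = Array.isArray(value) ? value : [value];
      for (const child of children) {
        if (child !== null && typeof child === 'object') {
          missing.push(...findMissingRequiredFields(field.fields, child, path + '.'));
        }
      }
    }
  }
  return missing;
}

// Usage sketch with parquetjs (assumes `writer` came from
// ParquetWriter.openFile and this runs inside an async function):
// try {
//   const missing = findMissingRequiredFields(schemaDef, row);
//   if (missing.length > 0) console.warn('skipping row, missing:', missing);
//   else await writer.appendRow(row);
// } catch (err) {
//   console.error('Parquet write failed:', err); // no unhandled rejection
// }
```

Alternatively, declaring the field as optional: true in the schema lets rows without it through.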

taozhiyuzhuo commented 4 years ago

@staronline1985 Hi, did you find a solution for uploading Parquet to AWS S3?

rkbsoftsolutions commented 4 years ago

Yes, I found a solution for uploading Parquet to AWS S3, but I am having issues with large files. For example, I read a JSON or CSV file and convert it into Parquet format, but all the data is kept in memory until the Parquet writer is closed. That will not work for me with large files. My job is to read a JSON file from S3, convert it into Parquet format, and upload it back to S3.

rkbsoftsolutions commented 4 years ago

I think it should be stream-based: read the data as a stream, convert it, and upload the resulting stream to S3.
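A stream-based job needs a first stage that turns raw bytes into row objects without holding the whole file. The sketch below assumes the input is newline-delimited JSON (one object per line); a single large JSON array or a CSV file would need a streaming JSON or CSV parser instead. createJsonLinesParser is a hypothetical name. Its input could be an S3 object stream (for example s3.getObject(params).createReadStream() in AWS SDK v2), and its output is meant to be piped into parquetjs's ParquetTransformer.

```javascript
'use strict';
const { Transform } = require('stream');
const { StringDecoder } = require('string_decoder');

// Hypothetical helper: object-mode Transform that parses newline-delimited
// JSON into one object per line. Only the trailing partial line is kept
// between chunks, so memory use is bounded by the longest line, not the file.
function createJsonLinesParser() {
  const decoder = new StringDecoder('utf8'); // handles UTF-8 chars split across chunks
  let remainder = '';

  const parseLines = (stream, text) => {
    const lines = text.split('\n');
    remainder = lines.pop(); // possibly incomplete; wait for the next chunk
    for (const line of lines) {
      if (line.trim() !== '') stream.push(JSON.parse(line));
    }
  };

  return new Transform({
    readableObjectMode: true, // accepts bytes/strings, emits objects
    transform(chunk, _encoding, callback) {
      try {
        parseLines(this, remainder + decoder.write(chunk));
        callback();
      } catch (err) {
        callback(err); // malformed JSON fails the whole pipeline
      }
    },
    flush(callback) {
      try {
        const last = remainder + decoder.end();
        if (last.trim() !== '') this.push(JSON.parse(last));
        callback();
      } catch (err) {
        callback(err);
      }
    },
  });
}
```

Because each stage only passes rows along, backpressure from the upload side slows the reader down instead of letting rows pile up in memory.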

taozhiyuzhuo commented 4 years ago

@staronline1985 I have the same task. For now, I just need to convert a local CSV file to Parquet and upload it to S3, but that requires creating a local Parquet file first and then reading it back with readFileSync as a buffer to upload it. I want to upload directly to S3 without saving a local file. How can I do that?
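One way to skip the local file is to hand parquetjs a stream instead of a path: ParquetWriter.openStream(schema, outputStream) writes to any writable stream, so the bytes can be collected into a Buffer and uploaded from memory. A minimal sketch, assuming the finished Parquet file fits in memory; createBufferSink is a hypothetical name, and the bucket and key in the comments are placeholders.

```javascript
'use strict';
const { PassThrough } = require('stream');

// Hypothetical helper: a writable sink that gathers everything written to it
// into one in-memory Buffer, so no temporary file is needed.
function createBufferSink() {
  const sink = new PassThrough();
  const chunks = [];
  sink.on('data', (chunk) => chunks.push(chunk)); // flowing mode: no backpressure stall
  const done = new Promise((resolve, reject) => {
    sink.on('end', () => resolve(Buffer.concat(chunks)));
    sink.on('error', reject);
  });
  return { sink, done };
}

// Usage sketch (assumes parquetjs and AWS SDK v2, inside an async function):
// const { sink, done } = createBufferSink();
// const writer = await parquet.ParquetWriter.openStream(schema, sink);
// for (const row of rows) await writer.appendRow(row);
// await writer.close(); // writes the footer; assumed to end `sink` as well
// const body = await done;
// await s3.putObject({ Bucket: 'my-bucket', Key: 'data.parquet', Body: body }).promise();
```

For files too large to buffer, the same PassThrough can instead be given directly to an upload that accepts a stream body, such as s3.upload in AWS SDK v2.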

rkbsoftsolutions commented 4 years ago

I am doing the same and am waiting to see whether parquetjs can support this. Otherwise, I will go with another repo.

muratcorlu commented 4 years ago

You need to use ParquetTransformer, as mentioned in #76.
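A sketch of how ParquetTransformer could be wired between a row stream and an S3 upload, assuming (as #76 describes) that it is an object-mode Transform constructed from a schema, and that the uploader accepts a readable stream as its body. streamRowsToUpload and the injected upload function are hypothetical; passing in the parquetjs module and the uploader keeps the wiring testable without AWS credentials.

```javascript
'use strict';
const { PassThrough, pipeline } = require('stream');

// Hypothetical helper: rows -> ParquetTransformer -> PassThrough -> uploader.
// `parquet` is the parquetjs module; `upload(body)` takes a readable stream
// and returns a Promise. The upload consumes bytes while rows still arrive.
function streamRowsToUpload(parquet, schema, rowStream, upload) {
  const body = new PassThrough();
  const uploaded = upload(body);
  // If the upload fails, tear down the pipeline instead of letting it stall.
  uploaded.catch((err) => body.destroy(err));
  const piped = new Promise((resolve, reject) => {
    pipeline(rowStream, new parquet.ParquetTransformer(schema), body,
      (err) => (err ? reject(err) : resolve()));
  });
  return Promise.all([piped, uploaded]).then(([, result]) => result);
}

// Real wiring (assumption: parquetjs plus AWS SDK v3 @aws-sdk/lib-storage):
// const parquet = require('parquetjs');
// const { S3Client } = require('@aws-sdk/client-s3');
// const { Upload } = require('@aws-sdk/lib-storage');
// const upload = (Body) => new Upload({
//   client: new S3Client({}),
//   params: { Bucket: 'my-bucket', Key: 'out.parquet', Body }, // placeholders
// }).done();
// await streamRowsToUpload(parquet, schema, rowStream, upload);
```

stream.pipeline is used rather than chained .pipe() calls because it forwards errors from any stage and destroys the other stages, so a bad row does not leave a half-finished upload hanging.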

govthamreddy commented 3 years ago

Do you have an example of doing it?

aijazkhan81 commented 2 years ago

@govthamreddy Do you have a working example of pushing it to S3?

magno32 commented 1 year ago

Example from #76

https://github.com/ironSource/parquetjs/issues/76#issuecomment-1312158235