ironSource / parquetjs

fully asynchronous, pure JavaScript implementation of the Parquet file format
MIT License

Add parquet-mr test #56

Open ZJONSSON opened 6 years ago

ZJONSSON commented 6 years ago

Here is a very basic example of how we can use dockerized parquet-tools (from parquet-mr) to test on Travis whether files created by parquetjs can be read by parquet-mr (and therefore by Spark, etc.).

The basic test succeeds but more advanced tests fail. I will add a failing branch that we can use as a guide for fixing any errors.
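
For illustration, a check of this kind might look roughly like the sketch below. It is not the code from this PR. The image name `example/parquet-tools` is hypothetical, and the sketch assumes the image's entrypoint is the `parquet-tools` binary and that `cat --json` prints one JSON object per row.

```javascript
const { execFileSync } = require('child_process');
const path = require('path');

// Hypothetical image name; substitute whichever image ships parquet-tools.
const PARQUET_TOOLS_IMAGE = 'example/parquet-tools';

// Build the `docker run` arguments that mount the file's directory read-only
// and ask parquet-tools to print the file contents as JSON.
function buildDockerArgs(filePath, image = PARQUET_TOOLS_IMAGE) {
  const abs = path.resolve(filePath);
  const dir = path.dirname(abs);
  const name = path.basename(abs);
  return ['run', '--rm', '-v', `${dir}:/data:ro`, image, 'cat', '--json', `/data/${name}`];
}

// Run parquet-tools in Docker and parse its output.
// Assumes one JSON object per non-empty output line.
function readWithParquetMr(filePath) {
  const out = execFileSync('docker', buildDockerArgs(filePath), { encoding: 'utf8' });
  return out
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}
```

A test would then write a file with parquetjs, call `readWithParquetMr` on it, and compare the returned rows with the rows that were written.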


ZJONSSON commented 6 years ago

Here is a failing branch: https://github.com/ZJONSSON/parquetjs/tree/parquet-mr-fail

It exposes problems with the RLE encoding.


ZJONSSON commented 6 years ago

This PR has been rebased on https://github.com/ironSource/parquetjs/pull/57 to include the fixes for RLE in dlevels and rlevels. More tests have also been added to verify that the results are correct as seen from parquet-mr.
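
For readers unfamiliar with the format, here is a minimal sketch of how an RLE run is laid out in Parquet's RLE/bit-packing hybrid encoding, which is what definition and repetition levels use. It is not the parquetjs implementation or the fix from the linked PR. The function names are made up for illustration, and the sketch emits only RLE runs (no bit-packed runs), which is valid but not always the most compact output.

```javascript
// Encode an unsigned integer as a ULEB128 varint (7 bits per byte, low bits first).
function encodeVarint(n) {
  const bytes = [];
  while (n > 0x7f) {
    bytes.push((n & 0x7f) | 0x80);
    n = Math.floor(n / 128);
  }
  bytes.push(n);
  return bytes;
}

// One RLE run: header = varint(count << 1), where the low bit 0 marks an RLE run,
// followed by the repeated value in ceil(bitWidth / 8) little-endian bytes.
function encodeRleRun(value, count, bitWidth) {
  const header = encodeVarint(count * 2);
  const width = Math.ceil(bitWidth / 8);
  const body = [];
  for (let i = 0; i < width; i++) {
    body.push((value >>> (8 * i)) & 0xff);
  }
  return Buffer.from(header.concat(body));
}

// Hypothetical helper: split the levels into runs of equal values
// and encode each run as an RLE run.
function encodeLevels(levels, bitWidth) {
  const parts = [];
  let i = 0;
  while (i < levels.length) {
    let j = i;
    while (j < levels.length && levels[j] === levels[i]) j++;
    parts.push(encodeRleRun(levels[i], j - i, bitWidth));
    i = j;
  }
  return Buffer.concat(parts);
}

// Four 1s followed by two 0s, with a bit width of 1.
console.log(encodeLevels([1, 1, 1, 1, 0, 0], 1).toString('hex')); // prints "08010400"
```

In a version 1 data page, the encoded levels are additionally prefixed with their byte length as a 4-byte little-endian integer; that prefix is left out here.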

justinsoliz commented 6 years ago

I seem to be running into this issue as well. Are there any outstanding items on this PR that I might be able to help with to get it merged in?

ZJONSSON commented 6 years ago

Do your problems go away when you use this branch? The only outstanding thing here is a code review afaik.

justinsoliz commented 6 years ago

NPM install per this comment does the trick for me: https://github.com/ironSource/parquetjs/issues/29#issuecomment-385808572