ironSource / parquetjs

fully asynchronous, pure JavaScript implementation of the Parquet file format
MIT License

Client side version #87

Open vasner opened 4 years ago

vasner commented 4 years ago

Hi all, is it possible to build parquetjs as a client-side (browser) JS library?

dobesv commented 4 years ago

I think compression depends on some other modules that are not available in the browser, though reading and writing uncompressed data might work. Have you tried it?
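To illustrate the compression point: Parquet's GZIP codec does not strictly need Node's zlib in a browser, because modern browsers (and Node.js 18+) ship the Compression Streams API. This is a minimal sketch, assuming an environment that provides CompressionStream, DecompressionStream, Blob and Response as globals; gzip, gunzip and transform are hypothetical helper names, not parquetjs functions, and the sketch does nothing for Snappy or Brotli pages.

```javascript
// Pipe a byte array through a web TransformStream (e.g. a CompressionStream)
// and collect the output into a new Uint8Array.
async function transform(bytes, stream) {
  const out = new Blob([bytes]).stream().pipeThrough(stream);
  return new Uint8Array(await new Response(out).arrayBuffer());
}

// GZIP round trip using only built-in web APIs, no Node.js zlib involved.
const gzip = (bytes) => transform(bytes, new CompressionStream('gzip'));
const gunzip = (bytes) => transform(bytes, new DecompressionStream('gzip'));

gzip(new TextEncoder().encode('hello parquet'))
  .then(gunzip)
  .then((out) => console.log(new TextDecoder().decode(out))); // hello parquet
```

A reader built this way would still need separate handling for every other codec it wants to support.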

vasner commented 4 years ago

I've never tried.

aviillouz commented 4 years ago

brotli has an issue with encoding in browsers (https://github.com/foliojs/brotli.js/issues/20) and suggests importing only 'brotli/decompress' in the browser as a workaround. Suppose I only want to read files: can I just import ParquetReader?

import { ParquetReader } from 'parquetjs/lib/reader';

I'm still getting the following error from brotli.

ReferenceError: Browser is not defined
node_modules/brotli/build/encode.js:49
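For the read-only case it may help to see how little of the file format actually requires a codec. Per the Parquet spec, a file ends with the Thrift-encoded FileMetaData footer, a 4-byte little-endian footer length, and the magic bytes "PAR1" (which also open the file). The sketch below, with hypothetical helper names locateFooter and readAscii, finds that footer in an in-memory buffer (for example from File.arrayBuffer() in a browser) using only typed arrays. Decoding the returned bytes still needs a Thrift compact-protocol decoder, which is where the thrift dependency comes in, and decompressing column data still needs the codecs.

```javascript
// Parquet layout: "PAR1" | column chunks ... | footer | footer length (uint32 LE) | "PAR1"
const MAGIC = 'PAR1';

// Read `length` bytes starting at `offset` as an ASCII string.
function readAscii(bytes, offset, length) {
  return String.fromCharCode(...bytes.subarray(offset, offset + length));
}

// Return the raw (still Thrift-encoded) FileMetaData bytes of a Parquet file.
function locateFooter(buffer) {
  const bytes = new Uint8Array(buffer);
  if (bytes.length < 12) throw new Error('too small to be a Parquet file');
  if (readAscii(bytes, 0, 4) !== MAGIC || readAscii(bytes, bytes.length - 4, 4) !== MAGIC) {
    throw new Error('missing PAR1 magic bytes');
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const footerLength = view.getUint32(bytes.length - 8, true); // little-endian
  const footerStart = bytes.length - 8 - footerLength;
  if (footerStart < 4) throw new Error('footer length out of range');
  return bytes.subarray(footerStart, footerStart + footerLength);
}

// Synthetic example: a 3-byte "footer" between the magic bytes.
const enc = new TextEncoder();
const fake = new Uint8Array([...enc.encode('PAR1'), 0xaa, 0xbb, 0xcc, 3, 0, 0, 0, ...enc.encode('PAR1')]);
console.log(Array.from(locateFooter(fake))); // [ 170, 187, 204 ]
```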
sota1235 commented 4 years ago

I also got the same error as aviillouz when running a Jest test, but not when I run the same file in production.

danb235 commented 3 years ago

I ran into the same error. This package isn't intended to run in the browser, and Jest (before v27) runs tests in a browser-like jsdom environment by default. Set testEnvironment to node in your package.json and the error goes away when using parquetjs.

"jest": {
  "testEnvironment": "node"
},

https://jestjs.io/docs/en/configuration#testenvironment-string

dobesv commented 3 years ago

I have been using a derived package, parquets, in the browser, and it works.

eatonphil commented 3 years ago

Thanks @dobesv! Link for everyone: https://github.com/kbajalc/parquets.

eatonphil commented 3 years ago

Hmm, after trying it myself, kbajalc/parquets does not work in the browser either. Maybe I'm doing it wrong, but it fails while importing thrift.

dobesv commented 3 years ago

Make sure you're using thrift 0.14.0; I believe 0.13.0 was a broken build.

eatonphil commented 3 years ago

A problem is that thrift (and parquets itself, for zlib) requires a lot of Node.js-only modules:

 > node_modules/parquets/lib/compression.js:4:21: error: Could not resolve "zlib" (use "--platform=node" when building for node)
    4 │ const zlib = require("zlib");
      ╵                      ~~~~~~

 > node_modules/thrift/lib/nodejs/lib/thrift/log.js:20:19: error: Could not resolve "util" (use "--platform=node" when building for node)
    20 │ var util = require('util');
       ╵                    ~~~~~~

 > node_modules/thrift/lib/nodejs/lib/thrift/thrift.js:19:19: error: Could not resolve "util" (use "--platform=node" when building for node)
    19 │ var util = require('util');
       ╵                    ~~~~~~

 > node_modules/thrift/lib/nodejs/lib/thrift/ws_connection.js:19:19: error: Could not resolve "util" (use "--platform=node" when building for node)
    19 │ var util = require('util');
       ╵                    ~~~~~~

 > node_modules/thrift/lib/nodejs/lib/thrift/ws_connection.js:21:27: error: Could not resolve "events" (use "--platform=node" when building for node)
    21 │ var EventEmitter = require("events").EventEmitter;

So it doesn't help me run parquets in the browser :/
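Every module named in that esbuild log (zlib, util, events) is a Node.js core module, which is why esbuild suggests --platform=node: a browser bundle has no implementation of them unless it is aliased to a browser polyfill. The hypothetical helper nodeOnlyImports below, a sketch meant to run under Node at build time (it relies on module.builtinModules), picks out which specifiers in a list would need such a polyfill.

```javascript
const { builtinModules } = require('module');

// Return the specifiers that name Node.js core modules (with or without the
// "node:" prefix); a browser build must polyfill or alias each of these.
function nodeOnlyImports(specifiers) {
  const core = new Set(builtinModules);
  return specifiers.filter((s) => core.has(s.replace(/^node:/, '')));
}

console.log(nodeOnlyImports(['zlib', 'util', 'events', 'thrift', 'brotli/decompress']));
// [ 'zlib', 'util', 'events' ]
```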

dobesv commented 3 years ago

Maybe these webpack settings will help:

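// webpack.config.js for webpack 5, which no longer polyfills Node.js core
// modules automatically. Each fallback package must be installed from npm.
module.exports = {
  resolve: {
    extensions: ['.js', '.json'],
    fallback: {
      // A trailing slash makes require.resolve() return the npm package
      // rather than the Node.js core module of the same name.
      assert: require.resolve('assert/'),
      console: require.resolve('console-browserify'),
      crypto: require.resolve('crypto-browserify'),
      domain: require.resolve('domain-browser'),
      events: require.resolve('events/'),
      path: require.resolve('path-browserify'),
      stream: require.resolve('stream-browserify'),
      string_decoder: require.resolve('string_decoder/'),
      util: require.resolve('util/'),
      vm: require.resolve('vm-browserify'),
      zlib: require.resolve('browserify-zlib'),
    },
    mainFields: ['browser', 'main', 'module'],
    // Project-specific settings (custom module search paths, a react-dom
    // hot-reload alias) are left out.
    modules: ['node_modules'],
  },
};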
kylebarron commented 2 years ago

Maybe relevant to people in this thread: I have a basic but functional WebAssembly Parquet reader/writer here: https://github.com/kylebarron/parquet-wasm, compiled from Rust.

jonnor commented 2 years ago

There also seems to be a fork with browser support (and Typescript types) at https://github.com/LibertyDSNP/parquetjs - which is also available on NPM.

plotka commented 2 years ago

> I have been using a derived package, parquets, in the browser, and it works.

@dobesv Would you mind sharing an example implementation? I am struggling to make it work in the browser.