kylebarron / parquet-wasm

Rust-based WebAssembly bindings to read and write Apache Parquet data
https://kylebarron.dev/parquet-wasm/
Apache License 2.0
482 stars 19 forks source link

Unable to get 0.6.0-beta.1 to work in Node #412

Closed mbostock closed 2 months ago

mbostock commented 6 months ago

Thanks for this library! I am trying to use the latest version published to npm, 0.6.0-beta.1. It seems that the conditional exports added in #382 cause Node to import the “esm” version rather than the “node” version.

Here is what I wrote:

import * as Arrow from "apache-arrow";
import * as Parquet from "parquet-wasm";

const table = Arrow.tableFromArrays({test: [42]});
process.stdout.write(Parquet.writeParquet(Parquet.Table.fromIPCStream(Arrow.tableToIPC(table, "stream"))));

This results in the following error, and note that it’s importing esm/arrow1.js:

…/node_modules/parquet-wasm/esm/arrow1.js:1174
            wasm.__wbindgen_add_to_stack_pointer(16);
                 ^

TypeError: Cannot read properties of undefined (reading '__wbindgen_add_to_stack_pointer')
    at Table.fromIPCStream (…/node_modules/parquet-wasm/esm/arrow1.js:1174:18)
    at …/test-parquet.js:5:57
    at ModuleJob.run (node:internal/modules/esm/module_job:217:25)
    at async ModuleLoader.import (node:internal/modules/esm/loader:316:24)
    at async loadESM (node:internal/process/esm_loader:34:7)
    at async handleMainPromise (node:internal/modules/run_main:66:12)

I tried to fix this by importing the Node entry explicitly:

import * as Arrow from "apache-arrow";
import * as Parquet from "parquet-wasm/node/arrow1.js";

const table = Arrow.tableFromArrays({test: [42]});
process.stdout.write(Parquet.writeParquet(Parquet.Table.fromIPCStream(Arrow.tableToIPC(table, "stream"))));

However, this isn’t allowed: 😞

node:internal/errors:497
    ErrorCaptureStackTrace(err);
    ^

Error [ERR_PACKAGE_PATH_NOT_EXPORTED]: Package subpath './node/arrow1.js' is not defined by "exports" in …/node_modules/parquet-wasm/package.json imported from …/test-parquet.js
    at new NodeError (node:internal/errors:406:5)
    at exportsNotFound (node:internal/modules/esm/resolve:268:10)
    at packageExportsResolve (node:internal/modules/esm/resolve:598:9)
    at packageResolve (node:internal/modules/esm/resolve:772:14)
    at moduleResolve (node:internal/modules/esm/resolve:838:20)
    at defaultResolve (node:internal/modules/esm/resolve:1043:11)
    at ModuleLoader.defaultResolve (node:internal/modules/esm/loader:383:12)
    at ModuleLoader.resolve (node:internal/modules/esm/loader:352:25)
    at ModuleLoader.getModuleJob (node:internal/modules/esm/loader:228:38)
    at ModuleWrap.<anonymous> (node:internal/modules/esm/module_job:85:39) {
  code: 'ERR_PACKAGE_PATH_NOT_EXPORTED'
}

I was able to workaround this problem by downgrading to 0.5.0 which doesn’t have the named entries.

It seems like the top-level import condition is taking precedence over the node condition…

(Related, did you intend to make 0.6.0-beta.1 the latest tag on npm? Or did you intend it to be a prerelease? If so you may want to use the next tag to encourage folks to stay on 0.5.0 until 0.6.0 is out of beta. 🙏)

kylebarron commented 6 months ago

Hi! Thanks for trying out the library! I was just testing the new conditional export support in 0.6.0-beta.1, so it's good to see that it needs improvement.

This results in the following error, and note that it’s importing esm/arrow1.js:

To fix the error while using this esm import, you need to await the default export. E.g. https://observablehq.com/@kylebarron/geoparquet-on-the-web#readParquet.

image

It seems like the top-level import condition is taking precedence over the node condition…

I'm not super familiar with intricacies of JS bundling; do you have an idea of why that is? Both import and node are listed here: https://github.com/kylebarron/parquet-wasm/blob/2204f946c6dce6ceddedd593cdf6847ff18c9c50/templates/package.json#L57-L71

I tried to fix this by importing the Node entry explicitly:

Hmm, maybe we should add specific exports for e.g. /node?

(Related, did you intend to make 0.6.0-beta.1 the latest tag on npm? Or did you intend it to be a prerelease? If so you may want to use the next tag to encourage folks to stay on 0.5.0 until 0.6.0 is out of beta. 🙏)

Oh I didn't know that was possible. I'm less familiar with NPM distribution compared to PyPI, where everything with an alpha or beta tag is considered a pre-release. I guess I need to look at npm publish options.

I'm sick this week but I'll try to improve this usability when I'm healthy!

mbostock commented 6 months ago

To fix the error while using this esm import, you need to await the default export.

Understood, but I’m in Node and I don’t want the esm import — I want the one targeting Node, which isn’t happening based on how the conditional imports are declared in the package.json.

I think that’s because Node itself supports both import and require, and in my case, I’m using type: "module" and hence Node is favoring the “import” conditional over the “node” conditional. Per the conditional exports docs:

Within the "exports" object, key order is significant. During condition matching, earlier entries have higher priority and take precedence over later entries. The general rule is that conditions should be from most specific to least specific in object order.

So I think you probably need something like this but I haven’t tested…

    ".": {
      "node": {
        "types": "./node/arrow1.d.ts",
        "default": "./node/arrow1.js"
      },
      "types": "./esm/arrow1.d.ts",
      "default": "./esm/arrow1.js"
    }

This should cause Node to use the Node-targeted entries, and everything else will use the esm bundle.

P.S. I hope you feel better! 🙏 Thanks for the support. 😁

kylebarron commented 6 months ago

I'm unable to reproduce your original issue. With node v20.4.0

{
  "dependencies": {
    "parquet-wasm": "^0.6.0-beta.1"
  }
}

and a tmp.mjs of

import * as Parquet from "parquet-wasm";
console.log(Parquet)

that seemed to just work.

In any case, I made https://github.com/kylebarron/parquet-wasm/pull/414. Are you able to see if that fixes the issues on your end? Or would it be easier if I published a beta.2 for you to test?

net commented 4 months ago

This is still an issue on beta.2.

kylebarron commented 4 months ago

Could you test with https://github.com/kylebarron/parquet-wasm/pull/414?

llimllib commented 3 months ago

edit: I discovered that you can't do git imports of this module because it has a build step that npm doesn't know about. Deleted this comment after I understood that

llimllib commented 3 months ago

OK, I take it all back, the sample code runs with the code from #414, it just took me a long time to figure out how. Steps:

  1. Clone this repo
  2. check out kyle/fix-package-exports
  3. set up PATH for mac as directed in DEVELOP.md
  4. install wasm-pack
  5. run scripts/build.sh
  6. replace parquet-wasm directory in node_modules with the pkg directory after a successful build
  7. run the code that initially started this thread
kylebarron commented 2 months ago

I published 0.6.0 just now, and believe this to be resolved. Reopen or add a new comment if you find a bug.

llimllib commented 2 months ago

Thanks!

edit: just adding a link to the issue that got me here