kylebarron / arrow-wasm

Building block library for using Apache Arrow in Rust WebAssembly modules.
http://kylebarron.dev/arrow-wasm/
Apache License 2.0

Modular Rust Arrow libraries in wasm-bindgen #8

Open kylebarron opened 1 year ago

kylebarron commented 1 year ago

Problem statement

The biggest hurdle with WebAssembly in the browser is that multiple Wasm modules can't share the same memory space. This means that having e.g. parquet-wasm and geoarrow-wasm as two separate NPM modules is annoying! You have to use parquet-wasm to load parquet into Arrow in Wasm... but then copy the data to JS, and then copy it into the next wasm module to do more processing with it! This is slow, memory intensive, and not user friendly.
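To make the cost concrete, here is a toy model in plain Rust (no real Wasm involved). Each `Module` stands in for a Wasm instance with its own private linear memory; `Module`, `export_to_js`, `import_from_js` and `transfer` are hypothetical names for illustration, not APIs of parquet-wasm or geoarrow-wasm.

```rust
// Toy model only: each "module" owns a private memory, like a Wasm instance.
// All names here are hypothetical.
struct Module {
    memory: Vec<u8>,
}

impl Module {
    fn new() -> Self {
        Module { memory: Vec::new() }
    }

    /// Copy bytes out of this module's memory into a host ("JS") buffer.
    fn export_to_js(&self) -> Vec<u8> {
        self.memory.clone()
    }

    /// Copy bytes from a host buffer into this module's memory.
    fn import_from_js(&mut self, host: &[u8]) {
        self.memory = host.to_vec();
    }
}

/// Move data from one module to another via the host.
/// Returns the total number of bytes copied along the way.
fn transfer(src: &Module, dst: &mut Module) -> usize {
    let host = src.export_to_js(); // copy 1: Wasm -> JS
    dst.import_from_js(&host); // copy 2: JS -> Wasm
    host.len() * 2
}

fn main() {
    let mut parquet = Module::new();
    parquet.memory = vec![1, 2, 3, 4];
    let mut geo = Module::new();
    let copied = transfer(&parquet, &mut geo);
    println!("copied {} bytes to move {} bytes", copied, geo.memory.len());
}
```

Every hand-off between separately packaged modules pays for both copies, and while the transfer is in flight the data lives in up to three places at once.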

Solution

In https://github.com/domoritz/arrow-wasm, Dominik's goal appeared to be to see if Arrow in rust/wasm would be faster than Arrow in JS. But since working with raw buffers is pretty fast in JS, it's not surprising that Wasm overhead would outweigh any other speedups.

I think the potential of arrow-wasm instead is in being a foundational library for other wasm-bindgen libraries.

So I see various potential libraries:

Libraries for other formats might make sense to add in the future, like geoarrow-flatgeobuf, which would use Rust to parse FlatGeobuf into GeoArrow, and so on.

Drawbacks

cc @H-Plus-Time

H-Plus-Time commented 1 year ago

Yep, this is a good way to do things, though I'd add a slight proviso to the drawbacks:

I'd probably also do geoarrow-wasm-slim and geoarrow-wasm-full in the one npm package and manage stuff via exports (~40MB + 2x geoarrow-wasm's size - I reckon all the packages that produce {node,bundler,esm,esm2} bundles can be slimmed down by 25-50% with one straightforward and one less straightforward tweak).

kylebarron commented 1 year ago
  • It isn't possible unless using wrapper structs

What I was hoping was that I could re-export all the existing struct's methods, and just add one new method. That doesn't really seem possible without manually wrapping the wrapped-struct's methods one by one?
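For reference, a minimal sketch of what the manual wrapping looks like, in plain Rust so it runs without wasm-bindgen. `base::Table` is a hypothetical stand-in for a struct exported by a base crate, and `GeoTable` and `x_range` are made-up names for the downstream wrapper and its one new method. In a real wasm-bindgen crate the wrapper struct and its `impl` block would carry the `#[wasm_bindgen]` attribute.

```rust
// Hypothetical stand-in for a struct exported by a base crate.
mod base {
    pub struct Table {
        columns: Vec<Vec<f64>>,
    }

    impl Table {
        pub fn new(columns: Vec<Vec<f64>>) -> Self {
            Table { columns }
        }
        pub fn num_columns(&self) -> usize {
            self.columns.len()
        }
        pub fn num_rows(&self) -> usize {
            self.columns.first().map_or(0, |c| c.len())
        }
        pub fn column(&self, i: usize) -> Option<&[f64]> {
            self.columns.get(i).map(|c| c.as_slice())
        }
    }
}

// Downstream wrapper (newtype) around the base struct.
pub struct GeoTable(base::Table);

impl GeoTable {
    pub fn new(inner: base::Table) -> Self {
        GeoTable(inner)
    }

    // Existing methods have to be delegated one by one.
    pub fn num_columns(&self) -> usize {
        self.0.num_columns()
    }
    pub fn num_rows(&self) -> usize {
        self.0.num_rows()
    }

    // The one new method: (min, max) of the first column, if it has any rows.
    pub fn x_range(&self) -> Option<(f64, f64)> {
        let xs = self.0.column(0)?;
        let first = *xs.first()?;
        Some(xs.iter().fold((first, first), |(lo, hi), &x| (lo.min(x), hi.max(x))))
    }
}

fn main() {
    let table = base::Table::new(vec![vec![3.0, 1.0, 2.0], vec![0.0, 0.0, 0.0]]);
    let geo = GeoTable::new(table);
    println!("{} x {}, x range {:?}", geo.num_rows(), geo.num_columns(), geo.x_range());
}
```

The delegation is mechanical, which is why a macro could probably generate it, but each forwarded method still has to be listed somewhere.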

  • Provided there isn't some awful quirk in depending crates, it should at least be relatively boilerplate. The difficulty would probably come from complicated conditional flag combinations (maybe once you're at the level of geopolars or js-polars, the impact of, say, individual compressions at the arrow-wasm/parquet-wasm level is too small to bother with flags other than 'all compressions').

Yeah I agree. It should be straightforward, just annoying.
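As a rough illustration of the flag problem, here is a sketch of how per-codec Cargo features show up in code. The feature names `snappy` and `zstd` are hypothetical here; a real crate would declare them under `[features]` in its Cargo.toml, and every downstream crate would have to forward them.

```rust
// Feature names are hypothetical. Built without any features enabled,
// only the always-available entry is returned.
#[allow(unexpected_cfgs)]
pub fn enabled_codecs() -> Vec<&'static str> {
    let mut codecs = vec!["uncompressed"];
    // cfg! evaluates to a compile-time bool for the given feature flag.
    if cfg!(feature = "snappy") {
        codecs.push("snappy");
    }
    if cfg!(feature = "zstd") {
        codecs.push("zstd");
    }
    codecs
}

fn main() {
    println!("codecs compiled in: {:?}", enabled_codecs());
}
```

With N such flags in the base crate, each downstream crate either re-exposes all N or collapses them into something like a single 'all compressions' feature, which is the trade-off described above.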

I reckon all the packages that produce {node,bundler,esm,esm2} bundles can be slimmed down by 25-50% with one straightforward and one less straightforward tweak.

If you have packaging recommendations I'm all ears 🙂. I believe esm2 was a "temporary" hack to get esm working in deck.gl, or something like that, because the esm export used syntax only available in some specific environment.