uwdata / flechette

Fast, lightweight access to Apache Arrow data.
https://idl.uw.edu/flechette/

Bundling Mosaic 0.11 with Referenced Flechette and DuckDB-WASM dependencies - Missing Export Replacing Apache-Arrow #24

Closed Unemyr closed 1 month ago

Unemyr commented 1 month ago

Hi,

First of all thanks again for yet another really interesting project! Mosaic 0.11 has a number of significant improvements so that is very encouraging.

I noticed that Flechette is still missing some of the exports that apache-arrow provides and that DuckDB-WASM normally uses. When I bundle Mosaic 0.11 in my project (I'm using rollup, just FYI), I get warnings that duckdb-wasm references the following apache-arrow API, which Flechette does not export. I'm not sure whether it requires a full implementation or whether a dummy export would be sufficient for now. Mosaic 0.11 seems to work fine as far as I can tell.

Here are the details:

    exporter: 'node_modules/@uwdata/flechette/src/index.js',
    id: 'node_modules/@duckdb/duckdb-wasm/dist/duckdb-browser.mjs',
    message: '"RecordBatchReader" is not exported by "node_modules/@uwdata/flechette/src/index.js", imported by "node_modules/@duckdb/duckdb-wasm/dist/duckdb-browser.mjs".',
    url: 'https://rollupjs.org/troubleshooting/#error-name-is-not-exported-by-module',

In my rollup config I'm using @rollup/plugin-alias to replace 'apache-arrow' with '@uwdata/flechette', since duckdb-wasm 1.28.1-dev278.0 otherwise imports apache-arrow rather than Flechette.

Running 'yarn why apache-arrow' gives: Reasons this module exists

It would be great if this export could be added to a future version of Flechette to avoid developer confusion and integration issues.

With kind regards,

Erik

Unemyr commented 1 month ago

Digging into the code a bit more, RecordBatchReader is used in DuckDB-WASM's query() function, which is quite commonly used, so it could be difficult to replace apache-arrow for DuckDB-WASM until this is implemented. Feel free to share any tips if I have overlooked something.

https://github.com/duckdb/duckdb-wasm/blob/60eadb4f483fc2ada1ecb2b2c597b975c4ac245e/packages/duckdb-wasm/src/bindings/connection.ts#L31:

    /** Run a query */
    public query<T extends { [key: string]: arrow.DataType } = any>(text: string): arrow.Table<T> {
        const buffer = this._bindings.runQuery(this._conn, text);
        const reader = arrow.RecordBatchReader.from(buffer);

jheer commented 1 month ago

Hi @Unemyr, the main issue appears to be your use of @rollup/plugin-alias. Flechette is not intended to function as an alias for apache-arrow JS. Meanwhile, Mosaic no longer calls DuckDB-WASM's query method. Instead it sidesteps that method (and all use of apache-arrow) to get access to the "raw" Arrow IPC bytes and then uses Flechette for decoding. I suspect everything will bundle correctly if you remove the alias.
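To make the "raw bytes, then decode" idea concrete, here is a minimal sketch. It is not Mosaic's actual code. The `RawBindings` and `UnsafeConnection` interfaces are hypothetical stand-ins for the DuckDB-WASM bindings and connection objects; the sketch assumes the connection exposes a `useUnsafe` callback hook and that the bindings' `runQuery` resolves to Arrow IPC bytes. The helper names `getArrowIPC` and `queryWith` are also made up for illustration, and the decoder is passed in by the caller rather than imported.

```typescript
// Hypothetical structural types standing in for DuckDB-WASM objects.
// Assumption: runQuery resolves to the raw Arrow IPC bytes of the result.
interface RawBindings {
  runQuery(conn: number, text: string): Promise<Uint8Array>;
}

// Assumption: the connection lets a callback reach the bindings and the
// numeric connection id, and returns whatever the callback returns.
interface UnsafeConnection {
  useUnsafe<R>(callback: (bindings: RawBindings, conn: number) => R): R;
}

// Fetch the raw IPC bytes for a query, bypassing any apache-arrow decoding.
function getArrowIPC(con: UnsafeConnection, sql: string): Promise<Uint8Array> {
  return con.useUnsafe((bindings, conn) => bindings.runQuery(conn, sql));
}

// Run a query and decode the bytes with a caller-supplied decoder.
async function queryWith<T>(
  con: UnsafeConnection,
  sql: string,
  decode: (bytes: Uint8Array) => T
): Promise<T> {
  const bytes = await getArrowIPC(con, sql);
  return decode(bytes);
}
```

In a real project the decoder could be Flechette's `tableFromIPC`, for example `queryWith(con, sql, tableFromIPC)`, so that no apache-arrow code is involved in decoding the result.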

Even so, you will likely still be stuck with the unneeded "bloat" of DuckDB-WASM's apache-arrow dependency. If you can configure your build process to simply exclude apache-arrow from your bundle, you should be able to get a working build minus the bloat.
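One possible way to exclude apache-arrow with rollup is its `external` option, which accepts a predicate on import ids. The following is a rough sketch under that assumption, not a tested build setup: the `isExcluded` helper, the entry point, and the output path are all hypothetical, and the config is written as a plain object so that the snippet stands alone.

```typescript
// Package whose imports should be left out of the bundle.
const EXCLUDED_PACKAGE = "apache-arrow";

// Hypothetical helper: true for the bare package name and for any
// subpath import of it, false for everything else.
function isExcluded(id: string): boolean {
  return id === EXCLUDED_PACKAGE || id.startsWith(EXCLUDED_PACKAGE + "/");
}

// Plain object in the shape of a Rollup config. In a real rollup config
// file this object would be the default export.
const config = {
  input: "src/index.js", // hypothetical entry point
  output: { file: "dist/bundle.js", format: "es" }, // hypothetical output
  external: isExcluded,
};
```

Note that marking a module as external keeps its import statement in the output, so the runtime still has to resolve `apache-arrow` somehow (or the import has to be stubbed). Whether that is acceptable depends on how the bundle is loaded.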

Unemyr commented 1 month ago

Thanks @jheer for the clarification. Yes, you are right: other functionality I have outside of Mosaic still calls query(), so for the moment I need both apache-arrow and flechette. I'll try to work around it and see whether I can avoid query() in the other sections of code and drop the dependency.

Cheers,

Erik