uwdata / arquero

Query processing and transformation of array-backed data tables.
https://idl.uw.edu/arquero
BSD 3-Clause "New" or "Revised" License
1.22k stars 64 forks source link

Failing during production build: minification problem? #323

Closed danyel-motif closed 1 year ago

danyel-motif commented 1 year ago

Arquero has been working wonderfully for our prototypes, and in development. Today, we tried a production build, and things collapsed.

The relevant code is pretty straightforward:

  const relevantRows = table
    .params({ chosenFragment })
    .filter((r: Struct, $: Struct) => r.fragment === $.chosenFragment);

It runs without warning in development. In production, however, it's a mess:

Error: Invalid member expression: "a[l.column]"
    at hs (main.js:4553:3128985)
    at Object.error (main.js:4553:3436135)
    at zAt (main.js:4553:3433306)
    at MemberExpression (main.js:4553:3433035)
(etc)

I suspect that we're getting an interaction between the arquero expression rewriter, and webpack minification. Any guesses what might be going on?

danyel-motif commented 1 year ago

Sadly,

  const relevantRows = table
    .params({ chosenFragment })
    .filter((r: Struct, $: Struct) => op.equal(r.fragment, $.chosenFragment));

is running into the same issue

danyel-motif commented 1 year ago

As a workaround,

  const relevantRows = table
    .params({ chosenFragment })
    .filter(
      aq.escape((r: Struct, $: Struct) =>
        op.equal(r.fragment, $.chosenFragment)
      )
    );

turns out to work fine, as does

  const relevantRows = table.filter(
    aq.escape((r: Struct) => r.fragment === chosenFragment)
  );

While this gets me through the workaround (yay!), it doesn't solve the underlying problem, which is that production build works inconsistently from development build.

I'd love any advice.

danyel-motif commented 1 year ago

... and I found a blocker, as I gradually rewrote:

The original, working code was this:

  const maxTime = df
    .params({ column })
    .rollup({
      highValue: (row: Struct, $: Struct) => aq.op.max(row[$.column]),
    })
    .get('highValue', 0);

That gets the error message Error: Invalid member expression: "a[l.column]"

When I change it to

  const maxTime = df
    //    .params({ column })
    .rollup({
      highValue: (row: Struct) => aq.escape(aq.op.max(row[column])),
    })
    .get('highValue', 0);

I get the message Error: Invalid function call: "Cy(rE.max(a[e]))" Note: Table expressions do not support closures. Use aq.escape(fn) to use a function as-is (including closures), or use aq.addFunction(name, fn) to add new op functions.

Last, this variant simply does not compile

  const maxTime = df
    //    .params({ column })
    .rollup({ highValue: aq.escape((row: Struct) => aq.op.max(row[column]) }))
    .get('highValue', 0);

This gets Error: Escaped functions are not valid as rollup or pivot values.

jheer commented 1 year ago

I will have to dig in more when I have time. I don't know yet it the issue is fundamentally with Arquero or with the setup/assumptions of your downstream build process. It may be that non-escaped Arquero functions (which are actually converted to code strings at runtime, internally parsed, and then used to generate optimized internal functions) are getting mangled by minification in a breaking way. FWIW, this is the first I've heard of this issue and I have not run into it using minifiers like terser (e.g., as part of a standalone rollup build).

If you have pointers or code for a minimal reproduction within a webpack build process that could greatly speed up diagnosis on my end.

FWIW, this version is not supposed to work, as escape should be called to wrap a function, not invoked as a subroutine within a function:

  const maxTime = df
    //    .params({ column })
    .rollup({
      highValue: (row: Struct) => aq.escape(aq.op.max(row[column])),
    })
    .get('highValue', 0);

Similarly, the way Arquero rewriting works, escaped functions can not at present be used in aggregation contexts.

Escaped functions will also use less optimized data access paths and so will run slower and (all else being equal) should not be used unless a closure context is really needed.

danyel-motif commented 1 year ago

Definitely agree that we don't want to be using escaping. The "rewrite everything with escaping" is the workaround from the previous model, which was "write everything without escaping." I'll see what I can do to get together a minimal repro

danyel-motif commented 1 year ago

Ok, I think I have a minimal repro here.

https://github.com/motifanalytics/arquero-minimal-issue-repro

You can test it with nx serve and nx serve --prod

The interesting code is in https://github.com/motifanalytics/arquero-minimal-issue-repro/blob/main/src/app/arq/arq.component.ts

When I run in dev, I get succeed; succeed; succeed; error When I run in prod, I get error; succeed; error; errror

jheer commented 1 year ago

Confirming that I can reproduce. Will take a closer look soon.

danyel-motif commented 1 year ago

Worth noting that -- with generous escaping and some careful use of constants in places that were once variables -- this is not blocking forward progress for us. But I'd still love to know.

jheer commented 1 year ago

This was indeed an Arquero bug. We generate and evaluate code (AST subtrees) as needed to turn parameter values into concrete keys for member property lookup. The parser was not doing "due diligence" on processing these subtrees prior to code generation, leading to the errors you've been seeing. We now traverse these subtrees and perform rewriting as we should.

danyel-motif commented 1 year ago

Glad to say we've been able to merge this into prod and it looks good! Thanks for the amazing response.