uwdata / arquero

Query processing and transformation of array-backed data tables.
https://idl.uw.edu/arquero/
BSD 3-Clause "New" or "Revised" License

[question] why is 'op.' needed to use JS functions #115

Closed: ericemc3 closed this issue 3 years ago

ericemc3 commented 3 years ago

This question may already have been answered somewhere, but I haven't found it yet.

I am preparing a tutorial and I'd like to be able to explain this point (which is not a problem for me) in case I am asked about it.

Why is it not possible to use:

tb.filter(d => d.codgeo.substr(0,2) == '31')

This yields the error message: Invalid function call: "d.codgeo.substr(0,2)"

And why is this syntax, with op., necessary instead:

tb.filter(d => op.substring(d.codgeo, 0, 2) == '31')

Note that tb.filter(d => d.codgeo.substring(0, 2) == '31') does not raise an error (probably because op.substring() exists), but of course it doesn't work either.

jheer commented 3 years ago

Great question. We should put this in the documentation too, as I think plenty of others will also be asking this. Here is a first draft; let me know what you think.

Why are only op functions supported?

Any function that is callable within an Arquero table expression must be defined on the op object, either as a built-in function or added via the extensibility API. Why? Why can't one just use a function directly?

As described earlier, Arquero table expressions can look like normal JavaScript functions, but are treated specially: their source code is parsed and new custom functions are generated to process data. This process prevents the use of closures, such as referencing functions or values defined externally to the expression.
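To see why generating new functions from an expression's source rules out closures, here is a plain-JavaScript sketch. It is not Arquero's implementation: the helper regenerate is hypothetical and simply rebuilds a function from its source text with the Function constructor, which evaluates that text in the global scope rather than in the original function's scope. The original closure still works, while the rebuilt copy fails because prefix is no longer reachable. Passing the value explicitly as a second argument survives the round trip; this is assumed here to mirror, in spirit, the (d, $) parameter through which Arquero expressions receive values registered with params.

```javascript
// Hypothetical sketch: NOT Arquero's code generator. It only mimics the idea
// of re-creating a function from its source text.
function regenerate(fn) {
  // fn.toString() returns the source text; the Function constructor
  // re-evaluates that text in the global scope, not in fn's original scope.
  return new Function(`return (${fn.toString()});`)();
}

function makeFilter() {
  const prefix = '31'; // local value captured by the closure below
  // Plain JavaScript allows calling substring on the value directly;
  // inside an Arquero expression this would need op.substring.
  return d => d.codgeo.substring(0, 2) === prefix;
}

const original = makeFilter();
const rebuilt = regenerate(original);

console.log(original({ codgeo: '31555' })); // true: the closure still sees prefix

try {
  rebuilt({ codgeo: '31555' });
} catch (err) {
  // The rebuilt copy has no closure, so prefix cannot be resolved.
  console.log(err instanceof ReferenceError); // true
}

// Passing the external value explicitly as a second argument ($) survives
// regeneration, because nothing is captured from an outer scope.
const withParams = regenerate((d, $) => d.codgeo.substring(0, 2) === $.prefix);
console.log(withParams({ codgeo: '31555' }, { prefix: '31' })); // true
```

The same limitation applies to any outer variable or helper function: once only the source text is kept, everything the expression uses must either be passed in explicitly or be resolvable by name in the environment that runs the regenerated code.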

But why do we do this? Here are a few reasons:

Of course, one might wish to make different trade-offs. Arquero is designed to support common use cases while also being applicable to more complex production setups. This goal comes with the cost of more rigid management of functions. That said, Arquero can be extended with custom variables, functions, and even new table methods or verbs! As starting points, see the params, addFunction, and addTableMethod functions to introduce external variables, register new op functions, or extend tables with new methods.
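The rule that only functions defined on op can be called, together with the idea of registering new ones, can be illustrated with a toy evaluator. Everything below is hypothetical and heavily simplified: the op object is a stand-in, register is only loosely analogous in spirit to addFunction, and a hand-built expression tree replaces the source parsing Arquero actually performs. Calls are resolved by name against op alone, so a method defined on a data value can never run, and an unknown name fails with an error modeled on the message quoted in the question.

```javascript
// Toy stand-in for Arquero's op object (hypothetical and heavily simplified).
const op = {
  substring: (value, start, end) => String(value).substring(start, end),
};

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Hypothetical registration helper, loosely analogous in spirit to addFunction.
function register(name, fn) {
  if (has(op, name)) throw new Error(`Function already defined: ${name}`);
  op[name] = fn;
}

// Evaluate a tiny, hand-built expression tree against one row d.
function evaluate(node, d) {
  switch (node.type) {
    case 'literal':
      return node.value;
    case 'field':
      return d[node.name];
    case 'eq': // strict equality, unlike the == used in the question
      return evaluate(node.left, d) === evaluate(node.right, d);
    case 'call':
      // Calls resolve only against op; methods on data values are never invoked.
      if (!has(op, node.name)) {
        throw new Error(`Invalid function call: "${node.name}"`);
      }
      return op[node.name](...node.args.map(arg => evaluate(arg, d)));
    default:
      throw new Error(`Unknown node type: ${node.type}`);
  }
}

// Tree for: op.substring(d.codgeo, 0, 2) == '31'
const expr = {
  type: 'eq',
  left: {
    type: 'call',
    name: 'substring',
    args: [
      { type: 'field', name: 'codgeo' },
      { type: 'literal', value: 0 },
      { type: 'literal', value: 2 },
    ],
  },
  right: { type: 'literal', value: '31' },
};

console.log(evaluate(expr, { codgeo: '31555' })); // true
console.log(evaluate(expr, { codgeo: '75056' })); // false

// A call to an unregistered (hypothetical) function "dept" is rejected...
const call = { type: 'call', name: 'dept', args: [{ type: 'field', name: 'codgeo' }] };
try {
  evaluate(call, { codgeo: '31555' });
} catch (err) {
  console.log(err.message); // Invalid function call: "dept"
}

// ...until it is registered, after which it is callable by name.
register('dept', code => String(code).slice(0, 2));
console.log(evaluate(call, { codgeo: '31555' })); // 31
```

Because calls are looked up by name in a registry rather than taken from the data or from surrounding closures, what an expression may execute is fixed by whatever has been registered.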

ericemc3 commented 3 years ago

Thank you very much for these insights, which I find very enlightening. Arrow support, safety, and performance are three key criteria whose importance is easy to demonstrate. I now have enough material to synthesize and include in my presentation!

I consider Arquero a huge step forward for rich and efficient open data flows on the web, and also a great boost for Arrow, D3, Vega-Lite-API, and data visualization in general. Thank you once again for your work, and also for working with Mike Bostock to make Arquero and Observable work well together.

I have some other (deceptively naive) questions; I should probably ask them somewhere else. Anyway, here they are:

jheer commented 3 years ago
  • What was Arquero originally designed for: leveraging Arrow capabilities in JS, extending Vega's data transformation features, or other motivations?

It began as a side project for fun during my academic sabbatical. Then it kind of steam-rolled into a full-fledged library. The goal was to build a more performant and adaptable JS query tool that extends what Vega can do and make it available outside of Vega specifications. I wanted my students and others working with Vega or D3 to be able to prepare/transform data comprehensively without having to move between different environments. The original focus was to support standard JS data structures first and foremost. Only later did I seek to push the API further by also providing direct Arrow support.

  • Who is the team behind Arquero: just you, or other people as well?

It is primarily just me for the core library. @chanwutk and @suikac have been working on arquero-sql.

  • What kind of feedback are you expecting from Arquero users?

All the standard stuff: feature requests, bug reports, documentation feedback, etc. I'd also love to hear from anyone using the library to learn more about what they are using it for.

ericemc3 commented 3 years ago

Thank you for these additions.