uwdata / arquero

Query processing and transformation of array-backed data tables.
https://idl.uw.edu/arquero/
BSD 3-Clause "New" or "Revised" License

derive can not handle string? #328

Closed winner106 closed 1 year ago

winner106 commented 1 year ago
| name | style | brewery_id | abv | ibu | test |
|---|---|---|---|---|---|
| #002 American I.P.A. | American IPA | 211 | 0.071 | 60 | |
| #004 Session I.P.A. | American IPA | 211 | 0.048 | 38 | |
| #9 | Fruit / Vegetable Beer | 303 | 0.051 | 20 | |
| 077XX | American Double / Imperial IPA | 222 | 0.078 | 80 | |
| 113 IPA | American IPA | 371 | 0.070 | 113 | |
| 12th Round | American Strong Ale | 376 | 0.076 | 78 | |
| 13 Rebels ESB | Extra Special / Strong Bitter (ESB) | 433 | 0.052 | 42 | |
| 1327 Pod's ESB | Extra Special / Strong Bitter (ESB) | 380 | 0.056 | 37 | |
| 14° ESB | Extra Special / Strong Bitter (ESB) | 75 | 0.056 | 32 | |
| 1554 Black Lager | Euro Dark Lager | 82 | 0.056 | 21 | |

beers.derive({test: (d) => d['style'].split(' ')[0]})

This cannot add a new text column.

jcmkk3 commented 1 year ago

Try using the op functions instead of string methods.

beers.derive({test: (d) => op.split(d.style, ' ')[0]})

https://uwdata.github.io/arquero/api/op#split

jheer commented 1 year ago

@jcmkk3 is right. As described in the documentation, by default Arquero will rewrite functions to optimize data access paths. To make this safe and predictable, arbitrary object-bound methods cannot be used.

To sidestep this within a derive (and also allow access to closure variables), you can wrap a normal function with aq.escape(func) so that it is preserved as-is. However, this opts out of any optimization and may reduce performance. If your data is not particularly large, you probably won't notice an issue.
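
For reference, a minimal sketch of the aq.escape approach might look like the following. It assumes `aq` is the imported arquero module and `beers` is the table from the original post; the helper names `firstWord` and `addTestColumn` are hypothetical and only there for illustration.

```javascript
// Plain JavaScript function: it receives a row object `d`, so native
// string methods such as split() can be used. The `?? ''` guard is an
// added assumption to avoid throwing on missing style values.
const firstWord = (d) => (d.style ?? '').split(' ')[0];

// Hypothetical helper: `aq` is the arquero module, `beers` is the table.
// aq.escape tells Arquero not to parse or rewrite `firstWord`.
function addTestColumn(aq, beers) {
  return beers.derive({ test: aq.escape(firstWord) });
}
```

With Arquero loaded, this would be called as addTestColumn(aq, beers). Because the function is escaped, it runs as ordinary JavaScript on each row, which is what makes the native split() call acceptable here.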