jqnatividad / qsv

CSVs sliced, diced & analyzed.

`luau`: additional helper functions #1782

Open jqnatividad opened 1 month ago

jqnatividad commented 1 month ago

@ggrothendieck came up with an extensive list of helper functions to add to luau as qsv's DSL.


If you are implementing cumsum, there are a number of related functions that have proven useful in other languages. They follow a similar pattern, so they could readily be implemented at the same time.

cumprod, cumany, cumall, cummax, cummin: These are like cumsum, but in place of + they use *, or, and, max and min, respectively. cummean is also useful but does not fit exactly into the same pattern.
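To make the shared pattern concrete, here is a minimal sketch in Python rather than Luau. It only illustrates the intended semantics and is not qsv's actual helper API. The function names mirror the ones proposed above, and using `itertools.accumulate` as the common building block is an assumption made for illustration.

```python
from itertools import accumulate
import operator

# Each cum* function is a running fold with a different binary operator.
def cumsum(xs):
    return list(accumulate(xs, operator.add))

def cumprod(xs):
    return list(accumulate(xs, operator.mul))

def cummax(xs):
    return list(accumulate(xs, max))

def cummin(xs):
    return list(accumulate(xs, min))

def cumany(xs):
    # Running logical OR; values are coerced to bool first.
    return list(accumulate((bool(x) for x in xs), operator.or_))

def cumall(xs):
    # Running logical AND; values are coerced to bool first.
    return list(accumulate((bool(x) for x in xs), operator.and_))

def cummean(xs):
    # Does not fit the simple fold pattern: it needs the running sum
    # AND the number of values seen so far.
    return [s / n for n, s in enumerate(accumulate(xs, operator.add), start=1)]
```

cummean is the odd one out because its state is a pair (running sum, count) rather than a single previous value.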

accumulate: Has three arguments. A column, a function of two arguments, and an optional initial value. If y = accumulate(x, f, init) then y[1] = init and for i > 1 we have y[i] = f(y[i-1], x[i]). The default for init is x[1]. Note that if f is +, *, or, and, max or min, we get the above cum... functions.
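The definition above can be sketched as follows, again in Python rather than Luau and purely to illustrate the semantics. The name `accumulate_column` and the `_MISSING` sentinel are hypothetical, and the sketch uses 0-based indexing where the text uses 1-based.

```python
import operator

_MISSING = object()  # hypothetical sentinel: distinguishes "no init given" from init=None

def accumulate_column(x, f, init=_MISSING):
    """y[0] = init (default x[0]); y[i] = f(y[i-1], x[i]) for i >= 1."""
    if not x:
        return []
    y = [x[0] if init is _MISSING else init]
    for i in range(1, len(x)):
        y.append(f(y[i - 1], x[i]))
    return y

# With the default init, common operators reproduce the cum* functions.
running_sum = accumulate_column([1, 2, 3, 4], operator.add)
running_max = accumulate_column([3, 1, 4, 1, 5], max)

# With an explicit init, the definition as stated replaces the first value,
# so x[0] itself is not folded in.
seeded = accumulate_column([1, 2, 3, 4], operator.add, 10)
```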

Other

The following are also useful and are related in so far as they also involve storing the previous value.

lag: It has three arguments. The column, how many positions to lag (default is 1), and a default value to use when the lag runs off the front of the column. If y = lag(x, k, default) then y[i] = x[i-k] if i > k and default otherwise. Negative k could be considered too if it is not too hard to implement. Recall that we discussed enumerating by group, with the shortest solution being the following, where Name is the column to group by:

qsv luau map seq "x = (Name == prev and 1 or 0) * (x or 0) + 1; prev = Name; return x" file1.csv

With lag we could omit setting prev:

qsv luau map seq "x = (Name == lag(Name, 1) and 1 or 0) * (x or 0) + 1; return x" file1.csv
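As a rough illustration of the lag semantics and of the enumerate-by-group use case, here is a column-wise sketch in Python. It is not the qsv implementation, where the script runs row by row. The helper `enum_by_group` is a hypothetical name, and treating a negative k as a lead (looking ahead) is an assumption about the extension mentioned above.

```python
def lag(x, k=1, default=None):
    """Return y with y[i] = x[i-k] when that index exists, else default.

    Indices are 0-based here. A negative k looks ahead instead of back,
    which is an assumed interpretation of the negative-k extension.
    """
    n = len(x)
    return [x[i - k] if 0 <= i - k < n else default for i in range(n)]

def enum_by_group(names):
    """Number rows within each run of equal names, restarting at 1."""
    seq = []
    counter = 0
    for name, prev in zip(names, lag(names, 1)):
        # Same logic as the one-liner: keep counting while the name
        # repeats, otherwise restart at 1.
        counter = counter + 1 if name == prev else 1
        seq.append(counter)
    return seq

names = ["a", "a", "b", "b", "b", "a"]
seq = enum_by_group(names)
```

Note that, like the original one-liner, this counts consecutive runs, so a group that reappears later starts again at 1.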

diff: Same arguments as lag. Defined as x - lag(x, k, default).
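A small Python sketch of diff built on lag, meant only to illustrate the definition and not any existing qsv helper. Returning `None` where the lagged value is missing is an assumption; the definition above leaves that case to whatever default the caller supplies.

```python
def lag(x, k=1, default=None):
    # y[i] = x[i-k] when that index exists (0-based), else default.
    n = len(x)
    return [x[i - k] if 0 <= i - k < n else default for i in range(n)]

def diff(x, k=1, default=None):
    """Element-wise x - lag(x, k, default).

    Assumption: when the lagged value is None (no default supplied),
    the difference is reported as None instead of raising an error.
    """
    return [None if b is None else a - b for a, b in zip(x, lag(x, k, default))]

squares = [1, 4, 9, 16]
first_differences = diff(squares)            # leading value has no predecessor
seeded_differences = diff(squares, 1, 0)     # treat the missing predecessor as 0
```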

Originally posted by @ggrothendieck in https://github.com/jqnatividad/qsv/discussions/1760#discussioncomment-9262555