jrossthomson / bigquery-utils

Useful scripts, udfs, views, and other utilities for migration and data warehouse operations in BigQuery.
https://cloud.google.com/bigquery/
Apache License 2.0
0 stars 0 forks source link

Feat jstat udf #8

Closed imathews closed 3 years ago

imathews commented 3 years ago

@jrossthomson @boaguilar I just did a little bit of experimentation to wrap the jstat library in a single UDF — at first pass, it seems that this unfortunately won't be possible for all jStat methods, but maybe achievable for those that take a list of floats as their arguments and return a single scalar float.

My naive attempt was to do something like below, where args is an array of arguments to pass to the jstat function:

CREATE OR REPLACE FUNCTION fn.jstat(method STRING, args ANY TYPE)
RETURNS ANY TYPE
LANGUAGE js AS """
    const methodPath = method.split('.')
    let fn = jstat['jStat']
    for (const name of methodPath){
        fn = fn[name]
    }

  return fn(...args)
"""
OPTIONS (
        library=["${JS_BUCKET}/jstat-v1.9.4.min.js"]
);

You would invoke the method like so:

SELECT fns.jstat('chisquare.cdf',  ARRAY<FLOAT64>[0.3, 2.0])

However, the ANY_TYPE parameter is currently only supported in SQL UDFs, not JS UDFs. Moreover, ANY_TYPE isn't supported as the return type for UDFs.

Many of the jstat functions expect floats as their input and return a single float scalar, so we could implement this for just those cases, which is what I've done here. Not sure if we want to merge this, or maybe others have an idea of a better approach.

The function implemented in this PR is as below:

CREATE OR REPLACE FUNCTION fn.jstat(method STRING, args ARRAY<FLOAT64>)
RETURNS FLOAT64
LANGUAGE js AS """
    const methodPath = method['split']('.')
    let fn = jstat['jStat']
    for (const name of methodPath){
        fn = fn[name]
    }

  return fn(...args)
"""
OPTIONS (
        library=["${JS_BUCKET}/jstat-v1.9.4.min.js"]
);