unytics / bigfunctions

Supercharge BigQuery with BigFunctions
https://unytics.io/bigfunctions/
MIT License
599 stars 55 forks source link

Deploy new min_max_scaler function #140

Closed julienawonga closed 5 months ago

julienawonga commented 5 months ago

This function performs min-max scaling on an array. It takes an array as input and returns the scaled values between 0 and 1

unytics commented 5 months ago

Thanks @julienawonga and @AxelThevenot for this great function. I merge it and I will deploy it.

julienawonga commented 5 months ago

Thanks @AxelThevenot , @unytics !

unytics commented 5 months ago

The function is deployed and works congrats 👏 @julienawonga and @AxelThevenot Here's the link in the documentation.

On the other hand, when I tried to deploy the function, I got some errors that I needed to fix. Then once deployed when I tested it, I got some other errors. Please next time, try to deploy the function on your side, test it, and solve errors before creating the pull request.

(I should definitely add automatic testing in a test environment in github actions to prevent that).

If you have difficulties in solving errors, don't hesitate to have a chat with me on slack

Errors were linked to the fact that bigquery do not support js functions with any type as input or output. image

I converted it into a sql function which now works but also, is much more efficient than js on BigQuery. Here is the new code.

AxelThevenot commented 5 months ago

My bad I made a review with the comment in the same time to ask for a SQL version instead of JS

but I forgot to click on the request change button so you thought it was ready as only the EOF comment has appeared

julienawonga commented 5 months ago

Thank you @unytics .
I try this first, and to avoid passing min, max as args, I just write Js code. Thanks a Lot!

WITH Data AS (
  SELECT val
  FROM UNNEST([2, 3, 5, 8]) AS val
),
MinMaxValues AS (
  SELECT
    MIN(val) AS min_val,
    MAX(val) AS max_val
  FROM Data
)

-- 
CREATE TEMP FUNCTION MinMaxScaler(x FLOAT64, min_val FLOAT64, max_val FLOAT64) 
RETURNS FLOAT64
AS (
  (x - min_val) / (max_val - min_val)
);

--
SELECT
  d.val,
  MinMaxScaler(d.val, mm.min_val, mm.max_val) AS Scaled
FROM
  Data d,
  MinMaxValues mm;
julienawonga commented 5 months ago

image

AxelThevenot commented 5 months ago

@julienawonga You can try to create the standard_scaler (if not yet exist)

select big functions.eu.standard_scaler(arr)

This time opening first an issue then deploying your function locally using the bigfun

It has to work before opening the pull request 🙏 Do not hesitate if you have questions or have some issues !

julienawonga commented 5 months ago

Thank you ! I'll work on it.