Right now the array feature selection only allows combining exactly two input columns into an output column. To make this more flexible, we could support passing any number of columns, with a minimum of 1. This should be a small change in hlink/linking/core/transforms.py, where we unpack feature_selection["input_columns"] with
col1, col2 = feature_selection["input_columns"]
The pyspark.sql.functions.array() function, which we're already using, accepts a variable number of arguments, so the input columns can be passed to it without special-casing their count.
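
A rough sketch of what the change might look like, not the actual code in transforms.py. The helper names get_input_columns and array_feature are hypothetical and only for illustration; the only things taken from above are the feature_selection["input_columns"] key and the call to pyspark.sql.functions.array().

```python
def get_input_columns(feature_selection):
    """Hypothetical helper: return the input columns as a list.

    Enforces the proposed minimum of 1 column instead of
    requiring exactly 2.
    """
    input_columns = list(feature_selection["input_columns"])
    if len(input_columns) < 1:
        raise ValueError("array feature selection needs at least 1 input column")
    return input_columns


def array_feature(feature_selection):
    """Hypothetical helper: build the array column expression.

    Needs pyspark and an active Spark session, so the import is kept
    local to this function.
    """
    from pyspark.sql.functions import array

    # Replaces `col1, col2 = feature_selection["input_columns"]`:
    # unpack however many columns were given into array().
    return array(*get_input_columns(feature_selection))
```

With this shape, a config listing one, two, or five input columns would all go through the same code path, and an empty list fails early with a clear message.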