pytorch / torcharrow

High performance model preprocessing library on PyTorch
https://pytorch.org/torcharrow/beta/index.html
BSD 3-Clause "New" or "Revised" License

array_constructor does not work for more than 4 arguments #320

Open YLGH opened 2 years ago

YLGH commented 2 years ago

My df consists of columns int_0 through int_12. I'm trying to turn these into an array of features, however

df["dense_features"] = functional.array_constructor( *[df[int_name] for int_name in DEFAULT_INT_NAMES] )

fails with

Traceback (most recent call last):
  File "/home/ylgh/torchrec/examples/torcharrow/dataloader.py", line 52, in <module>
    df = criteo_preproc(df)
  File "/home/ylgh/torchrec/examples/torcharrow/dataloader.py", line 35, in criteo_preproc
    df["dense_features"] = functional.array_constructor(
  File "/home/ylgh/anaconda3/envs/torchrec/lib/python3.9/site-packages/torcharrow/_functional.py", line 54, in dispatch
    return op(*args)
  File "/home/ylgh/anaconda3/envs/torchrec/lib/python3.9/site-packages/torcharrow/velox_rt/functional.py", line 39, in dispatch
    result_col = ta.generic_udf_dispatch(op_name, *wrapped_args)
TypeError: generic_udf_dispatch(): incompatible function arguments. The following argument types are supported:
    1. (arg0: str, arg1: torcharrow._torcharrow.BaseColumn) -> torcharrow._torcharrow.BaseColumn
    2. (arg0: str, arg1: torcharrow._torcharrow.BaseColumn, arg2: torcharrow._torcharrow.BaseColumn) -> torcharrow._torcharrow.BaseColumn
    3. (arg0: str, arg1: torcharrow._torcharrow.BaseColumn, arg2: torcharrow._torcharrow.BaseColumn, arg3: torcharrow._torcharrow.BaseColumn) -> torcharrow._torcharrow.BaseColumn
    4. (arg0: str, arg1: torcharrow._torcharrow.BaseColumn, arg2: torcharrow._torcharrow.BaseColumn, arg3: torcharrow._torcharrow.BaseColumn, arg4: torcharrow._torcharrow.BaseColumn) -> torcharrow._torcharrow.BaseColumn

Invoked with: 'array_constructor', <torcharrow._torcharrow.SimpleColumnREAL object at 0x7f5844733270>, <torcharrow._torcharrow.SimpleColumnREAL object at 0x7f585895c1b0>, <torcharrow._torcharrow.SimpleColumnREAL object at 0x7f58590af7b0>, <torcharrow._torcharrow.SimpleColumnREAL object at 0x7f58590afa30>, <torcharrow._torcharrow.SimpleColumnREAL object at 0x7f58590afaf0>

Based on this, it seems that it only supports up to 4 args. When I restrict the input to DEFAULT_INT_NAMES[:4], it works.

wenleix commented 2 years ago

Yeah, today it's supported ad hoc, based on arity: https://github.com/facebookresearch/torcharrow/pull/296

I think we need a variadic version of the generic UDF call. In your case, it has 12 parameters?
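
To illustrate the difference between arity-based and variadic dispatch, here is a minimal pure-Python sketch. It does not use torcharrow at all: `Column`, `fixed_arity_dispatch`, and `variadic_dispatch` are hypothetical names made up for this example, and the "array constructor" is simply modeled as zipping columns into rows. It only shows why a dispatcher with one overload per arity rejects the call above, while a version that takes a single sequence of columns would not.

```python
from typing import List, Sequence

Column = List[float]  # stand-in for a column; NOT a torcharrow type

MAX_ARITY = 4  # mirrors the four overloads listed in the traceback


def fixed_arity_dispatch(op_name: str, *cols: Column) -> List[List[float]]:
    """Hypothetical model of a dispatcher that only has overloads for 1..4 columns."""
    if not 1 <= len(cols) <= MAX_ARITY:
        raise TypeError(
            f"{op_name}: incompatible function arguments "
            f"(got {len(cols)} columns, supported: 1..{MAX_ARITY})"
        )
    # Model array construction as one array per row, built from all columns.
    return [list(row) for row in zip(*cols)]


def variadic_dispatch(op_name: str, cols: Sequence[Column]) -> List[List[float]]:
    """Hypothetical variadic variant: takes one sequence of any length."""
    if not cols:
        raise ValueError(f"{op_name}: at least one column is required")
    return [list(row) for row in zip(*cols)]


# 13 columns (like int_0 through int_12), 2 rows each.
columns = [[float(i), float(i) + 0.5] for i in range(13)]

try:
    fixed_arity_dispatch("array_constructor", *columns)
    fixed_failed = False
except TypeError:
    fixed_failed = True  # more than 4 columns: no matching overload

rows = variadic_dispatch("array_constructor", columns)  # any column count works
```

In this sketch the fixed-arity call fails for 13 columns and succeeds for `columns[:4]`, which matches the behavior reported above. How a real variadic generic UDF call would be exposed through the bindings is a separate design question that this example does not try to answer.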