Open YLGH opened 2 years ago
My df consists of columns int_0 through int_12. I'm trying to turn these into an array of features; however,
df["dense_features"] = functional.array_constructor( *[df[int_name] for int_name in DEFAULT_INT_NAMES] )
fails with
Traceback (most recent call last):
  File "/home/ylgh/torchrec/examples/torcharrow/dataloader.py", line 52, in <module>
    df = criteo_preproc(df)
  File "/home/ylgh/torchrec/examples/torcharrow/dataloader.py", line 35, in criteo_preproc
    df["dense_features"] = functional.array_constructor(
  File "/home/ylgh/anaconda3/envs/torchrec/lib/python3.9/site-packages/torcharrow/_functional.py", line 54, in dispatch
    return op(*args)
  File "/home/ylgh/anaconda3/envs/torchrec/lib/python3.9/site-packages/torcharrow/velox_rt/functional.py", line 39, in dispatch
    result_col = ta.generic_udf_dispatch(op_name, *wrapped_args)
TypeError: generic_udf_dispatch(): incompatible function arguments. The following argument types are supported:
    1. (arg0: str, arg1: torcharrow._torcharrow.BaseColumn) -> torcharrow._torcharrow.BaseColumn
    2. (arg0: str, arg1: torcharrow._torcharrow.BaseColumn, arg2: torcharrow._torcharrow.BaseColumn) -> torcharrow._torcharrow.BaseColumn
    3. (arg0: str, arg1: torcharrow._torcharrow.BaseColumn, arg2: torcharrow._torcharrow.BaseColumn, arg3: torcharrow._torcharrow.BaseColumn) -> torcharrow._torcharrow.BaseColumn
    4. (arg0: str, arg1: torcharrow._torcharrow.BaseColumn, arg2: torcharrow._torcharrow.BaseColumn, arg3: torcharrow._torcharrow.BaseColumn, arg4: torcharrow._torcharrow.BaseColumn) -> torcharrow._torcharrow.BaseColumn

Invoked with: 'array_constructor', <torcharrow._torcharrow.SimpleColumnREAL object at 0x7f5844733270>, <torcharrow._torcharrow.SimpleColumnREAL object at 0x7f585895c1b0>, <torcharrow._torcharrow.SimpleColumnREAL object at 0x7f58590af7b0>, <torcharrow._torcharrow.SimpleColumnREAL object at 0x7f58590afa30>, <torcharrow._torcharrow.SimpleColumnREAL object at 0x7f58590afaf0>
Based on this, it seems that array_constructor only supports up to 4 column arguments. When I restrict the input to DEFAULT_INT_NAMES[:4], it works.
Yeah, today it's supported ad hoc, based on arity: https://github.com/facebookresearch/torcharrow/pull/296
I think we need a variadic version of the generic UDF call. In your case, it has 12 parameters?
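To make the arity point concrete, here is a minimal pure-Python sketch of the difference between a dispatcher that accepts only a fixed set of argument counts and a variadic one. It does not use TorchArrow at all: fixed_arity_dispatch, variadic_dispatch, MAX_FIXED_ARITY and the list-based Column type are hypothetical stand-ins invented for illustration, and the limit of 4 is taken only from the overload list in the TypeError above.

```python
from typing import List, Sequence

# Toy column type: one float per row. This is NOT a TorchArrow type.
Column = List[float]

# Assumption: mirrors the four overloads listed in the TypeError (1 to 4 columns).
MAX_FIXED_ARITY = 4


def fixed_arity_dispatch(op_name: str, *cols: Column) -> List[List[float]]:
    """Hypothetical dispatcher that only accepts 1 to MAX_FIXED_ARITY columns."""
    if not 1 <= len(cols) <= MAX_FIXED_ARITY:
        raise TypeError(
            f"{op_name}: got {len(cols)} columns, "
            f"only 1 to {MAX_FIXED_ARITY} are supported"
        )
    # Build one array per row from the values of every column.
    return [list(row) for row in zip(*cols)]


def variadic_dispatch(op_name: str, cols: Sequence[Column]) -> List[List[float]]:
    """Hypothetical variadic dispatcher: takes any non-empty sequence of columns."""
    if not cols:
        raise TypeError(f"{op_name}: at least one column is required")
    return [list(row) for row in zip(*cols)]


# Five toy columns with two rows each, like the five columns in the failing call.
columns = [[float(i), float(i + 10)] for i in range(5)]

try:
    fixed_arity_dispatch("array_constructor", *columns)
except TypeError as exc:
    print("fixed arity failed:", exc)

print(variadic_dispatch("array_constructor", columns))
```

The fixed-arity version rejects the five-column call in the same way the overload set does, while the variadic version accepts a sequence of any length; how the real binding would expose that is not something this sketch can answer.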