I am testing the latest commit of this repo with the criteo dataset. There is a self-defined preprocessor "convert_to_bucket"。 This preprocessor need a column name as its argument. I got error in this piece of code:
1. def preprocess(self, ddf):
2. logging.info("Preprocess feature columns...")
3. all_cols = self.label_cols + self.feature_cols[::-1]
4. print(all_cols)
5. for col in all_cols:
6. name = col["name"]
7. #if name in ddf.columns:
8. if name in ddf.collect_schema().names():
9. print(name)
10. fill_na = "" if col["dtype"] in ["str", str] else 0
11. fill_na = col.get("fill_na", fill_na)
12. ddf = ddf.with_columns(pl.col(name).fill_null(fill_na))
13. if col.get("preprocess"):
14. preprocess_args = re.split(r"\(|\)", col["preprocess"])
15. preprocess_fn = getattr(self, preprocess_args[0])
16. print(preprocess_args)
17. ddf = ddf.with_columns(
18. #preprocess_fn(*preprocess_args[1:-1])
19. preprocess_fn(name, *preprocess_args[1:-1])
20. .alias(name)
21. .cast(self.dtype_dict[name])
22. )
23. active_cols = [col["name"] for col in all_cols if col.get("active") != False]
24. ddf = ddf.select(active_cols)
25. return ddf
The error happened at line 18, where if no arguments are provided, _ddf.with_columns(preprocessfn().alias(name).cast(xxx) will throw exception. At least the column name should be given to this function. I have tested my change, it seems right. So does this mean, we should add the column name into the function arguments at least?
I am testing the latest commit of this repo with the criteo dataset. There is a self-defined preprocessor "convert_to_bucket"。 This preprocessor need a column name as its argument. I got error in this piece of code:
The error happened at line 18, where if no arguments are provided, _ddf.with_columns(preprocessfn().alias(name).cast(xxx) will throw exception. At least the column name should be given to this function. I have tested my change, it seems right. So does this mean, we should add the column name into the function arguments at least?