zcarrico-fn opened 10 months ago
Far from an expert on the Ray library, but perhaps it's possible to use Ray Tune, with a hyperparameter specifying which i-th fold to use?
i.e., the same data is sent to each Tune run, but each run shuffles the data in exactly the same order (the identical ordering is required within each run so that every sample lands in the validation split exactly once across runs), and the Ray Tune param_space identifies which i-th split each run should use as its validation split. A sketch of this is below.
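A minimal sketch of that idea, assuming a Ray Tune function trainable; `load_data` and `fit_and_eval` are hypothetical placeholders, not part of any library:

```python
import numpy as np
from ray import tune

N_FOLDS = 5

def train_fold(config):
    # load_data() and fit_and_eval() are hypothetical helpers.
    X, y = load_data()
    # Every trial uses the same seed, so the permutation -- and therefore
    # the fold assignment -- is identical across trials; only
    # config["fold"] decides which chunk serves as the validation split.
    order = np.random.default_rng(seed=42).permutation(len(X))
    folds = np.array_split(order, N_FOLDS)
    val_idx = folds[config["fold"]]
    train_idx = np.concatenate(
        [f for i, f in enumerate(folds) if i != config["fold"]]
    )
    score = fit_and_eval(X[train_idx], y[train_idx], X[val_idx], y[val_idx])
    # Returning a dict from a function trainable reports it as the
    # trial's final result.
    return {"val_score": score}

# grid_search over the fold index launches one trial per fold.
tuner = tune.Tuner(
    train_fold,
    param_space={"fold": tune.grid_search(list(range(N_FOLDS)))},
)
results = tuner.fit()
```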
Thank you @Anner-deJong, and this works with customizable training functions, but xgboost_ray isn't customizable beyond its argument inputs as far as I know. In other words, the fold index could be passed as a hyperparameter, but there's no way to pass a custom callable to xgboost_ray so that it selects data based on that fold value.
For cross-validation, we are attempting to parallelize xgboost_ray.train using ray.remote tasks, with each remote task using a different cross-validation split of the data. Unfortunately, parallelizing xgboost_ray.train results in the errors below; if the same tasks are run sequentially rather than in parallel, no errors occur. Below is a reproducible example based on the example in xgboost_ray's documentation. If it is run locally, it completes; it's only when parallelized on a remote Ray cluster that it produces the errors below.
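A sketch of the failing pattern, adapted from the documentation's example (the KFold setup, actor counts, and metric here are illustrative, not the exact reproducer):

```python
import ray
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold
from xgboost_ray import RayDMatrix, RayParams, train

@ray.remote
def train_fold(train_idx, val_idx):
    # Each remote task trains on its own CV split via xgboost_ray.train,
    # which in turn spawns its own Ray training actors.
    X, y = load_breast_cancer(return_X_y=True)
    dtrain = RayDMatrix(X[train_idx], y[train_idx])
    dval = RayDMatrix(X[val_idx], y[val_idx])
    evals_result = {}
    train(
        {"objective": "binary:logistic", "eval_metric": ["logloss"]},
        dtrain,
        evals=[(dval, "val")],
        evals_result=evals_result,
        verbose_eval=False,
        ray_params=RayParams(num_actors=2, cpus_per_actor=1),
    )
    return evals_result["val"]["logloss"][-1]

X, y = load_breast_cancer(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=0)
# Launching the fold tasks in parallel is what triggers the errors on a
# remote cluster; running them one at a time does not.
scores = ray.get([train_fold.remote(tr, va) for tr, va in kf.split(X)])
```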
Traceback:
I expect the same error will be encountered when parallelizing remote tasks for nested cross-validation during HPO.
Please let me know if you have any questions, and thank you for the help and for the great xgboost_ray library!