farridav opened 10 months ago
For the benefit of others, I've managed to solve this problem with the following implementation:
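```python
import xgboost
from ray.train.xgboost import XGBoostTrainer
class MyXGBoostTrainer(XGBoostTrainer):
    def _save_model(self, model: xgboost.Booster, path: str) -> None:
        # The .ubj extension makes XGBoost save in UBJSON, which supports categorical splits.
        model.save_model(path + ".ubj")
```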
I'm using that subclass instead for now. I'll leave this ticket open in case there is a cleaner, more config-driven approach. Thanks!
Hey @farridav, can you share which version of Ray you are using?
If you are using the latest version, I believe it should already be saving the model with the .json extension.
I'm currently pinned to 2.4, though I'll see if my vendor can help us get upgraded.
Even when we do, though, I imagine we'd hit the same difficulty when trying to use the .ubj format for saving models.
Are there any plans to make this property configurable? We also hit the same constraint in the BatchPredictor.
Thanks for looking into this!
Hm, given that UBJ is now the default format for XGBoost, would it be satisfactory if we just updated the checkpoint to use .ubj, or is there still a need for configurability?
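For illustration, a config-driven option could default to .ubj while still letting users opt into .json. The sketch below is hypothetical: `checkpoint_model_path` and `SUPPORTED_FORMATS` are not part of Ray's API. It only assumes that XGBoost's `save_model()` picks the serialization format from the file extension (.json for JSON, .ubj for UBJSON), both of which support categorical splits.

```python
import os

# Hypothetical mapping from a config value to a file extension (not a Ray API).
# XGBoost's save_model() chooses JSON or UBJSON based on the path's extension.
SUPPORTED_FORMATS = {"ubj": ".ubj", "json": ".json"}


def checkpoint_model_path(checkpoint_dir: str, model_key: str, fmt: str = "ubj") -> str:
    """Return the model file path inside a checkpoint directory.

    Defaults to UBJSON; raises ValueError for formats that cannot store
    models with categorical splits.
    """
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(
            f"Unsupported model format {fmt!r}; expected one of {sorted(SUPPORTED_FORMATS)}"
        )
    return os.path.join(checkpoint_dir, model_key + SUPPORTED_FORMATS[fmt])
```

If both the trainer's save step and the checkpoint's load step went through one helper like this, they would always agree on the file name, which is the mismatch the subclass workaround has to paper over.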
Also, as an FYI, the BatchPredictor interface is now deprecated; see https://github.com/ray-project/ray/issues/37489. The new recommended pattern should give you the flexibility to define how you load the checkpoint with your own custom behavior.
Defaulting to UBJ or JSON would satisfy my use case, though I can't speak for others.
Thanks for the heads up on BatchPredictor, I'll move towards that pattern.
Description
I need to use categorical encoding in my XGBoost model, but when I do, the checkpoints that the trainer saves fail because the model is not saved in JSON/UBJSON format. Here's what I get:
xgboost.core.XGBoostError: [18:26:58] ../src/tree/tree_model.cc:869: Check failed: !HasCategoricalSplit(): Please use JSON/UBJSON for saving models with categorical splits.
Unfortunately, I'm not able to change the model file name that is saved with the checkpoint, as it is hardcoded to MODEL within ray.air.constants.MODEL_KEY (https://github.com/ray-project/ray/blob/master/python/ray/air/constants.py#L5C6-L5C6). Is there a way for me to save the model checkpoints with a .json extension, or to override this somehow?
Here's an excerpt from my implementation:
Use case
Training an XGBoost model that uses categorically encoded data, saving it, and then running batch predictions from it in a separate step.