ray-project / xgboost_ray

Distributed XGBoost on Ray

Placement groups behavior with XGBoost + Tune? #101

Open richardliaw opened 3 years ago

richardliaw commented 3 years ago
    import xgboost_ray as xgbr
    from ray import tune

    # One set of Ray params shared by every Tune trial: 4 training actors
    # with 2 CPUs each (8 CPUs per trial), no GPUs.
    ray_params = xgbr.RayParams(
        max_actor_restarts=1, gpus_per_actor=0, cpus_per_actor=2, num_actors=4
    )

    def tune_xgb(config, ray_params=None, train_set=None, test_set=None):
        evals_result = {}
        # num_classes comes from the surrounding setup (not shown).
        bst = xgbr.train(
            {
                "objective": "multi:softmax",
                "eval_metric": ["mlogloss", "merror"],
                "num_boost_round": 100,
                "num_class": num_classes,
            },
            train_set,
            evals_result=evals_result,
            evals=[(train_set, "train"), (test_set, "eval")],
            verbose_eval=False,
            ray_params=ray_params,
        )
        model_path = "tuned.xgb"
        bst.save_model(model_path)

    # train_set_ and test_set_ come from the surrounding setup (not shown).
    analysis = tune.run(
        tune.with_parameters(
            tune_xgb, train_set=train_set_, test_set=test_set_, ray_params=ray_params
        ),
        # Use the `get_tune_resources` helper function to set the resources.
        resources_per_trial=ray_params.get_tune_resources(),
    )

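For reference, a rough sketch of the per-trial resource request I'd expect get_tune_resources() to produce for these settings. This is an assumption about the helper, not its actual implementation; the exact bundle layout (e.g. an extra bundle for the trial driver) may differ by version:

    from ray.tune.utils.placement_groups import PlacementGroupFactory

    # Assumed rough equivalent of ray_params.get_tune_resources() for
    # num_actors=4, cpus_per_actor=2: one 2-CPU bundle per training actor.
    resources = PlacementGroupFactory([{"CPU": 2}] * 4)
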
Running this, the autoscaler status is showing me:

======== Autoscaler status: 2021-05-13 11:35:22.452073 ========
Node status
---------------------------------------------------------------
Healthy:
 1 anyscale.cpu.medium
Pending:
 (no pending nodes)
Recent failures:
 (no failures)
Resources
---------------------------------------------------------------
Usage:
 9.0/16.0 CPU
 0.0/2.0 CPU_group_0_fced8b2aee3ab0a7011db75557bcffcb
 0.0/2.0 CPU_group_1_fced8b2aee3ab0a7011db75557bcffcb
 0.0/2.0 CPU_group_2_fced8b2aee3ab0a7011db75557bcffcb
 0.0/2.0 CPU_group_3_fced8b2aee3ab0a7011db75557bcffcb
 8.0/8.0 CPU_group_fced8b2aee3ab0a7011db75557bcffcb
 0.00/1929912840.039 GiB memory
 526792.24/964956419.971 GiB object_store_memory
Demands:
 (no resource demands)
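
A minimal diagnostic sketch (hypothetical, not part of the repro above) for listing the placement groups behind those CPU_group_* resources; field names in the returned table may vary across Ray versions:

    import ray

    # Dump every placement group Ray knows about, with its state and bundle
    # specs, to compare against the autoscaler's CPU_group_* accounting.
    for pg_id, info in ray.util.placement_group_table().items():
        print(pg_id, info.get("state"), info.get("bundles"))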

cc @amogkam

richardliaw commented 3 years ago

Maybe closed with #102