ray-project / xgboost_ray

Distributed XGBoost on Ray
Apache License 2.0

"grpc_message":"Received message larger than max #201

Closed cgy-dayup closed 2 years ago

cgy-dayup commented 2 years ago

_InactiveRpcError                         Traceback (most recent call last)
<ipython-input> in <module>
     47     num_samples=1,
     48     scheduler=ASHAScheduler(max_t=200),
---> 49     resources_per_trial=ray_params.get_tune_resources())
     50 print("Best hyperparameters", analysis.best_config)
.....
/usr/local/lib/python3.6/dist-packages/grpc/_channel.py in _end_unary_response_blocking(state, call, with_call, deadline)
    847         return state.response
    848     else:
--> 849         raise _InactiveRpcError(state)
    850
    851

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.RESOURCE_EXHAUSTED
    details = "Received message larger than max (210771146 vs. 104857600)"
    debug_error_string = "{"created":"@1646188124.309444695","description":"Error received from peer ipv4:10.207.183.32:40455","file":"src/core/lib/surface/call.cc","file_line":1074,"grpc_message":"Received message larger than max (210771146 vs. 104857600)","grpc_status":8}"
>
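The two sizes in the error are the key: the serialized payload is about 201 MiB, while gRPC's default maximum receive size is 100 MiB (104857600 bytes). A quick sanity check on the reported numbers:

```python
# Numbers copied from the error message above.
message_bytes = 210771146   # serialized payload Ray Client tried to send
limit_bytes = 104857600     # gRPC default max receive size

assert limit_bytes == 100 * 1024 * 1024   # exactly 100 MiB
print(round(message_bytes / 2**20))       # 201 (MiB)
print(message_bytes > limit_bytes)        # True: the RPC is rejected
```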
cgy-dayup commented 2 years ago

I ran into this problem when testing the demo with my own dataset. How can I fix it?

Yard1 commented 2 years ago

Are you loading the dataset in a distributed fashion? Can you show the entire script?

cgy-dayup commented 2 years ago

Are you loading the dataset in a distributed fashion? Can you show the entire script?

No, I am not. I load the CSV from a local file. Below is the full script:

import lightgbm as lgb
import sklearn.metrics
from ray import tune
from ray.tune.schedulers import ASHAScheduler
from ray.tune.integration.lightgbm import TuneReportCheckpointCallback

# df_train, df_dev, x_train, x_test are loaded from the local CSV earlier (omitted)
y_train = df_train['gena_sale_qtty']
y_test = df_dev['gena_sale_qtty']
train_data = lgb.Dataset(x_train, label=y_train)
test_data = lgb.Dataset(x_test, label=y_test)
cat_feature = ['big_sale_flag', 'is_weekday', 'day_week', 'week_num',
               'mon_num', 'day_mon', 'region']

def traindata(config):
    gbm = lgb.train(
        config,
        train_set=train_data,
        valid_sets=[test_data],
        valid_names=["eval"],
        verbose_eval=False,
        categorical_feature=cat_feature,
        callbacks=[
            TuneReportCheckpointCallback({
                "l1": "eval-l1"
            })
        ])
    y_pre = gbm.predict(x_test)
    res = sklearn.metrics.mean_absolute_error(y_pre, y_test)
    tune.report(mean_accuracy=res, done=True)
if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--server-address",
        type=str,
        default=None,
        required=False,
        help="The address of server to connect to if using "
        "Ray Client.")
    args, _ = parser.parse_known_args()
    if args.server_address:
        import ray
        ray.init(f"ray://{args.server_address}")
    config = {
        "boosting_type": "gbdt",
        'objective': 'regression',
        "metric": "l1",
        "verbose": -1,
        "learning_rate": tune.grid_search([1e-4, 1e-3, 1e-2]),
        "num_boost_round": tune.grid_search([300, 500, 1000]),
        "colsample_bytree": tune.grid_search([0.7, 0.8,1]),
        "max_depth": tune.grid_search([5, 6, 7]),
    }

    analysis = tune.run(
        traindata,
        metric="l1",
        mode="min",
        config=config,
        num_samples=10,
        scheduler=ASHAScheduler(max_t=200))
    print("Best hyperparameters found were: ", analysis.best_config)
Yard1 commented 2 years ago

In general, you should not be using non-local variables with Ray, and for Tune, data should be passed through tune.with_parameters. Try this:

y_train=df_train['gena_sale_qtty']
y_test=df_dev['gena_sale_qtty'] 
cat_feature=['big_sale_flag','is_weekday','day_week','week_num',
                                  'mon_num','day_mon','region']
def traindata(config, x_train, x_test, y_train, y_test):
    train_data = lgb.Dataset(x_train, label=y_train)
    test_data = lgb.Dataset(x_test, label=y_test)
    gbm = lgb.train(
        config,
        train_set=train_data,
        valid_sets=[test_data],
        valid_names=["eval"],
        verbose_eval=False,
        categorical_feature=cat_feature,
        callbacks=[
            TuneReportCheckpointCallback({
                "l1": "eval-l1"
            })
        ])
    y_pre=gbm.predict(x_test)
    res=sklearn.metrics.mean_absolute_error(y_pre,y_test)
    tune.report(mean_accuracy=res, done=True)

if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--server-address",
        type=str,
        default=None,
        required=False,
        help="The address of server to connect to if using "
        "Ray Client.")
    args, _ = parser.parse_known_args()
    if args.server_address:
        import ray
        ray.init(f"ray://{args.server_address}")
    config = {
        "boosting_type": "gbdt",
        'objective': 'regression',
        "metric": "l1",
        "verbose": -1,
        "learning_rate": tune.grid_search([1e-4, 1e-3, 1e-2]),
        "num_boost_round": tune.grid_search([300, 500, 1000]),
        "colsample_bytree": tune.grid_search([0.7, 0.8,1]),
        "max_depth": tune.grid_search([5, 6, 7]),
    }

    analysis = tune.run(
        tune.with_parameters(traindata, x_train=x_train, x_test=x_test, y_train=y_train, y_test=y_test),
        metric="l1",
        mode="min",
        config=config,
        num_samples=10,
        scheduler=ASHAScheduler(max_t=200))
    print("Best hyperparameters found were: ", analysis.best_config)

If this doesn't work, consider using LightGBM-Ray (you are currently using regular LightGBM) and passing the data through a Ray Dataset.
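To see why capturing the DataFrames in the trainable blows past the limit, here is a stdlib-only sketch. Tune actually ships trainables with cloudpickle, and a dict stands in here for the serialized (function, captured-state) bundle, so the details are illustrative rather than Ray's real wire format:

```python
import pickle

# ~200 MB stand-in for the DataFrames the trainable closes over.
big = bytes(210_000_000)

# When a trainable captures data from the enclosing scope, the payload that
# Ray Client sends over gRPC contains the function *and* everything it captured.
payload = pickle.dumps({"trainable": "traindata", "captured": big},
                       protocol=pickle.HIGHEST_PROTOCOL)

print(len(payload) > 104857600)   # True: one message over the 100 MiB gRPC cap

# tune.with_parameters instead puts the data in the Ray object store once and
# hands each trial a reference, so the serialized trainable itself stays tiny.
```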

cgy-dayup commented 2 years ago

Thank you! I solved the problem using your method. I also have a question: what's the difference between tune.report and TuneReportCheckpointCallback? I find that the mean_accuracy reported by tune.report is not reflected anywhere.