microsoft / FLAML

A fast library for AutoML and tuning. https://microsoft.github.io/FLAML/

Cannot reproduce Flaml predictions using SkLearn RF #1287

Open zuoxu3310 opened 3 months ago

zuoxu3310 commented 3 months ago

Discussed in https://github.com/microsoft/FLAML/discussions/1054

Originally posted by **Therrm** May 26, 2023

Hi there! After running FLAML on RF only, I get the following best parameters:

```python
best_hyperparams = {
    "subsample": 1.0,
    "num_leaves": 256,
    "n_estimators": 300,
    "min_split_gain": 0.0,
    "min_child_samples": 30,
    "max_depth": -1,
    "learning_rate": 0.01,
    "colsample_bytree": 1,
}
```

But when I try to reproduce those predictions with the same parameters using the sklearn RF, I get quite different results. For instance, I get only 3 to 4 distinct predictions, while those from FLAML were close to a random distribution. What else does FLAML do that the plain RF doesn't? Is there some additional post-processing done by FLAML? Note: I already pre-process my data by removing rows with empty values and normalizing the dataset (for both FLAML and RF). Thanks!
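One thing worth checking first: several of the reported names (`num_leaves`, `learning_rate`, `colsample_bytree`, `subsample`, `min_split_gain`, `min_child_samples`) look like LightGBM-style parameters rather than `RandomForestClassifier` constructor arguments, which by itself could explain a failed reproduction. A quick sklearn-only sketch to see which names the RF actually accepts (the filtering helper here is illustrative, not part of FLAML):

```python
from sklearn.ensemble import RandomForestClassifier

# The reported best_hyperparams, copied verbatim from the discussion.
best_hyperparams = {
    "subsample": 1.0, "num_leaves": 256, "n_estimators": 300,
    "min_split_gain": 0.0, "min_child_samples": 30, "max_depth": -1,
    "learning_rate": 0.01, "colsample_bytree": 1,
}

# get_params() lists every constructor argument the estimator accepts;
# keep only the overlapping names.
valid = RandomForestClassifier().get_params()
usable = {k: v for k, v in best_hyperparams.items() if k in valid}
print(usable)  # only n_estimators and max_depth survive
```

Note also that `max_depth=-1` is LightGBM's "no limit" convention; sklearn's RF expects `max_depth=None` for the same behavior, so even the surviving value would need translation.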

I have the same issue. I use an sklearn Pipeline with FLAML and then try to reproduce the result with the same sklearn Pipeline. The results are totally different, not only for RF but also for k-neighbors (which has no random-seed effect).

```python
automl_pipeline = Pipeline([
    ("standardizer", standardizer),
    ("automl", automl),
])

automl_settings = {
    "time_budget": 240,
    "estimator_list": ["kneighbor"],  # or "rf"
    "eval_method": "cv",
    "split_type": "stratified",
    "n_splits": 5,
    "metric": "accuracy",
    "task": "classification",
    "log_file_name": "data.log",
    "seed": 42,
    "verbose": 5,
}
```
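A likely contributor for k-neighbors specifically: FLAML applies its own internal data transformation during `fit`, so a model refit outside FLAML may effectively see different inputs even when the pipeline looks the same. k-NN is scale-sensitive, so differently preprocessed inputs alone change its predictions. A minimal sklearn-only sketch with synthetic data (all names and values here are illustrative):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(42)
# Feature 0 carries the label signal on a small scale;
# feature 1 is pure noise on a huge scale.
X = np.column_stack([
    np.r_[rng.normal(0, 1, 50), rng.normal(10, 1, 50)],
    rng.normal(0, 1000, 100),
])
y = np.r_[np.zeros(50), np.ones(50)]

# Same learner, two different views of the data.
raw = KNeighborsClassifier(n_neighbors=5).fit(X, y)
scaled = Pipeline([
    ("standardizer", StandardScaler()),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
]).fit(X, y)

pred_raw = raw.predict(X)
pred_scaled = scaled.predict(X)
acc_raw = raw.score(X, y)        # distances dominated by the noise feature
acc_scaled = scaled.score(X, y)  # signal feature becomes visible
```

The two prediction vectors diverge even though the estimator and its hyperparameters are identical, which is the same symptom reported above: if FLAML's internal transform differs from the external pipeline's, the refit model cannot match.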

thinkall commented 3 months ago

Hi @zuoxu3310, have you tried https://github.com/microsoft/FLAML/discussions/1054#discussioncomment-6016340? If that doesn't work, you can set `skip_transform` to `True` in `automl_settings` and try again. It should then be reproducible.