Open Xiuyu-Li opened 5 years ago
Hi @lxywizard, sorry for getting to this PR late. Are you able to resolve the conflicts with `dev` first?
There are duplicate commits in this PR and it seems like this is due to merging upstream when doing the development. I am not sure how to solve this when merging this PR (use rebase?)
The simplest way is as follows:
In the future, you can rebase your branch onto the latest `dev` branch.
Hi @lxywizard, I tried to run your model `XgbClf` but it failed. Error message:
`2019-10-04 14:06:06,534 rafiki.utils.service INFO Starting worker "9b3218bda06b" for service of ID "a166b1cf-ad69-4666-be27-fdb763e51885"...
2019-10-04 14:06:07,193 rafiki.worker.train INFO Reading job info from meta store...
2019-10-04 14:06:07,202 rafiki.worker.train INFO Using model "XgbClf"...
2019-10-04 14:06:07,615 rafiki.redis.redis INFO Connecting to Redis at namespace TRAIN:157c0333-32b3-439e-86e2-87a984bdb241...
2019-10-04 14:06:07,615 rafiki.redis.redis INFO Connecting to Redis at namespace PARAMS:157c0333-32b3-439e-86e2-87a984bdb241...
2019-10-04 14:06:07,615 rafiki.worker.train INFO Starting worker for sub train job "157c0333-32b3-439e-86e2-87a984bdb241"...
2019-10-04 14:06:08,037 rafiki.worker.train INFO Starting trial 7859f769-b4e4-42ba-adc2-42a8741baf27 with proposal {'trial_no': 1, 'knobs': {'n_estimators': 122, 'min_child_weight': 5, 'max_depth': 3, 'gamma': 0.570998379009159, 'subsample': 0.5633938164447414, 'colsample_bytree': 0.5589119180136076}, 'params_type': 'NONE', 'to_eval': True, 'to_cache_params': False, 'to_save_params': True, 'meta': {'proposal_type': 'SEARCH'}, 'trial_id': '7859f769-b4e4-42ba-adc2-42a8741baf27'}...
2019-10-04 14:06:08,038 rafiki.worker.train INFO Marking trial as running in store...
2019-10-04 14:06:08,059 rafiki.worker.train INFO Creating model instance...
2019-10-04 14:06:08,066 rafiki.worker.train INFO Training model...
2019-10-04 14:06:16,468 rafiki.worker.train ERROR Error while running trial:
2019-10-04 14:06:16,477 rafiki.worker.train ERROR Traceback (most recent call last):
  File "/root/rafiki/worker/train.py", line 113, in _perform_trial
    self._train_model(model_inst, proposal, shared_params)
  File "/root/rafiki/worker/train.py", line 177, in _train_model
    model_inst.train(train_dataset_path, shared_params=shared_params, **(train_args or {}))
  File "/root/XgbClf-05ff1472-5ff7-4b69-a1cd-898366d48b02.py", line 72, in train
  File "/usr/local/envs/rafiki/lib/python3.6/site-packages/sklearn/base.py", line 357, in score
    return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
  File "/usr/local/envs/rafiki/lib/python3.6/site-packages/sklearn/metrics/classification.py", line 176, in accuracy_score
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
  File "/usr/local/envs/rafiki/lib/python3.6/site-packages/sklearn/metrics/classification.py", line 88, in _check_targets
    raise ValueError("{0} is not supported".format(y_type))
ValueError: continuous is not supported
2019-10-04 14:06:16,484 rafiki.worker.train INFO Marking trial as errored in store...
2019-10-04 14:06:16,503 rafiki.redis.train_cache INFO Creating result "{'proposal': {'trial_no': 1, 'knobs': {'n_estimators': 122, 'min_child_weight': 5, 'max_depth': 3, 'gamma': 0.570998379009159, 'subsample': 0.5633938164447414, 'colsample_bytree': 0.5589119180136076}, 'params_type': 'NONE', 'to_eval': True, 'to_cache_params': False, 'to_save_params': True, 'meta': {'proposal_type': 'SEARCH'}, 'trial_id': '7859f769-b4e4-42ba-adc2-42a8741baf27'}, 'score': None}" for worker "9b3218bda06b"...
2019-10-04 14:06:16,504 rafiki.redis.train_cache INFO Deleting existing proposal for worker "9b3218bda06b"...
2019-10-04 14:06:16,605 rafiki.worker.train INFO Starting trial f6214ca4-6222-4836-8575-2a73cf3fe36c with proposal {'trial_no': 2, 'knobs': {'n_estimators': 59, 'min_child_weight': 6, 'max_depth': 5, 'gamma': 0.039364911210974414, 'subsample': 0.8211606352017686, 'colsample_bytree': 0.37145979412867836}, 'params_type': 'NONE', 'to_eval': True, 'to_cache_params': False, 'to_save_params': True, 'meta': {'proposal_type': 'SEARCH'}, 'trial_id': 'f6214ca4-6222-4836-8575-2a73cf3fe36c'}...
2019-10-04 14:06:16,605 rafiki.worker.train INFO Marking trial as running in store...
2019-10-04 14:06:16,620 rafiki.worker.train INFO Creating model instance...
2019-10-04 14:06:16,626 rafiki.worker.train INFO Training model...
2019-10-04 14:06:19,728 rafiki.worker.train ERROR Error while running trial:
2019-10-04 14:06:19,734 rafiki.worker.train ERROR Traceback (most recent call last):
  File "/root/rafiki/worker/train.py", line 113, in _perform_trial
    self._train_model(model_inst, proposal, shared_params)
  File "/root/rafiki/worker/train.py", line 177, in _train_model
    model_inst.train(train_dataset_path, shared_params=shared_params, **(train_args or {}))
  File "/root/XgbClf-05ff1472-5ff7-4b69-a1cd-898366d48b02.py", line 72, in train
  File "/usr/local/envs/rafiki/lib/python3.6/site-packages/sklearn/base.py", line 357, in score
    return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
  File "/usr/local/envs/rafiki/lib/python3.6/site-packages/sklearn/metrics/classification.py", line 176, in accuracy_score
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
  File "/usr/local/envs/rafiki/lib/python3.6/site-packages/sklearn/metrics/classification.py", line 88, in _check_targets
    raise ValueError("{0} is not supported".format(y_type))
ValueError: continuous is not supported
2019-10-04 14:06:19,738 rafiki.worker.train INFO Marking trial as errored in store...
2019-10-04 14:06:19,748 rafiki.redis.train_cache INFO Creating result "{'proposal': {'trial_no': 2, 'knobs': {'n_estimators': 59, 'min_child_weight': 6, 'max_depth': 5, 'gamma': 0.039364911210974414, 'subsample': 0.8211606352017686, 'colsample_bytree': 0.37145979412867836}, 'params_type': 'NONE', 'to_eval': True, 'to_cache_params': False, 'to_save_params': True, 'meta': {'proposal_type': 'SEARCH'}, 'trial_id': 'f6214ca4-6222-4836-8575-2a73cf3fe36c'}, 'score': None}" for worker "9b3218bda06b"...
2019-10-04 14:06:19,748 rafiki.redis.train_cache INFO Deleting existing proposal for worker "9b3218bda06b"...
2019-10-04 14:06:19,848 rafiki.worker.train INFO Starting trial aef0a1a3-33cb-44f3-ba02-6606aad1afe4 with proposal {'trial_no': 3, 'knobs': {'n_estimators': 156, 'min_child_weight': 5, 'max_depth': 3, 'gamma': 0.1988870869712323, 'subsample': 0.974919369835241, 'colsample_bytree': 0.6632012923087245}, 'params_type': 'NONE', 'to_eval': True, 'to_cache_params': False, 'to_save_params': True, 'meta': {'proposal_type': 'SEARCH'}, 'trial_id': 'aef0a1a3-33cb-44f3-ba02-6606aad1afe4'}...
2019-10-04 14:06:19,849 rafiki.worker.train INFO Marking trial as running in store...
2019-10-04 14:06:19,863 rafiki.worker.train INFO Creating model instance...
2019-10-04 14:06:19,870 rafiki.worker.train INFO Training model...
2019-10-04 14:06:28,211 rafiki.worker.train ERROR Error while running trial:
2019-10-04 14:06:28,217 rafiki.worker.train ERROR Traceback (most recent call last):
  File "/root/rafiki/worker/train.py", line 113, in _perform_trial
    self._train_model(model_inst, proposal, shared_params)
  File "/root/rafiki/worker/train.py", line 177, in _train_model
    model_inst.train(train_dataset_path, shared_params=shared_params, **(train_args or {}))
  File "/root/XgbClf-05ff1472-5ff7-4b69-a1cd-898366d48b02.py", line 72, in train
  File "/usr/local/envs/rafiki/lib/python3.6/site-packages/sklearn/base.py", line 357, in score
    return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
  File "/usr/local/envs/rafiki/lib/python3.6/site-packages/sklearn/metrics/classification.py", line 176, in accuracy_score
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
  File "/usr/local/envs/rafiki/lib/python3.6/site-packages/sklearn/metrics/classification.py", line 88, in _check_targets
    raise ValueError("{0} is not supported".format(y_type))
ValueError: continuous is not supported`.
Please check.
@pinpom It seems this issue is caused by running a classification model on a dataset meant for a regression task. Can you try again with another dataset, such as Titanic? Or can you tell me which dataset you used when you hit this issue?
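The `ValueError: continuous is not supported` in the log above is raised by scikit-learn's `accuracy_score` when the target values look real-valued rather than like class labels. A minimal pure-Python sketch of a pre-check in the same spirit is shown below. The helper `looks_continuous` is hypothetical (it is not part of Rafiki or scikit-learn) and only approximates how scikit-learn classifies target types; the sample values are illustrative.

```python
def looks_continuous(targets):
    """Return True if any target is a float with a fractional part.

    Hypothetical helper: a rough stand-in for the target-type check that
    scikit-learn's classification metrics perform before scoring.
    """
    return any(isinstance(t, float) and not t.is_integer() for t in targets)


# Class labels such as a 0/1 "Survived" column pass the check.
labels = [0, 1, 1, 0]
# A real-valued column (illustrative numbers) does not.
real_valued = [7.25, 71.2833, 7.925]

print(looks_continuous(labels))       # False
print(looks_continuous(real_valued))  # True
```

If the check returns `True` for the column used as the target, the train job is probably pointing at the wrong column or at a regression dataset.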
@lxywizard I used the Titanic dataset. For your reference, below are the details of my train job:
`client.get_train_jobs_of_app(app='titanic_app')`
`[{'app': 'titanic_app', 'app_version': 1, 'budget': {'GPU_COUNT': 0, 'MODEL_TRIAL_COUNT': 3, 'TIME_HOURS': 0.1}, 'datetime_started': 'Fri, 04 Oct 2019 14:05:49 GMT', 'datetime_stopped': 'Fri, 04 Oct 2019 14:06:32 GMT', 'id': 'ae450d32-b6e5-4d6c-ad89-3eb42ea58ed7', 'status': 'STOPPED', 'task': 'TABULAR_CLASSIFICATION', 'train_args': None, 'train_dataset_id': '046b5c27-9896-4fbb-8442-fde092a0d3f3', 'val_dataset_id': 'efdbc98f-cb59-40ee-8b9a-511b3696bc6a'}]`
@pinpom It seems you did not pass anything in `train_args` when creating the train job. Try using `train_args={'model_selector': 'oboe', 'features': ['Pclass', 'Sex', 'Age'], 'target': 'Survived'}` and see if this error still occurs.
I have also provided a sample you can use for testing.
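A minimal sketch of a train job request that carries these `train_args`. The dataset IDs and budget values are copied from the job listing earlier in this thread. The helper name `build_train_job_kwargs` is hypothetical, and the `client.create_train_job(**kwargs)` call in the final comment assumes the Rafiki client accepts these keyword names, which should be checked against the installed version.

```python
def build_train_job_kwargs(app, train_dataset_id, val_dataset_id, features, target):
    """Assemble keyword arguments for a TABULAR_CLASSIFICATION train job.

    Hypothetical helper; the key names mirror the fields shown in the
    train job listing in this thread.
    """
    if not features:
        raise ValueError("at least one feature column is required")
    train_args = {
        'model_selector': 'oboe',    # ask Rafiki to use OBOE for model selection
        'features': list(features),  # input columns of the tabular dataset
        'target': target,            # label column to predict
    }
    return {
        'app': app,
        'task': 'TABULAR_CLASSIFICATION',
        'train_dataset_id': train_dataset_id,
        'val_dataset_id': val_dataset_id,
        'budget': {'GPU_COUNT': 0, 'MODEL_TRIAL_COUNT': 3, 'TIME_HOURS': 0.1},
        'train_args': train_args,
    }


kwargs = build_train_job_kwargs(
    app='titanic_app',
    train_dataset_id='046b5c27-9896-4fbb-8442-fde092a0d3f3',
    val_dataset_id='efdbc98f-cb59-40ee-8b9a-511b3696bc6a',
    features=['Pclass', 'Sex', 'Age'],
    target='Survived',
)
print(kwargs['train_args'])

# With a logged-in Rafiki client, the request would presumably be sent as:
# client.create_train_job(**kwargs)  # assumed signature, verify before use
```

Building the arguments separately makes it easy to confirm that `train_args` is not `None` before the job is submitted, which was the cause of the failed trials above.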
@lxywizard Oh yes, sorry, my mistake. Thanks for pointing it out. I managed to run it successfully.
@pinpom No worries. Let me know if there are any other issues or changes you want me to make.
Integrate OBOE for `TABULAR_CLASSIFICATION` task
Set `'model_selector': 'oboe'` in the `train_args` to use OBOE when creating a `TABULAR_CLASSIFICATION` train job.
In the `rafiki/advisor/oboe` folder:
- The `automl/defaults` folder contains the configuration `classification.json` for all current Rafiki `TABULAR_CLASSIFICATION` models and the matrix needed by OBOE. It can be applied to any set of current Rafiki `TABULAR_CLASSIFICATION` models.
- If one adds a new `TABULAR_CLASSIFICATION` model (with class name `new_model`) for Rafiki and applies OBOE, the matrix and config need to be updated. The model should be imported at the top of `automl/util` as `new_model`, and `automl/defaults/classification.json` should be edited to include the new model and its selected hyperparameter config.
- `error_matrix_generation/dataset`.
- The `error_matrix_generation/start_matrix_generation.sh` script will update OBOE automatically.
Some possible changes to be made:
- `rafikiai/rafiki_admin` docker images. The workflow may be more logical if a pipeline for model selection is developed in `rafikiai/rafiki_worker`.
- `TABULAR_REGRESSION` to be added.
The license and credits to the original OBOE repository and documentation will be added later.
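The config update described above for a new model could look roughly like the sketch below. The layout of `classification.json` is an assumption here (a top-level `algorithms` list plus a per-model `hyperparameters` grid), so check the actual file before relying on it. The helper `register_model` and all hyperparameter values are illustrative, not taken from this PR.

```python
import copy
import json


def register_model(config, class_name, hyperparameters):
    """Return a copy of `config` with `class_name` and its grid added.

    Assumes (not confirmed by this PR) that the config has an
    "algorithms" list and a "hyperparameters" mapping keyed by class name.
    """
    updated = copy.deepcopy(config)  # leave the caller's config untouched
    if class_name not in updated["algorithms"]:
        updated["algorithms"].append(class_name)
    updated["hyperparameters"][class_name] = hyperparameters
    return updated


# Stand-in for the parsed contents of automl/defaults/classification.json.
config = {
    "algorithms": ["XgbClf"],
    "hyperparameters": {"XgbClf": {"max_depth": [3, 5]}},
}

updated = register_model(config, "new_model", {"n_estimators": [50, 100]})
print(json.dumps(updated, indent=2))
```

After editing the real file, the error matrix still has to be regenerated, which is what the `start_matrix_generation.sh` script mentioned above is for.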