Open hadyan-tvlk opened 6 years ago
Can you provide the command you used to launch, and what the JSON looks like for the various runs in the ML Engine dashboard for that job?
Hi @rsepassi,
thanks for the response. I'm using exactly the command provided in the tutorial:
DATADIR=gs://${BUCKET}/poetry/data
OUTDIR=gs://${BUCKET}/poetry/model_hparam
JOBNAME=poetry_$(date -u +%y%m%d_%H%M%S)
echo $OUTDIR $REGION $JOBNAME
gsutil -m rm -rf $OUTDIR
echo "Y" | t2t-trainer \
--data_dir=gs://${BUCKET}/poetry/subset \
--t2t_usr_dir=./poetry/trainer \
--problem=$PROBLEM \
--model=transformer \
--hparams_set=transformer_poetry \
--output_dir=$OUTDIR \
--hparams_range=transformer_poetry_range \
--autotune_objective='metrics-poetry_line_problem/accuracy_per_sequence' \
--autotune_maximize \
--autotune_max_trials=4 \
--autotune_parallel_trials=4 \
--train_steps=7500 --cloud_mlengine --worker_gpu=4
The JSON logs look fine for each trial, nothing special. Here are the last lines of the logs, to show that no best-score parameters are printed at the end:
[image: 1] https://user-images.githubusercontent.com/34705256/40459548-3712be1e-5f2c-11e8-9ad6-e377b507b6ec.png
I have already run it several times and still get nothing.
Ah, it won’t be in each job’s logs but rather in the entry in the ML Engine dashboard for the whole hyperparameter tuning job.
Ah, I see. Do you mean this, @rsepassi?
Parameter Input
{
"scaleTier": "CUSTOM",
"masterType": "complex_model_m_p100",
"packageUris": [
"gs://test_t2t/poetry/model_hparam/tensor2tensor_tmp.tar.gz",
"gs://test_t2t/poetry/model_hparam/t2t_usr_container.tar.gz"
],
"pythonModule": "tensor2tensor.bin.t2t_trainer",
"args": [
"--eval_steps=100",
"--cloud_tpu=False",
"--hparams_range=transformer_poetry_range",
"--decode_hparams=",
"--sync=False",
"--eval_run_autoregressive=False",
"--eval_use_test_set=False",
"--only_use_ae_for_policy=False",
"--worker_id=0",
"--eval_early_stopping_metric_minimize=True",
"--worker_replicas=1",
"--worker_gpu_memory_fraction=0.95",
"--train_steps=2000",
"--cloud_tpu_name=test-tpu",
"--locally_shard_to_cpu=False",
"--iterations_per_loop=100",
"--registry_help=False",
"--worker_gpu=4",
"--keep_checkpoint_max=20",
"--save_checkpoints_secs=0",
"--gpu_order=",
"--master=",
"--generate_data=False",
"--intra_op_parallelism_threads=0",
"--enable_graph_rewriter=False",
"--eval_early_stopping_metric=loss",
"--output_dir=gs://test_t2t/poetry/model_hparam",
"--profile=False",
"--ps_job=/job:ps",
"--tmp_dir=/tmp/t2t_datagen",
"--schedule=continuous_train_and_eval",
"--inter_op_parallelism_threads=0",
"--hparams=",
"--use_tpu=False",
"--eval_early_stopping_metric_delta=0.1",
"--ps_gpu=0",
"--tfdbg=False",
"--local_eval_frequency=1000",
"--data_dir=gs://test_t2t/poetry/subset",
"--ps_replicas=0",
"--export_saved_model=False",
"--problem=poetry_line_problem",
"--log_device_placement=False",
"--hparams_set=transformer_poetry",
"--dbgprofile=False",
"--timit_paths=",
"--cloud_skip_confirmation=False",
"--cloud_delete_on_done=False",
"--tpu_num_shards=8",
"--cloud_vm_name=test-vm",
"--parsing_path=",
"--worker_job=/job:localhost",
"--model=transformer",
"--keep_checkpoint_every_n_hours=10000",
"--t2t_usr_dir",
"t2t_usr_dir_internal"
],
"hyperparameters": {
"goal": "MAXIMIZE",
"params": [
{
"parameterName": "hp_hidden_size",
"type": "DISCRETE",
"discreteValues": [
128,
256,
512
]
},
{
"parameterName": "hp_learning_rate",
"minValue": 0.05,
"maxValue": 0.25,
"type": "DOUBLE",
"scaleType": "UNIT_LOG_SCALE"
},
{
"parameterName": "hp_attention_dropout",
"minValue": 0.4,
"maxValue": 0.7,
"type": "DOUBLE"
},
{
"parameterName": "hp_num_hidden_layers",
"minValue": 2,
"maxValue": 4,
"type": "INTEGER"
}
],
"maxTrials": 4,
"maxParallelTrials": 4,
"hyperparameterMetricTag": "metrics-poetry_line_problem/accuracy_per_sequence"
},
"region": "asia-east1",
"runtimeVersion": "1.8",
"jobDir": "gs://test_t2t/poetry/model_hparam",
"pythonVersion": "2.7"
}
Parameter Output
{
"completedTrialCount": "4",
"trials": [
{
"trialId": "1",
"hyperparameters": {
"hp_hidden_size": "128",
"hp_learning_rate": "0.10203632059457049",
"hp_num_hidden_layers": "4",
"hp_attention_dropout": "0.52901200589059827"
}
},
{
"trialId": "2",
"hyperparameters": {
"hp_attention_dropout": "0.64617604866780931",
"hp_hidden_size": "256",
"hp_learning_rate": "0.18905077512294322",
"hp_num_hidden_layers": "4"
}
},
{
"trialId": "3",
"hyperparameters": {
"hp_attention_dropout": "0.58885243185235137",
"hp_hidden_size": "128",
"hp_learning_rate": "0.10596887917921334",
"hp_num_hidden_layers": "4"
}
},
{
"trialId": "4",
"hyperparameters": {
"hp_attention_dropout": "0.59207490095311122",
"hp_hidden_size": "128",
"hp_learning_rate": "0.06655300061633318",
"hp_num_hidden_layers": "4"
}
}
],
"consumedMLUnits": 25.32,
"isHyperparameterTuningJob": true
}
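One way to check the output above programmatically is sketched below. It assumes the ML Engine job resource reports each trial's score under a `finalMetric` object with an `objectiveValue` field once the tuning service has received the metric; none of the four trials above carries such a field. The helper names `split_trials` and `best_trial` are hypothetical and are not part of T2T or any Google Cloud library, and the embedded JSON is an abbreviated copy of the output above (two of the four trials, a subset of their hyperparameters).

```python
import json

# Abbreviated copy of the "Parameter Output" JSON above (two of the four trials).
training_output = json.loads("""
{
  "completedTrialCount": "4",
  "trials": [
    {"trialId": "1",
     "hyperparameters": {"hp_hidden_size": "128", "hp_num_hidden_layers": "4"}},
    {"trialId": "2",
     "hyperparameters": {"hp_hidden_size": "256", "hp_num_hidden_layers": "4"}}
  ],
  "isHyperparameterTuningJob": true
}
""")


def split_trials(output):
    """Return (reported, missing): trials with and without a finalMetric entry."""
    reported, missing = [], []
    for trial in output.get("trials", []):
        if "finalMetric" in trial:  # assumed field name, see the note above
            reported.append(trial)
        else:
            missing.append(trial)
    return reported, missing


def best_trial(reported, maximize=True):
    """Pick the trial with the best objectiveValue, or None if none was reported."""
    if not reported:
        return None

    def score(trial):
        return float(trial["finalMetric"]["objectiveValue"])

    return max(reported, key=score) if maximize else min(reported, key=score)


reported, missing = split_trials(training_output)
print("trials with a reported metric:", [t["trialId"] for t in reported])
print("trials without one:", [t["trialId"] for t in missing])
print("best trial:", best_trial(reported))
```

If every trial lands in the "without one" list, as it does for the output above, there is nothing for the dashboard to rank, so no best trial can be shown.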
Yes. Do you see the metric you specified with the autotune objective flag somewhere on that page?
@rsepassi Yes, I can see it in the JSON training input specification:
"maxTrials": 4,
"maxParallelTrials": 4,
"hyperparameterMetricTag": "metrics-poetry_line_problem/accuracy_per_sequence"
Sorry to ping you again, @rsepassi. I'm still unable to solve this issue. Any ideas?
I have the same problem...
Hi guys,
I'm trying to run hyperparameter tuning from this example: https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/courses/machine_learning/deepdive/09_sequence/poetry.ipynb
According to the example, we should get the best parameters from a specific trial.
But after running it with the latest version of T2T, it doesn't show anything like the JSON above, just a score for each trial. Is there anything I'm missing? Thanks in advance.