google / model_search


FailedPreconditionError: No completed trials to perform ensembling. #41

Open raul-parada opened 3 years ago

raul-parada commented 3 years ago

I'm using the Colab example at https://colab.research.google.com/drive/1k1EaKDCTB2fU9XtIdiiXEDyDOvrU6fmD?usp=sharing

Using my own data (composed of two attributes, both of them classes for regression), I got the following error:


I0313 17:21:15.287168 140503459874688 metadata_store.py:93] MetadataStore with DB connection initialized
I0313 17:21:15.307923 140503459874688 oss_trainer_lib.py:290] creating directory: /tmp/run_example/tuner-1/5
I0313 17:21:15.309459 140503459874688 oss_trainer_lib.py:337] Tuner id: tuner-1
I0313 17:21:15.310938 140503459874688 oss_trainer_lib.py:338] Training with the following hyperparameters: 
I0313 17:21:15.312628 140503459874688 oss_trainer_lib.py:339] {'learning_rate': 1.0536239272270075e-05, 'new_block_type': 'FULLY_CONNECTED_RESIDUAL_FORCE_MATCH_SHAPES', 'optimizer': 'adam', 'initial_architecture_0': 'FULLY_CONNECTED_RESIDUAL_CONCAT', 'exponential_decay_rate': 0.8225550864907855, 'exponential_decay_steps': 250, 'gradient_max_norm': 2, 'dropout_rate': 0.20000000596046447, 'initial_architecture': ['FULLY_CONNECTED_RESIDUAL_CONCAT']}
I0313 17:21:15.314630 140503459874688 run_config.py:550] TF_CONFIG environment variable: {'model_dir': '/tmp/run_example/tuner-1/5', 'session_master': ''}
I0313 17:21:15.316400 140503459874688 run_config.py:973] Using model_dir in TF_CONFIG: /tmp/run_example/tuner-1/5
I0313 17:21:15.318898 140503459874688 estimator.py:191] Using config: {'_model_dir': '/tmp/run_example/tuner-1/5', '_tf_random_seed': None, '_save_summary_steps': 2000, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 120, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
I0313 17:21:15.373794 140503459874688 estimator.py:1169] Calling model_fn.
I0313 17:21:15.389836 140503459874688 controller.py:160] trial id: 5
I0313 17:21:15.391364 140503459874688 controller.py:239] intermix ensemble search mode
I0313 17:21:15.406418 140503459874688 phoenix.py:371] {'prior_generator': GeneratorWithTrials(instance=<model_search.generators.prior_generator.PriorGenerator object at 0x7fc8eed807d0>, relevant_trials=[])}
---------------------------------------------------------------------------
FailedPreconditionError                   Traceback (most recent call last)
<ipython-input-17-bfaafc6a9348> in <module>()
      6     batch_size=32,
      7     experiment_name="example",
----> 8     experiment_owner="model_search_user")

11 frames
/content/model_search/model_search/generators/prior_generator.py in _nonadaptive_ensemble(self, features, input_layer_fn, shared_input_tensor, shared_lengths, logits_dimension, relevant_trials, is_training, num_trials_to_consider, width, my_model_dir)
     64     if not best_trials:
     65       raise tf.errors.FailedPreconditionError(
---> 66           None, None, "No completed trials to perform ensembling.")
     67 
     68     if len(best_trials) < width:

FailedPreconditionError: No completed trials to perform ensembling.
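
From the log, the prior generator receives `relevant_trials=[]`, and the guard at lines 64-66 of prior_generator.py then fires. Below is a minimal sketch of that check, not the library's code. The function name `select_best_trials`, the stand-in exception class, and the assumption that the best trials are simply the first `width` entries are all mine, for illustration only:

```python
class FailedPreconditionError(RuntimeError):
    """Stand-in for tf.errors.FailedPreconditionError, so the sketch runs without TensorFlow."""


def select_best_trials(relevant_trials, width):
    """Hypothetical reduction of the guard shown in the traceback."""
    # Assumption: the "best" trials are just the first `width` relevant trials.
    best_trials = list(relevant_trials)[:width]
    if not best_trials:
        # Same message as in the traceback above.
        raise FailedPreconditionError("No completed trials to perform ensembling.")
    return best_trials


# With an empty trial list, as in the log (relevant_trials=[]), the guard raises.
try:
    select_best_trials(relevant_trials=[], width=2)
except FailedPreconditionError as err:
    message = str(err)
    print(message)
```

If this reading is right, the error is raised because no earlier trial is available to ensemble, not because of the ensembling step itself.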

Any clue?