Open kakakflo22thy opened 4 years ago
It might be that a trial was not saved correctly. Can you post a reproducible script?
Sorry, I can't post the core code here. But I found that the .tune_metadata data is missing from the checkpoint dir. Here are the _save and _restore functions; is there anything that could help?
def _save(self, model_path):
    print("_save")
    actor_filepath = model_path + '/actor.h5'
    reward_critic_filepath = model_path + '/reward_critic.h5'
    cost_critic_filepath = model_path + '/cost_critic.h5'
    self.vaemmd_test.actor.save_model(actor_filepath)
    self.vaemmd_test.reward_critic.save_model(reward_critic_filepath)
    self.vaemmd_test.cost_critic.save_model(cost_critic_filepath)
    return model_path

def _restore(self, model_path):
    print("_restore")
    actor_filepath = model_path + '/actor.h5'
    reward_critic_filepath = model_path + '/reward_critic.h5'
    cost_critic_filepath = model_path + '/cost_critic.h5'
    self.vaemmd_test.actor.load_model(actor_filepath, custom_objects={'LOG_SIG_CAP_MIN': LOG_SIG_CAP_MIN,
                                                                      'LOG_SIG_CAP_MAX': LOG_SIG_CAP_MAX,
                                                                      'tf': tf})
    self.vaemmd_test.reward_critic.load_model(reward_critic_filepath)
    self.vaemmd_test.cost_critic.load_model(cost_critic_filepath)
It turns out that only the files below exist:
I changed my code as below, and this time it output a model.tune_metadata file, but I still got the same error...
def _save(self, model_path):
    print("_save")
    actor_filepath = model_path + '/actor.h5'
    reward_critic_filepath = model_path + '/reward_critic.h5'
    cost_critic_filepath = model_path + '/cost_critic.h5'
    self.vaemmd_test.actor.save_model(actor_filepath)
    self.vaemmd_test.reward_critic.save_model(reward_critic_filepath)
    self.vaemmd_test.cost_critic.save_model(cost_critic_filepath)
    return model_path + '/model'

def _restore(self, model_path):
    print("_restore")
    print(model_path)
    # Split off only the last path component; a plain split('/') cannot be
    # unpacked into two names when the path has more than one '/'.
    model_path_pre, _ = model_path.rsplit('/', 1)
    actor_filepath = model_path_pre + '/actor.h5'
    reward_critic_filepath = model_path_pre + '/reward_critic.h5'
    cost_critic_filepath = model_path_pre + '/cost_critic.h5'
    self.vaemmd_test.actor.load_model(actor_filepath, custom_objects={'LOG_SIG_CAP_MIN': LOG_SIG_CAP_MIN,
                                                                      'LOG_SIG_CAP_MAX': LOG_SIG_CAP_MAX,
                                                                      'tf': tf})
    self.vaemmd_test.reward_critic.load_model(reward_critic_filepath)
    self.vaemmd_test.cost_critic.load_model(cost_critic_filepath)
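Recovering the checkpoint directory from the path returned by _save can also be done with the standard library, which works no matter how many segments the path has. The following is only a minimal sketch of that one step in isolation: the helper name `checkpoint_dir_from` and the example path are hypothetical and do not come from the original run.

```python
import os


def checkpoint_dir_from(returned_path):
    """Return the directory part of the path that _save returned."""
    return os.path.dirname(returned_path)


# Hypothetical checkpoint path, for illustration only.
example = "/tmp/ray_results/trial_0/checkpoint_4/model"

ckpt_dir = checkpoint_dir_from(example)
actor_filepath = os.path.join(ckpt_dir, "actor.h5")
print(ckpt_dir)
print(actor_filepath)

# For comparison: splitting on every '/' gives many pieces, so unpacking
# the result into exactly two names raises ValueError.
pieces = example.split("/")
print(len(pieces))
```

Building the file paths with os.path.join instead of string concatenation also avoids doubled or missing separators.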
What is the problem?
Ray version: 0.8.4. The error is as follows:
I've looked through the source code and found that delta is calculated from
delta = self._get_result_time(result) - self._get_result_time(self._live_trials[trial])
in which _get_result_time is used to retrieve result[self._time_attr]. I set self._time_attr to training_iteration in my script, which is supposed to increase monotonically while a trial is running. However, the AssertionError shows that the latest training_iteration is 1, which, as I understand it, means that a new trainable instance has started, while the latest result recorded in self._live_trials shows that training_iteration is 4, meaning that a trainable instance has been running or ran in the same trial. The experiment_id proves that the two records above come from two different experiments, the first with 'experiment_id':'da43b2881ee941f5845472dc4f2c4e93' and the second with 'experiment_id':'5114d0c5bf68418881d70f3dfb48c829'. I am confused about how two different experiments can run in the same trial. I changed the parameters several times and the issue reproduced each time.
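To make the failing check concrete, here is a simplified, standalone sketch of the delta computation described above. It is not Ray's code: the function `time_delta` is a hypothetical stand-in, and which experiment_id belongs to which record is an assumption made only for illustration. The iteration values 4 and 1 are the ones from the report.

```python
def time_delta(new_result, last_result, time_attr="training_iteration"):
    """Difference in the time attribute between two results of one trial."""
    return new_result[time_attr] - last_result[time_attr]


# Last result recorded for the trial (assumed ID assignment).
last_recorded = {
    "training_iteration": 4,
    "experiment_id": "5114d0c5bf68418881d70f3dfb48c829",
}
# Newly reported result for the same trial (assumed ID assignment).
new_result = {
    "training_iteration": 1,
    "experiment_id": "da43b2881ee941f5845472dc4f2c4e93",
}

delta = time_delta(new_result, last_recorded)
print(delta)

# A check of the form `assert delta >= 0` fails for these values,
# because the iteration counter went backwards.
counter_went_backwards = delta < 0
# Differing IDs suggest the results came from two trainable instances.
different_instance = new_result["experiment_id"] != last_recorded["experiment_id"]
print(counter_went_backwards, different_instance)
```

Under these assumptions, a negative delta together with a changed experiment_id is consistent with the trainable having been started fresh rather than restored from its checkpoint.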