best model - Githubissues

shunjiu commented 2 years ago

Hello, how do I evaluate the model? I used the test command from GDPL but got a low success rate.

The command and its output are below.

python main.py --test True --load model_rl/best > result.txt
DEBUG:root:Namespace(anneal=5000, batchsz=32, batchsz_traj=1024, clip=0.02, config='multiwoz', data_dir='data', ensemble_size=5, epoch=32, epsilon=0.2, gamma=0.99, load='model_rl/best', load_user='model/best', log_dir='log', lr_irl=0.001, lr_rl=0.0003, lr_simu=0.001, model_horizon=5, pretrain=False, pretrain_world=False, print_per_batch=400, process=16, save_dir='model', save_per_epoch=1, sim_ratio=1.0, simulator='agenda', tau=0.95, test=True, update_round=5)
INFO:root:Load processed data file
DEBUG:root:test
INFO:root:Loading goal model is done
INFO:root:Loading goal model is done
INFO:root:<> loaded checkpoint from file: model_rl/best_estimator.mdl
INFO:root:<<world model 0>> loaded checkpoint from file: model_rl/best_wm_0.pol.mdl
INFO:root:<<world model 0>> loaded checkpoint from file: model_rl/best_wm_0.ter.mdl
INFO:root:<<world model 1>> loaded checkpoint from file: model_rl/best_wm_1.pol.mdl
INFO:root:<<world model 1>> loaded checkpoint from file: model_rl/best_wm_1.ter.mdl
INFO:root:<<world model 2>> loaded checkpoint from file: model_rl/best_wm_2.pol.mdl
INFO:root:<<world model 2>> loaded checkpoint from file: model_rl/best_wm_2.ter.mdl
INFO:root:<<world model 3>> loaded checkpoint from file: model_rl/best_wm_3.pol.mdl
INFO:root:<<world model 3>> loaded checkpoint from file: model_rl/best_wm_3.ter.mdl
INFO:root:<<world model 4>> loaded checkpoint from file: model_rl/best_wm_4.pol.mdl
INFO:root:<<world model 4>> loaded checkpoint from file: model_rl/best_wm_4.ter.mdl
INFO:root:<

smt-HS commented 2 years ago

Hi,

We didn't use the best model selected by GDPL, because we don't think that is the right way to select the best-performing model for our method. If you check your model_rl directory, you should see several models, each saved after one epoch of training. We measure the performance of all of them and report the best one.

zzcccci commented 4 months ago

I have the same question: my evaluation also gives a low success rate.

smt-HS / I-SEE

best model #2