Open GeorgeDUT opened 3 years ago
You should use --policy policy.th in test.py to use your trained policy.

It's surprising that you didn't get an error when running test.py without --policy, since this is a required parameter.
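A minimal sketch of why a missing required argument should fail loudly, assuming test.py declares its arguments with argparse (the parser and argument names here are illustrative, not the repository's actual code):

```python
import argparse

# Illustrative parser with a required --policy argument, mirroring the
# behaviour described above (assumption: test.py uses argparse similarly).
parser = argparse.ArgumentParser(description="Evaluate a trained policy")
parser.add_argument("--policy", type=str, required=True,
                    help="path to the saved policy weights (e.g. policy.th)")

# Parsing an empty command line raises SystemExit, because argparse
# refuses to continue when a required argument is absent.
try:
    parser.parse_args([])
except SystemExit:
    print("missing required argument: --policy")

# With the flag supplied, parsing succeeds and the path is available.
args = parser.parse_args(["--policy", "policy.th"])
print(args.policy)
```

If test.py really ran to completion without --policy, it may have been evaluating an untrained or default-initialised policy, which would explain the low rewards.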
I get it. I ran test.py with policy.th, but the rewards in valid_return are equal to or even lower than those in train_return. Maybe our environment is not suitable. Thanks.
I wrote a new environment (navigation on a deterministic map): (1) I ran "python train.py --config xxxx" and got config.json and policy.th. (2) I ran "python test.py --config xxxx" and got results.npz. But the rewards in results.npz are still very low. What should I do to use policy.th to adapt quickly to a new task?
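The idea of "fast adaptation" here is MAML-style: policy.th is only a meta-learned initialisation, and evaluation should run a few inner-loop gradient steps on the new task before measuring return. The toy sketch below illustrates that inner loop on a 1-D quadratic loss; it is purely illustrative and not the repository's code (the function names and the loss are made up for the example):

```python
# Toy sketch of MAML-style fast adaptation. The meta-learned parameters
# (what policy.th would hold) are a scalar w0 here, and each "task" is a
# target t with loss (w - t)^2. Evaluation adapts w0 to the new task with
# a few SGD steps before measuring performance.

def adapt(w, target, lr=0.4, steps=3):
    """Run a few inner-loop SGD steps on the new task's loss (w - target)^2."""
    for _ in range(steps):
        grad = 2.0 * (w - target)  # d/dw of (w - target)^2
        w = w - lr * grad
    return w

w0 = 0.0        # stands in for the meta-learned initialisation
new_task = 5.0  # a task unseen during training

loss_before = (w0 - new_task) ** 2
loss_after = (adapt(w0, new_task) - new_task) ** 2
print(loss_before, loss_after)  # a few steps shrink the loss sharply
```

If results.npz only records returns from the un-adapted policy (or if the adaptation steps are skipped at test time), low rewards are expected even when meta-training worked.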