lukashermann / hulc

Hierarchical Universal Language Conditioned Policies
http://hulc.cs.uni-freiburg.de
MIT License

how to evaluate the model in an RL manner #4

Closed dyabel closed 1 year ago

dyabel commented 2 years ago

Thank you for your great work! I wonder how to evaluate the trained model in an RL manner. Can you provide an example? Thanks.

lukashermann commented 2 years ago

Hi, could you please specify what you mean with evaluating a model in an RL manner? Our standard way of evaluation is resetting the robot to a neutral position and then it has to follow a chain of language instructions. Do you want more information on how to run it or do you need another way of evaluation?

lukashermann commented 2 years ago

@dyabel, we would like to help you if you give us a bit of information!

dyabel commented 2 years ago

> @dyabel, we would like to help you if you give us a bit of information!

Hi, I want to use the already-trained model to interact with the environment and see the reward. From the code, it looks like the model is only tested on offline data.

lukashermann commented 2 years ago

No, when we run the evaluation, we actually do rollouts in the environment. Check this part of the code.

You can run the evaluation like this:

python hulc/evaluation/evaluate_policy.py --dataset_path <PATH/TO/DATASET> --train_folder <PATH/TO/TRAINING/FOLDER> --checkpoint <PATH/TO/CHECKPOINT>

Add --debug to see a live video of the rollout.

In this line we check if a subtask was completed; you can use this as a binary reward. We use the word subtask here because the agent has to follow a chain of 5 instructions, but every subtask is a complete task such as "open the drawer".
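The subtask-completion check described above could be turned into a binary reward signal along these lines. This is a minimal sketch, not HULC's actual API: the sets of completed tasks are assumed to come from the environment's task oracle, and the function name `binary_reward` is hypothetical.

```python
def binary_reward(subtask: str, completed_before: set, completed_after: set) -> float:
    """Return 1.0 the first time `subtask` shows up as completed, else 0.0.

    `completed_before`/`completed_after` are the sets of completed tasks
    before and after the environment step (as a task oracle would report).
    """
    newly_completed = completed_after - completed_before
    return 1.0 if subtask in newly_completed else 0.0


# Example bookkeeping across one step of a rollout:
completed_before = {"open_drawer"}
completed_after = {"open_drawer", "move_slider_left"}
reward = binary_reward("move_slider_left", completed_before, completed_after)
# reward == 1.0, since "move_slider_left" was newly completed this step
```

Comparing before/after sets (rather than just membership in the after set) ensures the reward is given only once, at the step where the subtask is actually achieved.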

dyabel commented 2 years ago

> No, when we run the evaluation, we actually do rollouts in the environment. Check this part of the code.
>
> You can run the evaluation like this:
>
> python hulc/evaluation/evaluate_policy.py --dataset_path <PATH/TO/DATASET> --train_folder <PATH/TO/TRAINING/FOLDER> --checkpoint <PATH/TO/CHECKPOINT>
>
> add --debug to see a live video of the rollout.
>
> In this line we check if a subtask was completed, you can use this as a binary reward. We are using the word subtask here because the agent has to follow a chain of 5 instructions, but every subtask is a complete task such as "open the drawer".

Thank you for the quick reply! I will try that.