Open hyunjimoon opened 2 months ago
resolved
For instance, to run a zero-shot transfer from a source task trained at 13.89 m/s to a target speed of 13 m/s, you can run the following command:
python transfer_main.py --speed 13.0 --model_num 1 --source_path_name "results/intersection_reward-waittime_flow1000_lane4.0_length750_speed13.89_left0.25/" --num_episodes 50
import argparse

parser = argparse.ArgumentParser(description='Arguments')
parser.add_argument('--flow', type=int, default=1000, help='Flow of cars')
parser.add_argument('--lane', type=float, default=4.0, help='Number of lanes')
parser.add_argument('--length', type=float, default=750, help='Length of lanes')
parser.add_argument('--speed', type=float, default=13.89, help='Speed limit')
parser.add_argument('--left', type=float, default=0.25, help='Left turn ratio')
parser.add_argument('--model_num', type=int, default=1, help='Model number')
parser.add_argument('--source_path_name', type=str, default="intersection_flow1000_lane4.0_length750.0_speed13.89_left0.25/", help='pathname')
parser.add_argument('--num_episodes', type=int, default=50, help='Number of episodes')
parser.add_argument('--reward', type=str, default='waittime', help='We only support wait time reward for transferring now.')
args = parser.parse_args()
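To see how the example command maps onto these arguments, the parser can be run on an explicit argument list instead of the real command line. This is only a minimal sketch: `build_parser` is a hypothetical helper (not a function from the repository), and it rebuilds just the four flags used in the command above, with the same types and defaults as listed.

```python
import argparse


def build_parser():
    # Hypothetical helper: mirrors only the flags used in the example command.
    parser = argparse.ArgumentParser(description='Arguments')
    parser.add_argument('--speed', type=float, default=13.89, help='Speed limit')
    parser.add_argument('--model_num', type=int, default=1, help='Model number')
    parser.add_argument('--source_path_name', type=str,
                        default="intersection_flow1000_lane4.0_length750.0_speed13.89_left0.25/",
                        help='pathname')
    parser.add_argument('--num_episodes', type=int, default=50, help='Number of episodes')
    return parser


# Same arguments as the example command, passed as a list rather than via the shell.
argv = [
    "--speed", "13.0",
    "--model_num", "1",
    "--source_path_name",
    "results/intersection_reward-waittime_flow1000_lane4.0_length750_speed13.89_left0.25/",
    "--num_episodes", "50",
]
args = build_parser().parse_args(argv)

# The target speed is overridden; the source model path still names the 13.89 m/s task.
print(args.speed, args.model_num, args.num_episodes)
print(args.source_path_name)
```

Passing a list to `parse_args` is a convenient way to check flag values in a notebook or test without invoking `transfer_main.py` itself.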
num_transfer_steps
in https://github.com/hyunjimoon/24_transpo/blob/f47120b11d764bf07b0340f22358f55cfe058041/CP3/analysis/utils.py#L95
One policy update per episode (in Q-learning, an update of the Q matrix); rollout = episode.
R is from env (Q is from agent)
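The notes above (rollout = episode, R from the env, Q from the agent) can be illustrated with a small tabular Q-learning sketch. Everything here is an assumption for illustration: `ChainEnv` and `QAgent` are hypothetical toy classes, not code from the repository, and applying the update once at the end of each episode follows the note above rather than anything confirmed in `utils.py`.

```python
import random


class ChainEnv:
    """Hypothetical toy environment: states 0..n-1, reward 1.0 on reaching the last state."""

    def __init__(self, n_states=5, max_steps=20):
        self.n_states = n_states
        self.max_steps = max_steps

    def reset(self):
        self.state = 0
        self.t = 0
        return self.state

    def step(self, action):
        # action 0 moves left, action 1 moves right (clipped to the chain).
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        self.t += 1
        reached = self.state == self.n_states - 1
        reward = 1.0 if reached else 0.0  # R is produced by the environment
        done = reached or self.t >= self.max_steps
        return self.state, reward, done


class QAgent:
    def __init__(self, n_states, n_actions=2, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
        self.q = [[0.0] * n_actions for _ in range(n_states)]  # Q is stored by the agent
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        # Epsilon-greedy action choice with random tie-breaking.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q[state]))
        best = max(self.q[state])
        return self.rng.choice([a for a, v in enumerate(self.q[state]) if v == best])

    def update(self, rollout):
        # Standard Q-learning target, applied to every transition of one rollout.
        # Any episode end (goal or step limit) is treated as terminal in this sketch.
        for s, a, r, s2, done in rollout:
            target = r if done else r + self.gamma * max(self.q[s2])
            self.q[s][a] += self.alpha * (target - self.q[s][a])


env = ChainEnv()
agent = QAgent(env.n_states)
for episode in range(200):
    state, done, rollout = env.reset(), False, []
    while not done:  # one rollout = one episode
        action = agent.act(state)
        next_state, reward, done = env.step(action)
        rollout.append((state, action, reward, next_state, done))
        state = next_state
    agent.update(rollout)  # one update pass per episode

print(agent.q)
```

The split of responsibilities is the point of the sketch: the environment only returns states and rewards, while the Q table lives entirely inside the agent and changes only in `update`.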
How does get_baseline_performance work, and what do "oracle_transfer", "exhaustive_training", and "sequential_oracle_training" mean?