Implementation of an Inverse Reinforcement Learning algorithm on a toy car in a 2D world problem (Apprenticeship Learning via Inverse Reinforcement Learning, Abbeel & Ng, 2004)
Hello, I wanted to ask a quick question.
From what I understand from the original paper and your blog, we assign a label of +1 to the expert policy and a label of -1 to all the other policies. In the optimization function of your implementation, I guess that h_list holds these label values. My question is: why does the expert policy get the same label as all the other policies (+1, which later becomes -1 in the h matrix)?
Thanks in advance.
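For context, here is a minimal sketch of how labelled max-margin constraints of the form y_i * (w . mu_i) >= 1 are commonly rewritten for a QP solver that expects inequalities as G w <= h. This is an assumption about the setup, not the repository's actual code: `build_constraints`, `expert_fe`, and `other_fes` are hypothetical names, and the feature values are made up for illustration. Under this rewriting the labels end up inside the rows of G, while every entry of h is -1 regardless of the label.

```python
def build_constraints(expert_fe, other_fes):
    """Rewrite y_i * (w . mu_i) >= 1 as G w <= h.

    Hypothetical helper: expert_fe is the expert's feature expectation
    vector (label +1), other_fes is a list of feature expectation
    vectors for the other policies (label -1).
    """
    rows = [(+1, expert_fe)] + [(-1, mu) for mu in other_fes]
    # Multiplying both sides of y_i * (w . mu_i) >= 1 by -1 gives
    # (-y_i * mu_i) . w <= -1, so each label is absorbed into a row of G.
    G = [[-y * v for v in mu] for y, mu in rows]
    # The right-hand side is -1 for every row, expert or not.
    h = [-1.0] * len(rows)
    return G, h


# Illustrative values only.
expert_fe = [1.0, 2.0]
other_fes = [[3.0, 4.0], [0.5, 0.5]]
G, h = build_constraints(expert_fe, other_fes)
```

In this sketch only the expert row of G is negated, and the other rows keep their sign, so identical entries in h would not by themselves mean that the expert and the other policies share a label.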