jangirrishabh / toyCarIRL

Implementation of an Inverse Reinforcement Learning algorithm on a toy car in a 2D world (Apprenticeship Learning via Inverse Reinforcement Learning, Abbeel & Ng, 2004)
MIT License
172 stars, 47 forks

Weights optimization #8

Open: cspatharis opened this issue 3 years ago

cspatharis commented 3 years ago

Hello, I wanted to ask a quick question.

From what I understand of the original paper and your blog, we assign a +1 label to the expert policy and a -1 label to all the other policies. In the optimization function of your implementation, I assume that h_list holds these label values. My question is: why does the expert policy get the same label as all the other policies (+1, which later becomes -1 in the h matrix)?
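One way to see how a uniform h can coexist with +1/-1 labels is to write the SVM-style constraints y_i (w · mu_i) >= 1 in the G w <= h form that QP solvers such as cvxopt's `solvers.qp(P, q, G, h)` expect. Multiplying each constraint by -1 gives (-y_i mu_i) · w <= -1, so the label moves into the sign of the corresponding row of G and every entry of h becomes -1. The sketch below assumes a hard-margin formulation without a bias term; the helper `build_qp_constraints` and the feature-expectation values are hypothetical and were not checked against the repository's code.

```python
import numpy as np

def build_qp_constraints(mu_expert, mu_others):
    """Hypothetical helper: encode y_i * (w . mu_i) >= 1 as G w <= h.

    Label convention from the max-margin view of Abbeel & Ng:
    the expert feature expectation gets y = +1, every other policy y = -1.
    Negating y_i * (w . mu_i) >= 1 gives (-y_i * mu_i) . w <= -1, so the
    label becomes a sign flip on a row of G while h is -1 everywhere.
    """
    mus = np.vstack([mu_expert] + list(mu_others))   # one row per policy
    y = np.array([1.0] + [-1.0] * len(mu_others))    # +1 expert, -1 others
    G = -y[:, None] * mus                            # expert row negated, others unchanged
    h = -np.ones(len(y))                             # identical right-hand side for all rows
    return G, h, y, mus

# Made-up feature expectations, for illustration only.
mu_E = np.array([0.8, 0.1, 0.5])
mu_others = [np.array([0.2, 0.6, 0.1]), np.array([0.3, 0.4, 0.2])]
G, h, y, mus = build_qp_constraints(mu_E, mu_others)

# An arbitrary candidate weight vector; all three constraints hold for it.
w = np.array([5.0, -7.0, 0.0])
print(G @ w <= h)                 # QP form of the constraints
print(y * (mus @ w) >= 1)         # labelled SVM form, same truth values
```

Under these assumptions, a constant h says nothing about the labels by itself: if an implementation negates the expert's row of the feature-expectation matrix before passing it as G, that negation is where the +1 for the expert and -1 for the other policies are encoded. The full problem would then minimize ||w||^2 (P = 2I, q = 0) subject to G w <= h.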

Thanks in advance.