mit-acl / cadrl_ros

ROS package for dynamic obstacle avoidance for ground robots trained with deep RL

Socially Aware Motion Planning #9

Closed xiaoxianSun closed 4 years ago

xiaoxianSun commented 4 years ago

Hi! This is amazing work! I want to use the social norm to optimize the reward function. Would it be possible for you to tell me the value of the scalar penalty you used when training the network? I didn't find the specific value in the paper. Thank you!

mfe7 commented 4 years ago

Hi @xiaoxianSun, the IROS '18 paper didn't use the social reward, but in the IROS '17 paper the constant q_n in Eqns. 9-12 can be tuned according to the tradeoff described in the paper. Looking at some old code, I believe we settled on 0.5 * (a term based on how close the two agents are):

import numpy as np

weight = 0.5; GAMMA = 0.97; DT_NORMAL = 0.5   # penalty scaling, discount factor, normalization timestep
d = np.linalg.norm(agent_i_pos - agent_j_pos)  # distance between agents i and j
v = agent_i_pref_speed                         # agent i's preferred speed
getting_close_penalty = GAMMA ** (d/DT_NORMAL) * (1.0 - GAMMA ** (-v/DT_NORMAL))
penalty = weight * getting_close_penalty
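
For illustration, here is how that snippet might be evaluated end to end; the agent positions and preferred speed below are placeholder values I made up, not values from the training setup:

import numpy as np

# Hypothetical example inputs (not from the actual training code)
agent_i_pos = np.array([0.0, 0.0])
agent_j_pos = np.array([0.8, 0.3])   # roughly 0.85 m apart
agent_i_pref_speed = 1.0             # m/s

weight = 0.5; GAMMA = 0.97; DT_NORMAL = 0.5
d = np.linalg.norm(agent_i_pos - agent_j_pos)
v = agent_i_pref_speed
penalty = weight * GAMMA ** (d / DT_NORMAL) * (1.0 - GAMMA ** (-v / DT_NORMAL))
print(penalty)  # negative value; its magnitude grows as d shrinks or v increases
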
xiaoxianSun commented 4 years ago

Thanks for your kind reply. I am going to try it!