I don't understand the reasoning behind the reward settings. Why is the distance reward divided by sqrt(200), and why is the angle reward scaled the way it is (divided by pi)? Can anyone explain? Thank you!
distanceReward = -distance / np.sqrt(200)
if abs(angle - euler[2]) > np.pi:
    angleReward = -abs(-angle - euler[2]) / np.pi
else:
    angleReward = -abs(angle - euler[2]) / np.pi
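One possible reading, which I can't confirm from the code alone: both divisions look like normalization, so that each reward term stays roughly in [-1, 0]. If the workspace were a 10 x 10 square (an assumption on my part), its diagonal would be sqrt(10^2 + 10^2) = sqrt(200), the largest possible distance, and pi is the largest possible heading error once the angle is wrapped. Here is a minimal sketch of that idea. The names ARENA_SIDE, distance_reward, and angle_reward are hypothetical, the modulo wrap is a standard technique I swapped in for the if/else branch above, and I use math instead of np only to keep the sketch dependency-free.

```python
import math

# Assumption: the workspace is a 10 x 10 square, so its diagonal is sqrt(200).
ARENA_SIDE = 10.0
MAX_DISTANCE = math.sqrt(ARENA_SIDE ** 2 + ARENA_SIDE ** 2)  # == sqrt(200)


def distance_reward(distance):
    """Scale the distance penalty to [-1, 0] by the largest possible distance."""
    return -distance / MAX_DISTANCE


def angle_reward(angle, yaw):
    """Scale the heading-error penalty to [-1, 0].

    The difference is first wrapped into [-pi, pi), so the error can never
    exceed pi. This wrap is NOT the branch used in the snippet above.
    """
    diff = (angle - yaw + math.pi) % (2 * math.pi) - math.pi
    return -abs(diff) / math.pi


if __name__ == "__main__":
    print(distance_reward(MAX_DISTANCE))  # farthest corner of the assumed arena
    print(angle_reward(math.pi / 2, 0.0))  # a quarter turn off target
```

If that reading is right, the point of the scaling would be to keep the two terms on comparable ranges so that neither dominates when they are summed.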