The transition probability is unknown to the robot because the humans' policies are hidden from the robot.
Thanks for your reply!
Yes, it should be unknown. But if I understand correctly, doesn't your implementation assume that it is known? According to this part of the code (https://github.com/vita-epfl/CrowdNav/blob/master/crowd_nav/policy/multi_human_rl.py#L38-L56), the policy picks the action with the maximum reward, which is calculated based on the humans' next movements. However, without knowing the transition probability, how can we know the humans' next movements?
@chenzhutian Sorry about the confusion, you are actually right. In this repo, as in the CrowdNav paper, we assumed that to be true (`query_env=True`) just to simplify the problem. The CADRL baseline I implemented makes the same assumption, so the comparison is fair. You can simply set `query_env=False` in the config file to remove this assumption.
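For context, here is a rough sketch of what that one-step lookahead amounts to. The helper names, the toy reward, and the `env.true_next_human_states` hook below are illustrative only, not the exact functions in `multi_human_rl.py`; if I remember correctly, with `query_env=False` the code falls back to propagating each human at their current velocity.

```python
import numpy as np

# Illustrative one-step lookahead: score each candidate robot action either by
# querying the environment for the humans' true next states (query_env=True)
# or by assuming constant-velocity human motion (query_env=False).

def propagate_constant_velocity(human, dt):
    """Constant-velocity guess for one human: (px, py, vx, vy) after dt seconds."""
    px, py, vx, vy = human
    return np.array([px + vx * dt, py + vy * dt, vx, vy])

def collision_reward(robot_next_pos, next_humans, radius=0.6):
    """Toy reward: -1 on collision, small positive otherwise (not the paper's reward)."""
    for h in next_humans:
        if np.linalg.norm(robot_next_pos - h[:2]) < radius:
            return -1.0
    return 0.1

def one_step_lookahead(robot_pos, humans, actions, dt, query_env, env=None):
    """Return the candidate action with the highest one-step reward."""
    best_action, best_reward = None, -np.inf
    for ax, ay in actions:
        robot_next = robot_pos + np.array([ax, ay]) * dt
        if query_env:
            # Hypothetical hook: the simulator reveals the humans' true next states.
            next_humans = env.true_next_human_states((ax, ay))
        else:
            # Without querying the env, fall back to a constant-velocity guess.
            next_humans = [propagate_constant_velocity(h, dt) for h in humans]
        r = collision_reward(robot_next, next_humans)
        if r > best_reward:
            best_action, best_reward = (ax, ay), r
    return best_action
```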
In my follow-up work, Relational Graph Learning for Crowd Navigation, we no longer make that assumption. Instead, we trained a model to predict human trajectories. Please check out that paper if you are interested.
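To sketch the idea (this is not the actual relational graph model from that paper): the environment query is replaced by a learned predictor that outputs each human's next state, for example a toy one-step MLP like the following.

```python
import torch
import torch.nn as nn

class HumanMotionPredictor(nn.Module):
    """Toy one-step predictor: maps each human's current state (px, py, vx, vy)
    to a predicted next position (px', py'). This stands in for the learned
    trajectory predictor; the real model in the paper is a relational graph network."""
    def __init__(self, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),
        )

    def forward(self, human_states):
        # human_states: (num_humans, 4) tensor of current states
        return self.net(human_states)

# During planning, the predicted next positions replace the environment query:
# next_positions = predictor(torch.as_tensor(humans, dtype=torch.float32))
```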
Sounds great! Thanks for your clarification. It is super helpful.
Hi, thanks again for your repo. I am wondering whether the transition probability from time t to time t + ∆t is assumed to be known. In [1] the transition probability is unknown, and the authors therefore proposed to employ policy-based learning.
Thanks!
[1] Michael Everett, Yu Fan Chen, and Jonathan P. How. Motion Planning Among Dynamic, Decision-Making Agents with Deep Reinforcement Learning.