vita-epfl / CrowdNav

[ICRA19] Crowd-aware Robot Navigation with Attention-based Deep Reinforcement Learning

Is the transition probability from time t to time t + ∆t assumed to be known? #48

chenzhutian closed this issue 2 years ago

chenzhutian commented 2 years ago

Hi, thanks again for your repo. I am wondering: is the transition probability from time t to time t + ∆t assumed to be known? In [1] the transition probability is unknown, which is why the authors propose policy-based learning.

Thanks!

[1] Michael Everett, Yu Fan Chen, and Jonathan P. How. Motion Planning Among Dynamic, Decision-Making Agents with Deep Reinforcement Learning.

ChanganVR commented 2 years ago

The transition probability is unknown to the robot, because the humans' policies are hidden from the robot.

chenzhutian commented 2 years ago

Thanks for your reply!

Yes, it should be unknown. But if I understand correctly, doesn't your implementation assume that it is known? According to this part of the code (https://github.com/vita-epfl/CrowdNav/blob/master/crowd_nav/policy/multi_human_rl.py#L38-L56), the policy picks the action with the maximum one-step return, which is computed from the humans' next states. However, without knowing the transition probability, how can we know the humans' next movements?
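If I read those lines correctly, the action-selection logic is roughly the following. This is my simplified paraphrase, not the actual file: the real code discounts by `gamma ** (time_step * v_pref)` and batches the value-network call, and `build_state_tensor` below is a hypothetical placeholder for that batching step.

```python
# Simplified paraphrase of the referenced action-selection loop;
# self.propagate, self.env.onestep_lookahead, self.action_space, and
# self.model are modeled on the CrowdNav codebase and may differ in detail.
def predict(self, state):
    max_value = float('-inf')
    best_action = None
    for action in self.action_space:
        # The robot's own next state follows from its kinematics.
        next_self_state = self.propagate(state.self_state, action)
        # The step in question: with query_env=True, the simulator is asked
        # for the humans' true next states and the exact one-step reward,
        # i.e. the transition from t to t + dt is treated as known.
        next_human_states, reward, done, info = self.env.onestep_lookahead(action)
        # One-step lookahead: immediate reward plus discounted value estimate.
        # build_state_tensor is a placeholder for the actual batching code.
        next_state_tensor = self.build_state_tensor(next_self_state, next_human_states)
        value = reward + self.gamma * self.model(next_state_tensor).item()
        if value > max_value:
            max_value = value
            best_action = action
    return best_action
```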

ChanganVR commented 2 years ago

@chenzhutian sorry about the confusion, you are actually right. In this repo (and in the CrowdNav paper) we assumed the transition to be known (query_env=True), just to simplify the problem. The CADRL baseline I implemented makes the same assumption, so the comparison is fair. You can simply set query_env=False in the config file to remove this assumption.
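Concretely, the flag switches between querying the simulator and a constant-velocity fallback, roughly like this (a sketch based on the same file; names such as ActionXY, self.propagate, and self.compute_reward come from the codebase, and exact details may differ in your version):

```python
# Sketch of the query_env branch inside the action-selection loop.
if self.query_env:
    # Oracle: ask the simulator for the true next human states and reward.
    next_human_states, reward, done, info = self.env.onestep_lookahead(action)
else:
    # No oracle: assume each human keeps its current velocity for one
    # time step (constant-velocity model) and estimate the reward from
    # those predicted states instead.
    next_human_states = [self.propagate(human, ActionXY(human.vx, human.vy))
                         for human in state.human_states]
    reward = self.compute_reward(next_self_state, next_human_states)
```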

In my follow-up work, Relational Graph Learning for Crowd Navigation, we no longer make that assumption. Instead, we trained a model to predict human trajectories. Please check out that paper if you are interested.
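The idea, in its simplest form, is to replace the environment query with a learned predictor. The sketch below is a generic one-step MLP just to illustrate the interface; it is not the relational graph model from that paper, and the 5-dimensional state assumes each human's observable state is (px, py, vx, vy, radius) as in CrowdNav.

```python
# Generic illustration only, NOT the model from the follow-up paper:
# a minimal learned one-step predictor that could stand in for the
# environment query.
import torch
import torch.nn as nn

class OneStepHumanPredictor(nn.Module):
    def __init__(self, state_dim=5, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, state_dim),
        )

    def forward(self, human_states):
        # human_states: tensor of shape (num_humans, state_dim) at time t;
        # returns predicted observable states at time t + dt.
        return self.net(human_states)

# Usage sketch: predicted states replace the output of onestep_lookahead.
predictor = OneStepHumanPredictor()
humans_t = torch.randn(5, 5)           # 5 humans, 5 features each
humans_t_plus_dt = predictor(humans_t)
```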

chenzhutian commented 2 years ago

Sounds great! Thanks for your clarification. It is super helpful.