vita-epfl / CrowdNav

[ICRA19] Crowd-aware Robot Navigation with Attention-based Deep Reinforcement Learning
MIT License
560 stars 166 forks

Question about the formula calculating c value #39

Open ghost opened 3 years ago

ghost commented 3 years ago

In the <Crowd-Robot Interaction: Crowd-aware Robot Navigation with Attention-based Deep Reinforcement Learning> paper,

image

What does this formula mean? Softmax gives a vector and h_i is also a vector. What does multiplying softmax and h_i mean?

ChanganVR commented 3 years ago

Hi @guldamkwak3114, we meant to normalize the scores alpha_i and take a weighted sum of all h_i. Sorry for the confusing writing.

ghost commented 3 years ago

What do you mean by normalizing the scores alpha_i?

ChanganVR commented 3 years ago

Softmax is the normalization operation.

ghost commented 3 years ago

This still confuses me. softmax(alpha_i) and h_i are both vectors. How can we multiply them?

ghost commented 3 years ago

Do you mean the dot product of softmax(alpha_i) and h_i?

ChanganVR commented 3 years ago

Here softmax(alpha_i) means applying softmax over the scores of all neighbors and taking the i-th component, which is a single value. So we normalize the scores with softmax across all neighbors and take a weighted sum of the neighbors' interaction features, with each h_i weighted by its normalized score.
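To make the shapes concrete, here is a minimal NumPy sketch of the weighted sum described above. It is not the repository's code: the function names `softmax` and `crowd_feature`, the three neighbors, and the 2-dimensional features are all made up for illustration.

```python
import numpy as np

def softmax(scores):
    """Normalize a 1-D array of scores so the entries are positive and sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def crowd_feature(alpha, h):
    """Weighted sum of per-neighbor features.

    alpha: shape (n,), one scalar attention score per neighbor
    h:     shape (n, d), one d-dimensional interaction feature per neighbor
    """
    weights = softmax(alpha)  # shape (n,): the i-th entry is the weight of neighbor i
    return weights @ h        # shape (d,): sum_i weights[i] * h[i]

# Hypothetical example: 3 neighbors with 2-dimensional features
alpha = np.array([0.5, 2.0, -1.0])
h = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
weights = softmax(alpha)
c = crowd_feature(alpha, h)
```

Each `weights[i]` is a scalar, so multiplying it by the vector `h[i]` is ordinary scalar-vector multiplication, and `c` has the same dimension as a single `h[i]`.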

ghost commented 3 years ago

alpha_i itself is a vector. Then do you mean taking the softmax of (a_1, a_2, ..., a_n) and getting the i-th component?

Thanks

ChanganVR commented 3 years ago

Is alpha_i a vector? alpha_i is the score for each pair, right?

"take softmax of (a_1, a_2, ..., a_n) and get the i-th component?" This is right.
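Written out, the reading confirmed above could be expressed as follows. This is a reconstruction from the discussion, not a quote from the paper; the weight symbol w_i is introduced here only for illustration, and n is the number of neighbors.

```latex
w_i = \frac{\exp(\alpha_i)}{\sum_{j=1}^{n} \exp(\alpha_j)},
\qquad
c = \sum_{i=1}^{n} w_i \, h_i
```

Each w_i is a scalar and the weights sum to 1, so c is a convex combination of the vectors h_i.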

ghost commented 3 years ago

According to this, alpha_i should be a length-100 vector. Am I misreading something? image

ChanganVR commented 3 years ago

The hidden units refer to all the MLP layers up to but not including the last layer. The last layer outputs a single value for each pair/human as the attention score. The corresponding code: https://github.com/vita-epfl/CrowdNav/blob/503173b836d5460e30234df7e14a7c67ee0ebfc7/crowd_nav/policy/sarl.py#L48
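The point about the last layer can be illustrated with a small NumPy sketch. It is not the code at the link above. The hidden sizes `[100, 100]`, the input dimension, the number of humans, and the helpers `make_mlp` and `mlp_forward` are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(input_dim, layer_dims):
    """Create random (weight, bias) pairs for a fully connected network."""
    params = []
    prev = input_dim
    for dim in layer_dims:
        W = rng.normal(scale=0.1, size=(prev, dim))
        b = np.zeros(dim)
        params.append((W, b))
        prev = dim
    return params

def mlp_forward(params, x):
    """Apply the layers in order, with ReLU on every layer except the last."""
    for idx, (W, b) in enumerate(params):
        x = x @ W + b
        if idx < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

n_humans, feature_dim = 5, 32   # hypothetical sizes
hidden_units = [100, 100]       # assumed hidden-layer sizes
# The final layer of size 1 is appended after the hidden layers.
attention_mlp = make_mlp(feature_dim, hidden_units + [1])

pair_features = rng.normal(size=(n_humans, feature_dim))
scores = mlp_forward(attention_mlp, pair_features).squeeze(-1)  # shape (n_humans,)
```

Under these assumptions the 100-dimensional activations exist only inside the network. After the final layer, `scores` holds exactly one scalar per human, and those scalars are what the softmax normalizes.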