vita-epfl / CrowdNav

[ICRA19] Crowd-aware Robot Navigation with Attention-based Deep Reinforcement Learning
MIT License

command problem #32

Closed · hurong971013 closed this issue 3 years ago

hurong971013 commented 3 years ago

Sorry to bother you. I do not understand what these commands mean, in particular the "--" symbol. If I want to test the sarl policy, can I just run the single command `python test.py --policy sarl --model_dir data/output --phase test`? There is another command, `python test.py --policy orca --phase test`, which I think tests the orca policy, but I only want to test the sarl policy. Another question: what does "--model_dir data/output --phase test" mean? Please help me, I will be grateful. Thank you.

ChanganVR commented 3 years ago

Hi, '--' prefixes a command-line argument for the argparse module, which parses it into a variable in the script. Check out https://github.com/vita-epfl/CrowdNav/blob/e9196609a8a646baf444698884026456444d6e8a/crowd_nav/train.py#L17-L28.

"--model_dir data/output --phrase test" means taking "data/output" as the model directory and test the model saved in that directory.

hurong971013 commented 3 years ago

> Hi, '--' prefixes a command-line argument for the argparse module, which parses it into a variable in the script. Check out https://github.com/vita-epfl/CrowdNav/blob/e9196609a8a646baf444698884026456444d6e8a/crowd_nav/train.py#L17-L28.
>
> "--model_dir data/output --phase test" means taking "data/output" as the model directory and testing the model saved in that directory.

Thank you so much for helping me. In the paper, the equation h_i = ψ_h(e_i; W_h) means that e_i is fed to an MLP to obtain h_i. But in the figure of the Interaction Module, e_i and h_i appear in parallel, with no ordering between them. I do not understand how h_i is obtained. Can you help me?

ChanganVR commented 3 years ago

The equations are correct. The figures were made in a way to help people understand more easily.

So what we did was to first compute e_i, and then feed e_i into an MLP to obtain h_i.
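To make the ordering concrete, here is a minimal PyTorch sketch of exactly that step; the layer sizes are illustrative placeholders, not the dimensions configured in the repo:

```python
import torch
import torch.nn as nn

# Illustrative dimensions only; the real sizes come from the CrowdNav config.
joint_state_dim = 13   # robot state + one human's observable state (example value)
e_dim = 50             # dimension of the pairwise embedding e_i (example value)
h_dim = 50             # dimension of h_i (example value)

psi_e = nn.Sequential(nn.Linear(joint_state_dim, e_dim), nn.ReLU())  # produces e_i
psi_h = nn.Sequential(nn.Linear(e_dim, h_dim), nn.ReLU())            # h_i = psi_h(e_i; W_h)

pair_state = torch.randn(5, joint_state_dim)  # 5 humans, one joint state per human
e = psi_e(pair_state)                         # first compute e_i
h = psi_h(e)                                  # then feed e_i into an MLP to obtain h_i
```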

hurong971013 commented 3 years ago

> The equations are correct. The figures were made in a way to help people understand more easily.
>
> So what we did was to first compute e_i, and then feed e_i into an MLP to obtain h_i.

Oh I see, thank you. I have another question. There are mlp1, mlp2, and mlp3 in the code of "sarl.py", but there are more than three MLPs in the paper: s, w_i, and M_i are combined to get e_i through one MLP; e_i is then fed into an MLP to obtain h_i; after e_m is obtained by mean pooling, the attention scores α_i are obtained through another MLP; finally, in the planning module, s and c (the representation of the crowd) are passed through an MLP to get the value.

My understanding of the three MLPs in the code is:

- mlp1: combines s, w_i, and M_i to get e_i
- mlp2: feeds e_i into an MLP to obtain h_i
- mlp3: in the planning module, maps s and c (the representation of the crowd) to the value

Is this understanding of the three MLPs in the code correct? This part of the code is:

```python
def forward(self, state):
    """
    First transform the world coordinates to self-centric coordinates and then do forward computation

    :param state: tensor of shape (batch_size, # of humans, length of a rotated state)
    :return:
    """
    size = state.shape
    self_state = state[:, 0, :self.self_state_dim]
    mlp1_output = self.mlp1(state.view((-1, size[2])))
    mlp2_output = self.mlp2(mlp1_output)

    if self.with_global_state:
        # compute attention scores
        global_state = torch.mean(mlp1_output.view(size[0], size[1], -1), 1, keepdim=True)
        global_state = global_state.expand((size[0], size[1], self.global_state_dim)).\
            contiguous().view(-1, self.global_state_dim)
        attention_input = torch.cat([mlp1_output, global_state], dim=1)
    else:
        attention_input = mlp1_output
    scores = self.attention(attention_input).view(size[0], size[1], 1).squeeze(dim=2)

    # masked softmax
    # weights = softmax(scores, dim=1).unsqueeze(2)
    scores_exp = torch.exp(scores) * (scores != 0).float()
    weights = (scores_exp / torch.sum(scores_exp, dim=1, keepdim=True)).unsqueeze(2)
    self.attention_weights = weights[0, :, 0].data.cpu().numpy()

    # output feature is a linear combination of input features
    features = mlp2_output.view(size[0], size[1], -1)
    # for converting to onnx
    # expanded_weights = torch.cat([torch.zeros(weights.size()).copy_(weights) for _ in range(50)], dim=2)
    weighted_feature = torch.sum(torch.mul(weights, features), dim=1)

    # concatenate agent's state with global weighted humans' state
    joint_state = torch.cat([self_state, weighted_feature], dim=1)
    value = self.mlp3(joint_state)
    return value
```
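For readers tracing tensor shapes through this forward pass, here is a hedged, self-contained sketch that re-implements the same flow with single linear layers standing in for the MLPs; all dimensions are illustrative placeholders, and the masked softmax is simplified to a plain softmax (the real code masks out padded humans whose scores are zero):

```python
import torch
import torch.nn as nn

# Illustrative placeholder sizes only, chosen to make the shape flow visible.
batch, n_humans, state_len = 2, 5, 13
self_state_dim, e_dim, h_dim, global_dim = 6, 50, 50, 50

mlp1 = nn.Linear(state_len, e_dim)            # stands in for self.mlp1 (per-human embedding e_i)
mlp2 = nn.Linear(e_dim, h_dim)                # stands in for self.mlp2 (per-human feature h_i)
attention = nn.Linear(e_dim + global_dim, 1)  # stands in for self.attention (score per human)
mlp3 = nn.Linear(self_state_dim + h_dim, 1)   # stands in for self.mlp3 (value head)

state = torch.randn(batch, n_humans, state_len)
self_state = state[:, 0, :self_state_dim]                  # (batch, self_state_dim)
e = mlp1(state.view(-1, state_len))                        # (batch * n_humans, e_dim)
h = mlp2(e)                                                # (batch * n_humans, h_dim)

global_state = e.view(batch, n_humans, -1).mean(1, keepdim=True)  # e_m: (batch, 1, e_dim)
global_state = global_state.expand(batch, n_humans, global_dim).reshape(-1, global_dim)
scores = attention(torch.cat([e, global_state], dim=1)).view(batch, n_humans)

weights = torch.softmax(scores, dim=1).unsqueeze(2)        # alpha_i: (batch, n_humans, 1)
crowd = (weights * h.view(batch, n_humans, -1)).sum(dim=1) # c: (batch, h_dim)
value = mlp3(torch.cat([self_state, crowd], dim=1))        # (batch, 1)
print(value.shape)  # torch.Size([2, 1])
```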
ChanganVR commented 3 years ago

Yes, your understanding is correct. In addition to the three mlps you mentioned, the attention model is also an mlp.
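To map the two vocabularies onto each other, here is a minimal sketch of how the four MLPs could be declared side by side; the layer sizes are placeholders rather than the values configured in the repo, and the paper symbols in the comments follow the notation used earlier in this thread:

```python
import torch.nn as nn

# Placeholder layer sizes only; the actual dimensions are set by the repo's config.
mlp1 = nn.Sequential(nn.Linear(13, 150), nn.ReLU(), nn.Linear(150, 100), nn.ReLU())  # psi_e: (s, w_i, M_i) -> e_i
mlp2 = nn.Sequential(nn.Linear(100, 100), nn.ReLU(), nn.Linear(100, 50))             # psi_h: e_i -> h_i
attention = nn.Sequential(nn.Linear(100 + 100, 100), nn.ReLU(), nn.Linear(100, 1))   # attention MLP: (e_i, e_m) -> alpha_i
mlp3 = nn.Sequential(nn.Linear(6 + 50, 150), nn.ReLU(), nn.Linear(150, 1))           # value head: (s, c) -> value
```

So the three `mlpN` modules plus `self.attention` account for the four MLPs in the paper: the pairwise embedding, the pairwise feature h_i, the attention scoring, and the value head in the planning module.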
