grooviiee / python_uav

A challenge in reinforcement learning.

Why does runner_collect return the shape instead of the action result as its return value? #25

Closed: grooviiee closed this issue 1 year ago

grooviiee commented 1 year ago

In def runner_collect(self, step):, shouldn't we be running the trainer's get_actions to obtain the actual actions?

At the moment, the code appears to hand back the action-space definition itself as the return value:

        elif self.envs.action_space[agent_id].__class__.__name__ == 'Box':
            # TODO: Fix below shape into Discrete or MultiDiscrete
            # [RUNNER] agent_id : 0, action space dType: Box value: Box(False, True, (5, 20), bool)
            action_env = self.envs.action_space[agent_id]
            action_env = flatten(action_env, 1)
        elif self.envs.action_space[agent_id].__class__.__name__ == 'Tuple':
            # TODO: Fix below shape into Discrete or MultiDiscrete
            # [RUNNER] agent_id : 4, action space dType: Tuple value: Tuple(Box(False, True, (2, 10), bool), Box(0.0, 23.0, (2,), float32), Box(0.0, 5.0, (2,), float32), Box(0.0, 3.0, (2,), float32))
            action_env = self.envs.action_space[agent_id]

So that must be why sample() was being used to return a temporary value.
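To make the distinction concrete, here is a minimal sketch with toy stand-in classes (hypothetical, not the gym API) showing the difference between returning the action-space definition and returning a sampled action; the dispatch on `__class__.__name__` mirrors the runner code above:

```python
import random

class Box:
    """Toy stand-in for a gym Box space (not the real gym class)."""
    def __init__(self, low, high, shape):
        self.low, self.high, self.shape = low, high, shape
    def sample(self):
        # Draw one uniform value per dimension of the (1-D) shape.
        return [random.uniform(self.low, self.high) for _ in range(self.shape[0])]

class Discrete:
    """Toy stand-in for a gym Discrete space."""
    def __init__(self, n):
        self.n = n
    def sample(self):
        return random.randrange(self.n)

def collect_action(space):
    # Buggy pattern from the issue would be: `return space` (the definition).
    # Correct pattern: dispatch on the space type and return a sampled action.
    if space.__class__.__name__ == 'Discrete':
        return space.sample()
    elif space.__class__.__name__ == 'Box':
        return space.sample()
    raise TypeError(f'unsupported space: {space.__class__.__name__}')

print(collect_action(Discrete(5)))           # an int in [0, 5)
print(collect_action(Box(0.0, 1.0, (3,))))   # a list of 3 floats
```

Returning `space` itself would hand the caller a definition object with no action values, which is exactly the symptom described above.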

grooviiee commented 1 year ago

Check the get_actions -> self.actor call located in mappoPolicy.py.

class R_Actor -> def forward

    self.act = ACTLayer(
        action_space, self.hidden_size, self._use_orthogonal, self._gain
    )

class ACTLayer -> def forward

        elif self.tuple:
            actions = []
            action_log_probs = []

            for action_out in self.action_outs:
                action_logit = action_out(x)
                action = action_logit.mode() if deterministic else action_logit.sample()
                action_log_prob = action_logit.log_probs(action)
                actions.append(action)
                action_log_probs.append(action_log_prob)

            actions = torch.cat(actions, -1)
            action_log_probs = torch.cat(action_log_probs, -1)

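The tuple branch above can be sketched in numpy (shapes and head counts are assumptions; the real ACTLayer uses torch distributions): sample one sub-action per output head, record its log-probability, then concatenate both lists along the last axis.

```python
import numpy as np

rng = np.random.default_rng(0)

def categorical_head(logits):
    """Sample from a categorical over the last axis; return (action, log_prob)."""
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    # One sampled index per batch row, kept as shape (batch, 1).
    action = np.array([[rng.choice(p.shape[-1], p=p)] for p in probs])
    log_prob = np.log(np.take_along_axis(probs, action, axis=-1))
    return action, log_prob

batch = 4
# Two hypothetical heads with 3 and 5 discrete choices each.
head_logits = [rng.normal(size=(batch, 3)), rng.normal(size=(batch, 5))]

actions, action_log_probs = [], []
for logits in head_logits:
    a, lp = categorical_head(logits)
    actions.append(a)
    action_log_probs.append(lp)

# Concatenate along the last axis, as the torch.cat(..., -1) calls do.
actions = np.concatenate(actions, axis=-1)            # shape (4, 2)
action_log_probs = np.concatenate(action_log_probs, axis=-1)
print(actions.shape, action_log_probs.shape)
```

The key point for this issue: the forward pass produces concrete action values (one column per head), not a shape or space definition.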
grooviiee commented 1 year ago

[Issue Resolved] The problem was that the return value was being set to the action_shape value instead of the actions.
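A hedged before/after sketch of the resolution (function and variable names are assumptions, not the actual repo code): the collect step must forward the sampled actions, not the shape it computed along the way.

```python
def collect_buggy(actions, action_shape):
    # Bug from the issue: the shape was returned instead of the actions.
    return action_shape

def collect_fixed(actions, action_shape):
    # Fix: return the actual action values produced by get_actions.
    return actions

actions = [[1], [0], [3]]                # e.g. one discrete action per agent
print(collect_fixed(actions, (3, 1)))    # [[1], [0], [3]]
```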