Closed joaomatoscf closed 3 years ago
It depends on your model: machin doesn't enforce any restriction on your actor's output, so you can use a mixed action space (discrete + continuous), a multi-action space, a parameterized action space, etc.
If you provide more details I can help with your implementation.
Thank you for your answer. If I want this code to work with, for example, a MultiDiscrete([2, 2]) action space, what do I need to change?
Code (based on machin examples/tutorials):
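import torch as t
import torch.nn as nn
from torch.distributions import Categorical

max_episodes = 50
max_steps = 10000

# env is the Gym environment created earlier (not shown in this snippet)
observe_dim = env.observation_space.shape[0]
action_num = env.action_space.n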
class Actor(nn.Module):
    def __init__(self, state_dim, action_num):
        super(Actor, self).__init__()
        self.fc1 = nn.Linear(state_dim, 16)
        self.fc2 = nn.Linear(16, 16)
        self.fc3 = nn.Linear(16, action_num)

    def forward(self, state, action=None):
        a = t.relu(self.fc1(state))
        a = t.relu(self.fc2(a))
        probs = t.softmax(self.fc3(a), dim=1)
        dist = Categorical(probs=probs)
        act = (action
               if action is not None
               else dist.sample())
        act_entropy = dist.entropy()
        act_log_prob = dist.log_prob(act.flatten())
        return act, act_log_prob, act_entropy
class Critic(nn.Module):
    def __init__(self, state_dim):
        super(Critic, self).__init__()
        self.fc1 = nn.Linear(state_dim, 16)
        self.fc2 = nn.Linear(16, 16)
        self.fc3 = nn.Linear(16, 1)

    def forward(self, state):
        v = t.relu(self.fc1(state))
        v = t.relu(self.fc2(v))
        v = self.fc3(v)
        return v
If your actions are independent and sampled from the same distribution, you can reuse the same parameters for Categorical in Actor, then sample act1, act2, act3, act4 from that distribution, and finally return these four actions as a tensor of shape [2, 2] together with the sum of their log probabilities.
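The shared-distribution case can be sketched as follows. This is a minimal illustration rather than machin's own code: the helper name sample_shared and its num_sub_actions parameter are hypothetical, and the sketch assumes the batch dimension should come first in the returned action tensor.

```python
import torch as t
from torch.distributions import Categorical


def sample_shared(probs, num_sub_actions=2):
    """Draw several independent sub-actions from one shared Categorical.

    probs: tensor of shape [batch, num_choices], each row summing to 1.
    Returns actions of shape [batch, num_sub_actions] and the joint
    log probability of shape [batch].
    """
    dist = Categorical(probs=probs)
    # sample() prepends the sample shape: [num_sub_actions, batch]
    draws = dist.sample((num_sub_actions,))
    # log_prob broadcasts over the leading sample dimension;
    # independence means the joint log probability is the sum
    joint_log_prob = dist.log_prob(draws).sum(dim=0)
    # transpose so that the batch dimension comes first
    return draws.transpose(0, 1), joint_log_prob


probs = t.softmax(t.randn(3, 2), dim=1)   # batch of 3 states, 2 choices each
actions, log_prob = sample_shared(probs)
print(actions.shape, log_prob.shape)      # torch.Size([3, 2]) torch.Size([3])
```

Because the draws are independent, the joint entropy is likewise the per-draw entropy multiplied by the number of sub-actions.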
If your actions are independent and sampled from different distributions, then you need 4 output heads (self.fc3_1, self.fc3_2, self.fc3_3, self.fc3_4), one for each categorical distribution. Then make the same modification as above.
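A rough sketch of the separate-heads variant is shown below. It assumes Gym's reading of MultiDiscrete([2, 2]), namely two sub-actions with two choices each, so it uses two heads rather than four. The class name TwoHeadActor and the arguments n1 and n2 are hypothetical; the head names fc3_1 and fc3_2 follow the naming used above.

```python
import torch as t
import torch.nn as nn
from torch.distributions import Categorical


class TwoHeadActor(nn.Module):
    """Hypothetical actor with one categorical head per sub-action."""

    def __init__(self, state_dim, n1, n2):
        super().__init__()
        self.fc1 = nn.Linear(state_dim, 16)
        self.fc2 = nn.Linear(16, 16)
        self.fc3_1 = nn.Linear(16, n1)  # head for the first sub-action
        self.fc3_2 = nn.Linear(16, n2)  # head for the second sub-action

    def forward(self, state, action=None):
        a = t.relu(self.fc1(state))
        a = t.relu(self.fc2(a))
        dist1 = Categorical(probs=t.softmax(self.fc3_1(a), dim=1))
        dist2 = Categorical(probs=t.softmax(self.fc3_2(a), dim=1))
        if action is None:
            # acting: sample each sub-action, shape [batch]
            act1, act2 = dist1.sample(), dist2.sample()
        else:
            # updating: split the stored [batch, 2] action into columns
            act1, act2 = action[:, 0], action[:, 1]
        act = t.stack((act1, act2), dim=1)  # batch-first, shape [batch, 2]
        # independent heads: joint log probability and entropy are sums
        log_prob = dist1.log_prob(act1) + dist2.log_prob(act2)
        entropy = dist1.entropy() + dist2.entropy()
        return act, log_prob, entropy


actor = TwoHeadActor(state_dim=4, n1=2, n2=2)
state = t.randn(5, 4)
act, log_prob, entropy = actor(state)
print(act.shape, log_prob.shape)  # torch.Size([5, 2]) torch.Size([5])
```

Passing the returned act back in as action should reproduce the same log probabilities, which is what a PPO-style update relies on.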
I will cite my answer on the PyTorch forum as a reference here. (Note: when I say multinomial, what I really mean is treating each trial of the multinomial distribution as a categorical distribution, which is equivalent to what is described above.)
Thank you for your detailed explanation. I implemented it, and now I get an error when storing the episode: ValueError: Key "action" of transition major attribute "action" has invalid batch size 2.
Below is what my modifications look like:
class Actor(nn.Module):
    def __init__(self, state_dim, action_num):
        super(Actor, self).__init__()
        self.fc1 = nn.Linear(state_dim, 16)
        self.fc2 = nn.Linear(16, 16)
        self.fc3 = nn.Linear(16, action_num)

    def forward(self, state, action=None):
        a = t.relu(self.fc1(state))
        a = t.relu(self.fc2(a))
        probs = t.softmax(self.fc3(a), dim=1)
        dist = Categorical(probs=probs)
        act1 = (action
                if action is not None
                else dist.sample())
        act2 = (action
                if action is not None
                else dist.sample())
        act_entropy = dist.entropy()
        act1_log_prob = dist.log_prob(act1.flatten())
        act2_log_prob = dist.log_prob(act2.flatten())
        act = t.tensor([act1, act2])
        return act, act1_log_prob + act2_log_prob, act_entropy
act should be act = t.tensor([[act1, act2]]), because the first dimension is always the batch dimension. Also note that when action is not None, it will be the new combined action of that same form, so each sub-action has to be sliced out of it. Your code should look like:
def forward(self, state, action=None):
    a = t.relu(self.fc1(state))
    a = t.relu(self.fc2(a))
    probs = t.softmax(self.fc3(a), dim=1)
    dist = Categorical(probs=probs)
    act1 = (action[:, 0]
            if action is not None
            else dist.sample())
    act2 = (action[:, 1]
            if action is not None
            else dist.sample())
    act_entropy = dist.entropy()
    act1_log_prob = dist.log_prob(act1.flatten())
    act2_log_prob = dist.log_prob(act2.flatten())
    act = t.tensor([[act1, act2]])
    return act, act1_log_prob + act2_log_prob, act_entropy
I added those changes and it solved the issue of storing the episode.
Now the issue is on ppo.update():
This is because, during the update, act1 and act2 are no longer scalars but 1D arrays. How do you propose to solve this?
Oh, sorry, I forgot about that behavior. Just use torch.cat in that case, keeping the batch dimension first:
act = t.cat((act1.view(-1, 1), act2.view(-1, 1)), dim=1)  # shape [batch, 2]
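As a quick sanity check of the shapes involved, the sketch below builds a batch-first action tensor during sampling and then slices it back into sub-actions, as the update step would. It is a standalone illustration with made-up probabilities and does not call machin; it assumes, as stated earlier in the thread, that the first dimension of the stored action is the batch dimension.

```python
import torch as t
from torch.distributions import Categorical

t.manual_seed(0)
probs = t.softmax(t.randn(4, 2), dim=1)  # batch of 4 states, 2 choices each
dist = Categorical(probs=probs)

# Sampling path: each draw is a 1D tensor of shape [4]
act1, act2 = dist.sample(), dist.sample()
# Turn each draw into a column and join the columns: shape [4, 2]
act = t.cat((act1.view(-1, 1), act2.view(-1, 1)), dim=1)
sampled_log_prob = dist.log_prob(act1) + dist.log_prob(act2)

# Update path: slice the stored batch back into its sub-actions
stored1, stored2 = act[:, 0], act[:, 1]
replayed_log_prob = dist.log_prob(stored1) + dist.log_prob(stored2)

print(act.shape)                                        # torch.Size([4, 2])
print(t.allclose(sampled_log_prob, replayed_log_prob))  # True
```

The same concatenation works for a batch of one during acting and for a larger batch during the update, since view(-1, 1) adapts to however many samples are present.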
Hello,
Does machin support multi-discrete action spaces (two different actions in the same time step)? I've looked through the documentation but cannot find anything related to that.
João