Thanks for reporting. I am looking into it.
Do the other losses work (e.g. PPO)?
This is not specific to multi-agent: it seems that SAC does not work with batch sizes that have more than one dimension.
I was able to reproduce it with the following single-agent code:
import torch
from tensordict import TensorDict
from tensordict.nn import InteractionType
from torch import nn
from torchrl.data.tensor_specs import OneHotDiscreteTensorSpec
from torchrl.modules import MLP, ProbabilisticActor, SafeModule, ValueOperator
from torchrl.modules.distributions import OneHotCategorical
from torchrl.objectives import DiscreteSACLoss

device = "cpu"

# Policy network: maps the 128-dim encoder output to 5 action logits.
actor_net = MLP(
    in_features=128,
    num_cells=[256, 256],
    out_features=5,
    activation_class=nn.ReLU,
)
actor_module = SafeModule(
    actor_net,
    in_keys=["encoder_vec"],
    out_keys=["logits"],
)
unbatched_action_spec = OneHotDiscreteTensorSpec(
    n=5, shape=torch.Size([5]), dtype=torch.int64
)
actor = ProbabilisticActor(
    spec=unbatched_action_spec,
    in_keys=["logits"],
    out_keys=["action"],
    module=actor_module,
    distribution_class=OneHotCategorical,
    default_interaction_type=InteractionType.RANDOM,
    return_log_prob=False,
)

# Q-value network: one value per discrete action.
qvalue_net = MLP(
    in_features=128,
    num_cells=[256, 256],
    out_features=5,
    activation_class=nn.ReLU,
)
qvalue = ValueOperator(
    in_keys=["encoder_vec"],
    out_keys=["state_value"],
    module=qvalue_net,
)

model = torch.nn.ModuleList([actor, qvalue]).to(device)
loss_module = DiscreteSACLoss(
    actor_network=model[0],
    qvalue_network=model[1],
    num_actions=5,
    num_qvalue_nets=2,
    target_entropy_weight=0.2,
    loss_function="smooth_l1",
)
loss_module.make_value_estimator(gamma=0.99)

# A dummy batch with two batch dimensions, [256, 1], mimicking a multi-agent
# layout. NB: "next" lacks "encoder_vec" here; see the note further down.
single_agent_td = TensorDict(
    source={
        "action": torch.zeros((256, 1, 5), dtype=torch.float32),
        "encoder_vec": torch.zeros((256, 1, 128), dtype=torch.float32),
        "logits": torch.zeros((256, 1, 5), dtype=torch.float32),
        "next": TensorDict(
            source={
                "reward": torch.zeros((256, 1, 1), dtype=torch.float32),
                "done": torch.zeros((256, 1, 1), dtype=torch.bool),
            },
            batch_size=torch.Size([256, 1]),
        ),
    },
    batch_size=torch.Size([256, 1]),
)
loss_module(single_agent_td)  # fails with a dimension error
Now it is just a matter of digging into the loss module's code and finding the point that does not generalize.
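As a starting point (a hypothetical debugging aid, not part of the original comment), printing the shapes the loss consumes makes it easier to spot where an unsqueeze/squeeze assumption breaks:

# Hypothetical debugging aid: inspect every shape the loss will consume.
for key in ("action", "encoder_vec", "logits", ("next", "reward"), ("next", "done")):
    print(key, single_agent_td.get(key).shape)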
@vmoens I might go for a rewrite of discrete SAC to align it with regular SAC and improve readability and modularity.
Ah yep, I should have clarified here. The issue isn’t with your multi-agent components like MultiAgentMLP or anything like that. I just referred to multi-agent as a motivating example because it is a common case in which there is more than one batch dimension.
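For illustration (a hypothetical sketch, not from the thread), the difference is purely in the number of batch dimensions of the tensordict:

import torch
from tensordict import TensorDict

# One batch dimension: batch_size [256]
one_dim = TensorDict({"reward": torch.zeros(256, 1)}, batch_size=[256])
# Two batch dimensions, e.g. [batch, n_agents]: batch_size [256, 2]
two_dim = TensorDict({"reward": torch.zeros(256, 2, 1)}, batch_size=[256, 2])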
I rewrote discrete SAC in #1461. Try it out; it should be more flexible.
Here is a script showing how you can adapt yours:
import torch
from tensordict import TensorDict
from tensordict.nn import InteractionType
from torch import nn
from torchrl.data.tensor_specs import OneHotDiscreteTensorSpec
from torchrl.modules import MLP, ProbabilisticActor, SafeModule, ValueOperator
from torchrl.modules.distributions import OneHotCategorical
from torchrl.objectives import DiscreteSACLoss

device = "cpu"

actor_net = MLP(
    in_features=128,
    num_cells=[256, 256],
    out_features=5,
    activation_class=nn.ReLU,
)
actor_module = SafeModule(
    actor_net,
    in_keys=["encoder_vec"],
    out_keys=["logits"],
)
unbatched_action_spec = OneHotDiscreteTensorSpec(
    n=5, shape=torch.Size([5]), dtype=torch.int64
)
actor = ProbabilisticActor(
    spec=unbatched_action_spec,
    in_keys=["logits"],
    out_keys=["action"],
    module=actor_module,
    distribution_class=OneHotCategorical,
    default_interaction_type=InteractionType.RANDOM,
    return_log_prob=False,
)

qvalue_net = MLP(
    in_features=128,
    num_cells=[256, 256],
    out_features=5,
    activation_class=nn.ReLU,
)
# The Q-network now writes "action_value" (one value per action)
# instead of "state_value".
qvalue = ValueOperator(
    in_keys=["encoder_vec"],
    out_keys=["action_value"],
    module=qvalue_net,
)

model = torch.nn.ModuleList([actor, qvalue]).to(device)
loss_module = DiscreteSACLoss(
    actor_network=model[0],
    qvalue_network=model[1],
    num_actions=5,
    num_qvalue_nets=2,
    target_entropy_weight=0.2,
    loss_function="smooth_l1",
)
loss_module.make_value_estimator(gamma=0.99)

single_agent_td = TensorDict(
    source={
        "action": torch.zeros((256, 1, 5), dtype=torch.float32),
        "encoder_vec": torch.zeros((256, 1, 128), dtype=torch.float32),
        "logits": torch.zeros((256, 1, 5), dtype=torch.float32),
        "next": TensorDict(
            source={
                "reward": torch.zeros((256, 1, 1), dtype=torch.float32),
                "done": torch.zeros((256, 1, 1), dtype=torch.bool),
                # "encoder_vec" is now included under "next"; the loss needs
                # it to compute the target value.
                "encoder_vec": torch.zeros((256, 1, 128), dtype=torch.float32),
            },
            batch_size=torch.Size([256, 1]),
        ),
    },
    batch_size=torch.Size([256, 1]),
)
loss_module(single_agent_td)
Notably, your version was missing "encoder_vec" under "next", and the previous implementation was silently failing because of it.
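A quick way to catch this kind of missing entry (a hypothetical sketch, assuming the loss exposes its expected input keys via in_keys like other tensordict modules) is to compare what the loss expects with what the batch holds:

# Hypothetical check: what the loss expects vs. what the batch actually holds.
print(loss_module.in_keys)
print(list(single_agent_td.keys(include_nested=True)))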
Describe the bug
I am currently trying to use the DiscreteSACLoss in a multi-agent environment, following this tutorial. When I try to run my example, though, I get some dimension issues. Maybe the loss handles dimensions (unsqueezing/squeezing) in a way that doesn't generalize to multi-agent environments? It is important to note that in my setup I am only using a single agent. However, the game environment I'm using (Unity) supports multiple agents, so the tensordicts/specs are all set up as if it were a multi-agent setup. For the purposes of this bug, though, we don't really need to consider Unity; I provide a Unity-independent example below.
To Reproduce
You can run this simple script using the latest torchrl and tensordict installed from main:
Expected behavior
Ideally, the loss should handle dimensions so that it works in these multi-agent environments as well.
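In the meantime, a possible workaround (a hypothetical sketch, assuming nothing downstream needs the per-agent dimension) is to flatten the extra batch dimension before calling the loss:

# Hypothetical workaround: collapse the [256, 1] batch into [256] so the loss
# only ever sees a single batch dimension.
flat_td = single_agent_td.reshape(-1)
loss_out = loss_module(flat_td)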
System info
Describe the characteristics of your environment: using the latest torchrl and tensordict installed directly from the GitHub source.
Output:
Checklist
cc: @matteobettini