SAC custom env - GitHub Issues

I get this error:

ValueError                                Traceback (most recent call last)

in
     26 score = 0
     27 while not done:
---> 28     action = agent.choose_action(observation)
     29     observation_, reward, done, info = env.step(action)
     30     score += reward

in choose_action(self, observation)
     23 def choose_action(self, observation):
     24     state = T.Tensor([observation]).to(self.actor.device)
---> 25     actions, _ = self.actor.sample_normal(state, reparameterize=False)
     26
     27     return actions.cpu().detach().numpy()[0]

in sample_normal(self, state, reparameterize)
     38 def sample_normal(self, state, reparameterize=True):
     39     mu, sigma = self.forward(state)
---> 40     probabilities = Normal(mu, sigma)
     41
     42     if reparameterize:

~\Anaconda3\lib\site-packages\torch\distributions\normal.py in __init__(self, loc, scale, validate_args)
     48 else:
     49     batch_shape = self.loc.size()
---> 50 super(Normal, self).__init__(batch_shape, validate_args=validate_args)
     51
     52 def expand(self, batch_shape, _instance=None):

~\Anaconda3\lib\site-packages\torch\distributions\distribution.py in __init__(self, batch_shape, event_shape, validate_args)
     54 if not valid.all():
     55     raise ValueError(
---> 56         f"Expected parameter {param} "
     57         f"({type(value).__name__} of shape {tuple(value.shape)}) "
     58         f"of distribution {repr(self)} "

ValueError: Expected parameter loc (Tensor of shape (1, 1, 1)) of distribution Normal(loc: tensor([[[nan]]], device='cuda:0', grad_fn=), scale: tensor([[[nan]]], device='cuda:0', grad_fn=)) to satisfy the constraint Real(), but found invalid values: tensor([[[nan]]], device='cuda:0', grad_fn=)

Any idea how to fix this?
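The error message shows that both loc and scale are already nan when Normal is constructed, so the nan comes out of the actor's forward pass rather than from the distribution itself. One possible cause (an assumption, the traceback alone does not confirm it) is a non-finite value in the observation returned by the custom environment; another is network weights that have diverged during training. A minimal sketch for ruling out the first case is below. The helper name find_non_finite is hypothetical and not part of the repository; it uses only the standard library and accepts nested lists or anything exposing tolist(), such as a NumPy array.

```python
import math


def find_non_finite(values, path=()):
    """Return the index paths of NaN/inf entries in a nested sequence of numbers.

    Hypothetical debugging helper, not part of the original code.
    """
    # NumPy arrays expose tolist(), which converts them to nested Python lists.
    if hasattr(values, "tolist"):
        values = values.tolist()
    if isinstance(values, (list, tuple)):
        bad = []
        for i, item in enumerate(values):
            bad.extend(find_non_finite(item, path + (i,)))
        return bad
    # Scalar case: report this position if the value is NaN or +/-inf.
    return [] if math.isfinite(values) else [path]


# Example with a made-up observation containing one NaN.
observation = [[0.5, float("nan")]]
print(find_non_finite(observation))  # prints [(0, 1)]
```

Calling this on observation right before agent.choose_action(observation), and on observation_ after env.step(action), would show whether the environment is the source. If it always returns an empty list, the nan more likely originates inside the network, and the learning rate, reward scale, or the clamp applied to sigma may be worth checking.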

philtabor / Youtube-Code-Repository

SAC custom env #43