quantumiracle / Popular-RL-Algorithms

PyTorch implementation of Soft Actor-Critic (SAC), Twin Delayed DDPG (TD3), Actor-Critic (AC/A2C), Proximal Policy Optimization (PPO), QT-Opt, PointNet, etc.
Apache License 2.0
1.14k stars 129 forks

NameError: name 'last_action' is not defined #42

Closed · Nick-Kou closed this issue 3 years ago

Nick-Kou commented 3 years ago

Hi,

After training the SAC-LSTM model by running python3 sac_v2_lstm.py --train, I then attempted to test it. Upon testing, I received a NameError: name 'last_action' is not defined. More specifically:

gym/envs/registration.py:14: PkgResourcesDeprecationWarning: Parameters to load are deprecated.  Call .resolve and .require separately.
  result = entry_point.load(False)
Soft Q Network (1,2):  QNetworkLSTM(
  (linear1): Linear(in_features=4, out_features=512, bias=True)
  (linear2): Linear(in_features=4, out_features=512, bias=True)
  (lstm1): LSTM(512, 512)
  (linear3): Linear(in_features=1024, out_features=512, bias=True)
  (linear4): Linear(in_features=512, out_features=1, bias=True)
)
Policy Network:  SAC_PolicyNetworkLSTM(
  (linear1): Linear(in_features=3, out_features=512, bias=True)
  (linear2): Linear(in_features=4, out_features=512, bias=True)
  (lstm1): LSTM(512, 512)
  (linear3): Linear(in_features=1024, out_features=512, bias=True)
  (linear4): Linear(in_features=512, out_features=512, bias=True)
  (mean_linear): Linear(in_features=512, out_features=1, bias=True)
  (log_std_linear): Linear(in_features=512, out_features=1, bias=True)
)
Traceback (most recent call last):
  File "sac_v2_lstm.py", line 308, in <module>
    action, hidden_out = sac_trainer.policy_net.get_action(state, last_action, hidden_in, deterministic = DETERMINISTIC)
NameError: name 'last_action' is not defined

For now, I added last_action = env.action_space.sample() on line 302, after the else statement; however, I am unsure whether this is correct.

Lastly, I have a question about the NormalizedActions class used on line 50 of sac_v2_lstm.py. How is it useful if the action space is, for example, [-1, +1] but the normalisation scales to [0, 1]? Is there any reference text for this motivation?

Thanks

quantumiracle commented 3 years ago

Hi,

"last_action is not defined"

Thanks for pointing that out. For testing, adding last_action = env.action_space.sample() should work, just as it is done in the training loop.
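Concretely, the test branch can initialise the previous action and the LSTM hidden state the same way the training loop does. A minimal sketch, assuming hidden_dim, max_steps, and DETERMINISTIC are defined as elsewhere in sac_v2_lstm.py (exact names in the script may differ slightly):

```python
import torch

for eps in range(10):
    state = env.reset()
    episode_reward = 0

    # Before the first step there is no real previous action, so sample one;
    # the LSTM hidden state starts at zeros, mirroring the training loop.
    last_action = env.action_space.sample()
    hidden_out = (torch.zeros([1, 1, hidden_dim], dtype=torch.float),
                  torch.zeros([1, 1, hidden_dim], dtype=torch.float))
    # Move the hidden tensors to the policy network's device (e.g. .cuda())
    # if the model was trained on GPU.

    for step in range(max_steps):
        hidden_in = hidden_out
        action, hidden_out = sac_trainer.policy_net.get_action(
            state, last_action, hidden_in, deterministic=DETERMINISTIC)
        next_state, reward, done, _ = env.step(action)

        episode_reward += reward
        state = next_state
        last_action = action  # carry the executed action forward to the next step
        if done:
            break

    print('Episode:', eps, '| Episode Reward:', episode_reward)
```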

NormalizedActions

I think the normalisation maps the standard policy output from [-1, 1] to [action_space.low, action_space.high], rather than to [0, 1].
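In other words, such a wrapper usually follows the common gym.ActionWrapper pattern below; this is a sketch of that pattern under the stated assumption, not necessarily line-for-line what is in the repo:

```python
import gym
import numpy as np

class NormalizedActions(gym.ActionWrapper):
    """Rescale actions from the policy's normalized range [-1, 1]
    to the environment's own bounds [low, high]."""

    def action(self, action):
        low = self.action_space.low
        high = self.action_space.high
        # Map [-1, 1] -> [low, high]
        action = low + (action + 1.0) * 0.5 * (high - low)
        return np.clip(action, low, high)

    def reverse_action(self, action):
        low = self.action_space.low
        high = self.action_space.high
        # Map [low, high] -> [-1, 1]
        action = 2.0 * (action - low) / (high - low) - 1.0
        return np.clip(action, -1.0, 1.0)
```

With this mapping, if the environment's action space is already [-1, 1], the wrapper is effectively an identity; for a wider space such as Pendulum's [-2, 2], the tanh-squashed policy output gets stretched to the full range rather than squashed to [0, 1].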