intelligent-environments-lab / CityLearn

Official reinforcement learning environment for demand response and load shaping
MIT License

[BUG] SAC Agent normalization AssertionError #59

Closed lijiayi9712 closed 1 year ago

lijiayi9712 commented 1 year ago

## Issue Description

When I run:

```python
from citylearn.citylearn import CityLearnEnv
from citylearn.agents.sac import SAC as RLAgent

dataset_name = 'baeda_3dem'
env = CityLearnEnv(dataset_name, central_agent=False, simulation_end_time_step=10)
model = RLAgent(env)
model.learn(episodes=2, deterministic_finish=True)
```

I get this output:

```
obs: [-2.44929360e-16  1.00000000e+00  1.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  5.40640817e-01  8.41253533e-01
  2.39775427e-01 -1.93590194e-08  6.18397155e-01  3.73208446e-01
  0.00000000e+00  5.10551796e-01  0.00000000e+00  0.00000000e+00
  4.57856874e-09  4.57856874e-09  4.57856874e-09  4.57856874e-09
  4.95913909e-01  0.00000000e+00  0.00000000e+00  3.75607514e-01]
mean: None
std: None
```

with the traceback:

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/opt/anaconda3/lib/python3.8/site-packages/citylearn/agents/sac.py in get_normalized_observations(self, index, observations)
    230         try:
--> 231             return (np.array(observations, dtype = float) - self.norm_mean[index])/self.norm_std[index]
    232         except:

TypeError: unsupported operand type(s) for -: 'float' and 'NoneType'

During handling of the above exception, another exception occurred:

AssertionError                            Traceback (most recent call last)
<ipython-input> in <module>
      5 env = CityLearnEnv(dataset_name, central_agent=False, simulation_end_time_step=10)
      6 model = RLAgent(env)
----> 7 model.learn(episodes=2, deterministic_finish=True)
      8

~/opt/anaconda3/lib/python3.8/site-packages/citylearn/agents/base.py in learn(self, episodes, keep_env_history, env_history_directory, deterministic, deterministic_finish, logging_level)
    139
    140         while not self.env.done:
--> 141             actions = self.predict(observations, deterministic=deterministic)
    142
    143             # apply actions to citylearn_env

~/opt/anaconda3/lib/python3.8/site-packages/citylearn/agents/sac.py in predict(self, observations, deterministic)
    188
    189         if self.time_step > self.end_exploration_time_step or deterministic:
--> 190             actions = self.get_post_exploration_prediction(observations, deterministic)
    191
    192         else:

~/opt/anaconda3/lib/python3.8/site-packages/citylearn/agents/sac.py in get_post_exploration_prediction(self, observations, deterministic)
    204         for i, o in enumerate(observations):
    205             o = self.get_encoded_observations(i, o)
--> 206             o = self.get_normalized_observations(i, o)
    207             o = torch.FloatTensor(o).unsqueeze(0).to(self.device)
    208             result = self.policy_net[i].sample(o)

~/opt/anaconda3/lib/python3.8/site-packages/citylearn/agents/sac.py in get_normalized_observations(self, index, observations)
    236             print('std:',self.norm_std[index])
    237             print(self.time_step, self.standardize_start_time_step, self.batch_size, len(self.replay_buffer[0]))
--> 238             assert False
    239
    240     def get_encoded_observations(self, index: int, observations: List[float]) -> npt.NDArray[np.float64]:

AssertionError:
```

## Expected Behavior

Please describe what you expected to happen.

## Actual Behavior

Please describe what actually happened.

## Steps to Reproduce

Please provide detailed steps to reproduce the issue.

## Environment

- CityLearn version: 2.0b2
- Operating System: macOS
- Python version: 3.8

## Possible Solution

If you have any ideas for how to fix the issue, please describe them here.

## Additional Notes

Please provide any additional information that may be helpful in resolving this issue.
kingsleynweye commented 1 year ago

@lijiayi9712 The reason you are getting None values for the mean and standard deviation, and the consequent assertion error, is that the condition for eventually calculating those values is never met when your episode has only 10 time steps.

The condition is `self.time_step >= self.standardize_start_time_step and self.batch_size <= len(self.replay_buffer[i])` (see in code).
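For context, this is roughly the shape of the failing path in `citylearn/agents/sac.py`, reconstructed from the traceback above (the exact surrounding code may differ between versions):

```python
import numpy as np

def get_normalized_observations(self, index, observations):
    # norm_mean and norm_std stay None until the agent has both passed
    # standardize_start_time_step and stored at least batch_size samples
    # in the replay buffer. Until then, the subtraction below raises
    # TypeError (float - None), and the except branch ends in
    # `assert False`, which is the AssertionError you see.
    try:
        return (np.array(observations, dtype=float) - self.norm_mean[index]) / self.norm_std[index]
    except:
        print('obs:', observations)
        print('mean:', self.norm_mean[index])
        print('std:', self.norm_std[index])
        print(self.time_step, self.standardize_start_time_step, self.batch_size, len(self.replay_buffer[0]))
        assert False
```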

So, to solve the error, you need to make sure that by the time you exceed end_exploration_time_step, you have collected enough samples in the replay buffer that it is less than or equal to batch_size (or you reduce the batch size to something < 10). You also want to make sure that self.time_step >= self.standardize_start_time_step evaluates to True at that time.
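A minimal sketch of those adjustments, assuming the SAC constructor accepts `batch_size`, `standardize_start_time_step`, and `end_exploration_time_step` as keyword arguments (the attribute names in the traceback suggest it does, but check the signature for your CityLearn version). The numeric values are illustrative only:

```python
from citylearn.citylearn import CityLearnEnv
from citylearn.agents.sac import SAC as RLAgent

dataset_name = 'baeda_3dem'

# Give the episode enough time steps for the replay buffer to fill up
# before exploration ends (1000 is an arbitrary illustrative value).
env = CityLearnEnv(dataset_name, central_agent=False, simulation_end_time_step=1000)

model = RLAgent(
    env,
    batch_size=64,                    # must be <= replay buffer length when normalization starts
    standardize_start_time_step=100,  # must be <= time_step by the time normalization is needed
    end_exploration_time_step=200,    # buffer holds ~200 samples here, comfortably > batch_size
)
model.learn(episodes=2, deterministic_finish=True)
```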

lijiayi9712 commented 1 year ago

Thanks! I modified the steps and it worked!