heleidsn / UAV_Navigation_DRL_AirSim

This is a new repo for training UAV navigation (local path planning) policies using DRL methods.

Error after training for a while with SAC and No_CNN #21

Closed zyian0505 closed 1 year ago

zyian0505 commented 1 year ago

| rollout/            |      |
|    crash_rate       | 1    |
|    crash_rate_20    | 1    |
|    ep_len_mean      | 1    |
|    ep_rew           | -20  |
|    ep_rew_mean      | -20  |
|    success_rate     | 0    |
|    success_rate_20  | 0    |
| time/               |      |
|    episodes         | 996  |
|    fps              | 5    |
|    time_elapsed     | 168  |
|    total_timesteps  | 996  |

{'is_success': False, 'is_crash': True, 'is_not_in_workspace': False, 'step_num': 0}
{'is_success': False, 'is_crash': True, 'is_not_in_workspace': False, 'step_num': 0}
{'is_success': False, 'is_crash': True, 'is_not_in_workspace': False, 'step_num': 0}
{'is_success': False, 'is_crash': True, 'is_not_in_workspace': False, 'step_num': 0}

| rollout/            |      |
|    crash_rate       | 1    |
|    crash_rate_20    | 1    |
|    ep_len_mean      | 1    |
|    ep_rew           | -20  |
|    ep_rew_mean      | -20  |
|    success_rate     | 0    |
|    success_rate_20  | 0    |
| time/               |      |
|    episodes         | 1000 |
|    fps              | 5    |
|    time_elapsed     | 169  |
|    total_timesteps  | 1000 |

Traceback (most recent call last):
  File "C:\Users\Administrator\Desktop\UAV_Navigation_DRL_AirSim\scripts\utils\thread_train.py", line 260, in run
    model.learn(total_timesteps)
  File "C:\Users\Administrator\Desktop\UAV_Navigation_DRL_AirSim\stable-baselines3\stable_baselines3\sac\sac.py", line 299, in learn
    return super(SAC, self).learn(
  File "C:\Users\Administrator\Desktop\UAV_Navigation_DRL_AirSim\stable-baselines3\stable_baselines3\common\off_policy_algorithm.py", line 354, in learn
    rollout = self.collect_rollouts(
  File "C:\Users\Administrator\Desktop\UAV_Navigation_DRL_AirSim\stable-baselines3\stable_baselines3\common\off_policy_algorithm.py", line 588, in collect_rollouts
    actions, buffer_actions = self._sample_action(learning_starts, action_noise, env.num_envs)
  File "C:\Users\Administrator\Desktop\UAV_Navigation_DRL_AirSim\stable-baselines3\stable_baselines3\common\off_policy_algorithm.py", line 415, in _sample_action
    unscaled_action, _ = self.predict(self._last_obs, deterministic=False)
  File "C:\Users\Administrator\Desktop\UAV_Navigation_DRL_AirSim\stable-baselines3\stable_baselines3\common\base_class.py", line 573, in predict
    return self.policy.predict(observation, state, episode_start, deterministic)
  File "C:\Users\Administrator\Desktop\UAV_Navigation_DRL_AirSim\stable-baselines3\stable_baselines3\common\policies.py", line 338, in predict
    actions = self._predict(observation, deterministic=deterministic)
  File "C:\Users\Administrator\Desktop\UAV_Navigation_DRL_AirSim\stable-baselines3\stable_baselines3\sac\policies.py", line 359, in _predict
    return self.actor(observation, deterministic)
  File "C:\Users\Administrator\anaconda3\envs\airsim\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Administrator\Desktop\UAV_Navigation_DRL_AirSim\stable-baselines3\stable_baselines3\sac\policies.py", line 175, in forward
    mean_actions, log_std, kwargs = self.get_action_dist_params(obs)
  File "C:\Users\Administrator\Desktop\UAV_Navigation_DRL_AirSim\stable-baselines3\stable_baselines3\sac\policies.py", line 162, in get_action_dist_params
    features = self.extract_features(obs)
  File "C:\Users\Administrator\Desktop\UAV_Navigation_DRL_AirSim\stable-baselines3\stable_baselines3\common\policies.py", line 129, in extract_features
    return self.features_extractor(preprocessed_obs)
  File "C:\Users\Administrator\anaconda3\envs\airsim\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Administrator\Desktop\UAV_Navigation_DRL_AirSim\scripts\utils\custom_policy_sb3.py", line 65, in forward
    depth_img = observations[:, 0:1, :, :]
IndexError: too many indices for tensor of dimension 3
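The slice in custom_policy_sb3.py assumes an image-shaped batch of observations (N, C, H, W), but with perception = vector the environment hands back a plain state vector, so the batched observation only has three dimensions and the depth-channel slice fails. A minimal sketch reproducing the mismatch (the shapes below are illustrative assumptions, not the repo's exact dimensions):

```python
import torch

# With perception = vector the observation is a flat state vector, so a
# batch of observations is only a 3-D tensor (no image channels):
obs_vector = torch.zeros(1, 1, 8)

# The feature extractor still tries to slice a depth image out of the first
# channel, which assumes an (N, C, H, W) tensor:
try:
    depth_img = obs_vector[:, 0:1, :, :]   # four indices on a 3-D tensor
except IndexError as e:
    print(e)                               # "too many indices for tensor of dimension 3"

# With an image-style observation of shape (N, C, H, W) the same slice works:
obs_image = torch.zeros(1, 2, 80, 100)     # e.g. depth channel + state channel
depth_img = obs_image[:, 0:1, :, :]        # shape (1, 1, 80, 100)
```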

zyian0505 commented 1 year ago

The config file is as follows:

[options]
env_name = NH_center
dynamic_name = Multirotor
navigation_3d = True
using_velocity_state = False
reward_type = reward_final
;depth, lgmd, vector
perception = vector
algo = SAC
total_timesteps = 200000
policy_name = No_CNN
net_arch = [128, 64, 32, 16]
activation_function = tanh
;cnn_feature_num = 25
cnn_feature_num = 5
keyboard_debug = False
generate_q_map = True
q_map_save_steps = 5000
use_wandb = False
;use_wandb = True
;wandb_run_name = Maze-2D-mlp-tanh-M3
;notes = test

[wandb]
name = Maze-3D-No_CNN-tanh-M3
notes = test

[environment]
max_depth_meters = 20
screen_height = 80
screen_width = 100
crash_distance = 2
accept_radius = 2

[multirotor]
dt = 0.1
acc_xy_max = 2.0
v_xy_max = 5
v_xy_min = 0.5
v_z_max = 2.0
yaw_rate_max_deg = 30.0

; configs for DRL algorithms
[DRL]
gamma = 0.99
learning_rate = 1e-3
learning_starts = 1000
buffer_size = 50000
batch_size = 128
train_freq = 100
gradient_steps = 100
action_noise_sigma = 0.1
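For reference, the [DRL] values above correspond roughly to the SAC constructor arguments in stable-baselines3. A minimal sketch under that assumption, using a generic Gym environment and the stock MlpPolicy rather than the repo's AirSim wrapper and custom No_CNN policy:

```python
import numpy as np
import gym
from stable_baselines3 import SAC
from stable_baselines3.common.noise import NormalActionNoise

env = gym.make("Pendulum-v1")              # stand-in for the AirSim navigation env
n_actions = env.action_space.shape[0]

model = SAC(
    "MlpPolicy",                           # the repo plugs in its own No_CNN policy here
    env,
    gamma=0.99,
    learning_rate=1e-3,
    learning_starts=1000,
    buffer_size=50000,
    batch_size=128,
    train_freq=100,
    gradient_steps=100,
    action_noise=NormalActionNoise(
        mean=np.zeros(n_actions),
        sigma=0.1 * np.ones(n_actions),    # action_noise_sigma = 0.1
    ),
    verbose=1,
)
model.learn(total_timesteps=200000)
```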

zyian0505 commented 1 year ago

Found the problem: the get_obs function in airsim_env needs to be modified so that the image information is not ignored.
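In other words, with perception = vector the environment's get_obs apparently returns only the state vector, while the No_CNN feature extractor still expects an image-shaped observation it can slice a depth channel from. A purely illustrative sketch of that idea; get_depth_image, get_state_feature, and max_depth_meters are assumed names, not necessarily the repo's exact code:

```python
import numpy as np

def get_obs(env):
    """Hypothetical sketch (not the repo's exact code): pack the depth image
    and the state vector into one image-shaped (2, H, W) observation so the
    feature extractor can slice a depth channel instead of receiving a bare
    state vector."""
    depth = env.get_depth_image()                 # assumed helper, shape (H, W), in metres
    depth = np.clip(depth, 0, env.max_depth_meters) / env.max_depth_meters * 255

    state = env.get_state_feature()               # assumed helper, 1-D state vector
    state_channel = np.zeros_like(depth)
    state_channel[0, : state.shape[0]] = state    # embed the state into a second channel

    return np.stack([depth, state_channel]).astype(np.uint8)   # shape (2, H, W)
```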

zyian0505 commented 1 year ago

But when I changed perception = vector to perception = depth in the config file, I got a new error: AttributeError: 'NoneType' object has no attribute 'actor'.
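That AttributeError usually means the training script never actually constructed a model for this perception/policy combination, so the variable stayed None by the time .learn() or .actor was touched. A purely hypothetical sketch of a guard that turns the silent None into an explicit configuration error (the branch conditions below are placeholders, not necessarily the repo's exact options):

```python
# Hypothetical sketch: if only some (perception, policy_name) pairs create a
# model, any unhandled pair leaves `model` as None, and a later call such as
# model.actor or model.learn() fails with "'NoneType' object has no attribute ...".
perception = "depth"        # value read from the [options] section
policy_name = "No_CNN"      # value read from the [options] section

model = None
if perception == "vector" and policy_name == "No_CNN":
    model = "build MLP-based SAC model here"      # placeholder for the real SAC model
elif perception == "depth" and policy_name != "No_CNN":
    model = "build CNN-based SAC model here"      # placeholder for the real SAC model

if model is None:
    raise ValueError(
        f"No model was built for perception={perception!r}, "
        f"policy_name={policy_name!r}; check the config file."
    )
```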

vechi0324 commented 11 months ago

Have you solved this problem? With the SAC and No_CNN configuration, I get the dimension error right at the start of training. How can this be fixed?

vechi0324 commented 11 months ago


My config file is as follows:

[options]
project_name = FlappingWing
env_name = SimpleAvoid
dynamic_name = Multirotor
navigation_3d = True
using_velocity_state = False
reward_type = reward_final
;depth, lgmd, vector
perception = vector
algo = SAC
total_timesteps = 200000
policy_name = No_CNN
net_arch = [128, 64, 32, 16]
activation_function = tanh
;cnn_feature_num = 25
cnn_feature_num = 5
keyboard_debug = False
generate_q_map = True
q_map_save_steps = 5000
use_wandb = False
;use_wandb = True
;wandb_run_name = Maze-2D-mlp-tanh-M3
;notes = test

[wandb]
name = Maze-3D-No_CNN-tanh-M3
notes = test

[environment]
max_depth_meters = 20
screen_height = 80
screen_width = 100
crash_distance = 2
accept_radius = 2

[multirotor]
dt = 0.1
acc_xy_max = 2.0
v_xy_max = 5
v_xy_min = 0.5
v_z_max = 2.0
yaw_rate_max_deg = 30.0

; configs for DRL algorithms
[DRL]
gamma = 0.99
learning_rate = 1e-3
learning_starts = 1000
buffer_size = 50000
batch_size = 128
train_freq = 100
gradient_steps = 100
action_noise_sigma = 0.1
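If it helps to rule out the same observation-shape mismatch discussed above (perception = vector feeding a flat vector into an image-slicing extractor), one option is to inspect what the environment actually emits before starting training. A hedged sketch of such a check; check_observation_shape is a hypothetical helper, not part of the repo:

```python
import numpy as np

def check_observation_shape(env):
    """Hypothetical sanity check: inspect the observation the env produces
    before training, to confirm it matches the chosen feature extractor."""
    print("observation_space:", env.observation_space)
    obs = np.asarray(env.observation_space.sample())
    print("sample observation shape:", obs.shape)
    if obs.ndim < 3:
        print("Flat vector observation: a feature extractor that slices an "
              "image channel (e.g. observations[:, 0:1, :, :]) will raise "
              "IndexError: too many indices for tensor of dimension 3.")
    else:
        print("Image-style (C, H, W) observation: channel slicing will work.")
```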