jzhzhang / 3DAwareNav

[CVPR 2023] We propose a framework for the challenging 3D-aware ObjectNav task, built on two straightforward sub-policies. The two sub-policies, a corner-guided exploration policy and a category-aware identification policy, run simultaneously, using online-fused 3D points as observations.
MIT License
57 stars · 3 forks

NameError: name 'g_policy' is not defined BrokenPipeError: [Errno 32] Broken pipe #3

Closed · peakfly closed this 1 year ago

peakfly commented 1 year ago

❓ Questions and Help

When I trained on 2x RTX 3090 GPUs, the error occurred after 01d 02h 15m 29s. Is there a problem with my settings? The file sh_train_mp3d.sh is as follows.

export GLOG_minloglevel=2
export MAGNUM_LOG="quiet"

python main.py --auto_gpu_config 0  -n 4 \
    --sem_gpu_id_list "1"  --policy_gpu_id "cuda:2"  --sim_gpu_id "1" \
    --split train  --backbone_2d "rednet"  \
    --task_config "tasks/challenge_objectnav2021.local.rgbd.yaml"  --dataset "mp3d" \
    --num_sem_categories 22 --deactivate_entropymap \
    --print_images 1  -d ./tmp  --exp_name exp_kl_goal  --save_periodic 10000 

The following error occurs:

[swscaler @ 0x73b0d80] Warning: data is not aligned! This can lead to a speed loss
Time: 01d 02h 15m 15s, num timesteps 260720, FPS 2,
        Rewards: Global step mean/med rew: 0.2068/0.1116,  Global eps mean/med/min/max eps rew: 1.868/1.459/-0.000/5.760, ObjectNav succ/spl/dtg: 0.331/0.123/5.455(0),
        Losses: Policy explore Loss value/action/dist: 0.535/-0.005/1.033, Policy identify Loss value/action/dist: 0.545/-0.007/1.406,
IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (1165, 800) to (1168, 800) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).
[swscaler @ 0x6f6cd80] Warning: data is not aligned! This can lead to a speed loss
Time: 01d 02h 15m 29s, num timesteps 260760, FPS 2,
        Rewards: Global step mean/med rew: 0.2068/0.1116,  Global eps mean/med/min/max eps rew: 1.868/1.459/-0.000/5.760, ObjectNav succ/spl/dtg: 0.332/0.124/5.432(0),
        Losses: Policy explore Loss value/action/dist: 0.535/-0.005/1.033, Policy identify Loss value/action/dist: 0.545/-0.007/1.406,
Traceback (most recent call last):
  File "main.py", line 1029, in <module>
    main()
  File "main.py", line 960, in main
    torch.save(g_policy.state_dict(),
NameError: name 'g_policy' is not defined
Exception ignored in: <function VectorEnv.__del__ at 0x7efb8002dd40>
Traceback (most recent call last):
  File "/mnt/A/amax/3DAwareNav/envs/habitat/utils/vector_env.py", line 560, in __del__
    self.close()
  File "/mnt/A/amax/3DAwareNav/envs/habitat/utils/vector_env.py", line 427, in close
    write_fn((CLOSE_COMMAND, None))
  File "/mnt/A/amax/anaconda3/envs/ObjectNav/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/mnt/A/amax/anaconda3/envs/ObjectNav/lib/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/mnt/A/amax/anaconda3/envs/ObjectNav/lib/python3.7/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
peakfly commented 1 year ago

Oh,"g_policy"and"g_policy_3d" are not defined in the main.py. Is this a bug?

        # Save best models
        if (step * num_scenes) % args.save_interval < num_scenes:
            if len(g_episode_rewards) >= 1000 and \
                    (np.mean(g_episode_rewards) >= best_g_reward) \
                    and not args.eval:
                # NameError is raised here: g_policy is never defined in main.py
                torch.save(g_policy.state_dict(),
                           os.path.join(log_dir, "model_best_explore.pth"))

                # ...and g_policy_3d is likewise undefined
                torch.save(g_policy_3d.state_dict(),
                           os.path.join(log_dir, "model_best_identify.pth"))
                best_g_reward = np.mean(g_episode_rewards)
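
The BrokenPipeError at the end of the traceback looks like fallout from this: once the NameError kills the main process, VectorEnv.__del__ can no longer send the CLOSE_COMMAND to the worker processes, whose pipes are already gone. As a stopgap until a fix lands, a minimal guard like the sketch below (my own, not code from the repo; it is a drop-in for the torch.save block quoted above and assumes the same enclosing scope, i.e. torch, os, np, log_dir, and g_episode_rewards) keeps the crash from costing a day of training:

    # Stopgap sketch, not from the repo: wrap the best-model save in a
    # try/except so the undefined names do not abort a long training run.
    try:
        # Save the exploration and identification policies, as intended.
        torch.save(g_policy.state_dict(),
                   os.path.join(log_dir, "model_best_explore.pth"))
        torch.save(g_policy_3d.state_dict(),
                   os.path.join(log_dir, "model_best_identify.pth"))
        best_g_reward = np.mean(g_episode_rewards)
    except NameError as e:
        # g_policy / g_policy_3d were never bound in main(); log and continue.
        print(f"Skipping best-model save: {e}")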
jzhzhang commented 1 year ago

Sorry for the mistake. It looks like a bug that remained from the code finalization. The code has been revised here.