zfw1226 / gym-unrealcv

Unreal environments for reinforcement learning
Apache License 2.0
367 stars, 70 forks

Errors on Towards Distraction-Robust Active Visual Tracking #26

Closed: VickyCas closed this issue 1 year ago

VickyCas commented 2 years ago

Hi @zfw1226, when I run your code https://github.com/zfw1226/active_tracking_rl/tree/distractor, I get the following errors:

```
python main.py --model simple-pos-act-lstm --tracker none --env UnrealTrackMulti-FlexibleRoomAdv-DiscreteColor-v1 --env-base UnrealTrackMulti-FlexibleRoomAdv-DiscreteColor-v1 --rnn-out 128 --seed 4 --seed-test 2 --train-mode -1 --test-eps 25 --norm-reward --aux reward --lr 0.001 --gpu-id 0
Running docker-free env, pid:1843381
Please wait for a while to launch env......
exec nohup /home/dell/tracker/gym-unrealcv/gym_unrealcv/envs/UnrealEnv/FlexibleRoom/trackingroom2/Binaries/Linux/trackingroom2
nohup: ignoring input and appending output to 'nohup.out'
127.0.0.1
INFO:init:192:Got connection confirm: b'connected to trackingroom2'
build env
share memory
2022-04-24 21:24:25,504 : lr: 0.001
2022-04-24 21:24:25,506 : early_done: False
[Errno 98] Address already in use
Port=9013
Running docker-free env, pid:1843459
Please wait for a while to launch env......
exec nohup /home/dell/tracker/gym-unrealcv/gym_unrealcv/envs/UnrealEnv/FlexibleRoom/trackingroom2/Binaries/Linux/trackingroom2
nohup: ignoring input and appending output to 'nohup.out'
127.0.0.1
INFO:init:192:Got connection confirm: b'connected to trackingroom2'
build env
[Errno 98] Address already in use
Port=9014
Running docker-free env, pid:1843533
Please wait for a while to launch env......
exec nohup /home/dell/tracker/gym-unrealcv/gym_unrealcv/envs/UnrealEnv/FlexibleRoom/trackingroom2/Binaries/Linux/trackingroom2
nohup: ignoring input and appending output to 'nohup.out'
127.0.0.1
INFO:init:192:Got connection confirm: b'connected to trackingroom2'
build env
[Errno 98] Address already in use
Port=9015
Running docker-free env, pid:1843609
Please wait for a while to launch env......
exec nohup /home/dell/tracker/gym-unrealcv/gym_unrealcv/envs/UnrealEnv/FlexibleRoom/trackingroom2/Binaries/Linux/trackingroom2
nohup: ignoring input and appending output to 'nohup.out'
127.0.0.1
INFO:init:192:Got connection confirm: b'connected to trackingroom2'
build env
[Errno 98] Address already in use
Port=9016
Running docker-free env, pid:1843689
Please wait for a while to launch env......
exec nohup /home/dell/tracker/gym-unrealcv/gym_unrealcv/envs/UnrealEnv/FlexibleRoom/trackingroom2/Binaries/Linux/trackingroom2
nohup: ignoring input and appending output to 'nohup.out'
127.0.0.1
INFO:init:192:Got connection confirm: b'connected to trackingroom2'
build env
[Errno 98] Address already in use
Port=9017
Running docker-free env, pid:1843764
Please wait for a while to launch env......
exec nohup /home/dell/tracker/gym-unrealcv/gym_unrealcv/envs/UnrealEnv/FlexibleRoom/trackingroom2/Binaries/Linux/trackingroom2
nohup: ignoring input and appending output to 'nohup.out'
127.0.0.1
INFO:init:192:Got connection confirm: b'connected to trackingroom2'
build env
Process Process-7:
Traceback (most recent call last):
  File "/home/dell/anaconda3/envs/act/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/dell/anaconda3/envs/act/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/dell/Active_tracking_rl/train.py", line 56, in train
    player.model = player.model.to(device)
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/module.py", line 432, in to
    return self._apply(convert)
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/module.py", line 208, in _apply
    module._apply(fn)
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/module.py", line 208, in _apply
    module._apply(fn)
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 124, in _apply
    self.flatten_parameters()
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 120, in flatten_parameters
    self.batch_first, bool(self.bidirectional))
KeyboardInterrupt
Process Process-6:
Traceback (most recent call last):
  File "/home/dell/anaconda3/envs/act/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/dell/anaconda3/envs/act/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/dell/Active_tracking_rl/train.py", line 56, in train
    player.model = player.model.to(device)
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/module.py", line 432, in to
    return self._apply(convert)
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/module.py", line 208, in _apply
    module._apply(fn)
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/module.py", line 208, in _apply
    module._apply(fn)
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 124, in _apply
    self.flatten_parameters()
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 120, in flatten_parameters
    self.batch_first, bool(self.bidirectional))
KeyboardInterrupt
Process Process-5:
Traceback (most recent call last):
  File "/home/dell/anaconda3/envs/act/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/dell/anaconda3/envs/act/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/dell/Active_tracking_rl/train.py", line 56, in train
    player.model = player.model.to(device)
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/module.py", line 432, in to
    return self._apply(convert)
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/module.py", line 208, in _apply
    module._apply(fn)
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/module.py", line 208, in _apply
    module._apply(fn)
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 124, in _apply
    self.flatten_parameters()
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 120, in flatten_parameters
    self.batch_first, bool(self.bidirectional))
KeyboardInterrupt
Process Process-4:
Traceback (most recent call last):
  File "/home/dell/anaconda3/envs/act/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/dell/anaconda3/envs/act/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/dell/Active_tracking_rl/train.py", line 56, in train
    player.model = player.model.to(device)
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/module.py", line 432, in to
    return self._apply(convert)
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/module.py", line 208, in _apply
    module._apply(fn)
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/module.py", line 208, in _apply
    module._apply(fn)
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 124, in _apply
    self.flatten_parameters()
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 120, in flatten_parameters
    self.batch_first, bool(self.bidirectional))
KeyboardInterrupt
2022-04-24 21:58:16,966 : Time 00h 33m 39s, ave eps reward [-1539.65 1534.09], ave eps length 500.0, reward step [-3.08 3.07], FPS 6.55
2022-04-24 22:32:15,107 : Time 01h 07m 38s, ave eps reward [-1464.43 1459.59], ave eps length 500.0, reward step [-2.93 2.92], FPS 6.58
```

At the same time, I find that the agent on the screen doesn't move. Can you give me some advice?
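For reference, the repeated `[Errno 98] Address already in use` lines mean another process is still bound to that port, so the launcher moves on to the next one. Below is a minimal, generic sketch for checking which of those ports are actually occupied (standard library only, not code from this repo; the 9013-9017 range is taken from the log above):

```python
import socket

def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))  # bind() raises OSError (errno 98) if the port is taken
            return True
        except OSError:
            return False

# Ports probed by the environment launcher in the log above
for port in range(9013, 9018):
    print(port, "free" if port_is_free(port) else "in use")
```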

zfw1226 commented 2 years ago
  1. If you find that only the agents in the first launched screen do not move, that is OK.
  2. Can you report your PyTorch version and GPU? I guess the crash may be caused by the GPU running out of memory. You can set --gpu-id to -1 to deploy the models on the CPU.
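A minimal sketch of the usual `--gpu-id` convention (this assumes the common argparse/`torch.device` pattern; the actual handling lives in the repo's `main.py`, so treat it only as an illustration):

```python
import argparse
import torch

parser = argparse.ArgumentParser()
# -1 keeps everything on the CPU; >= 0 selects that CUDA device
parser.add_argument("--gpu-id", type=int, default=-1)
args = parser.parse_args()

device = torch.device("cpu" if args.gpu_id < 0 else f"cuda:{args.gpu_id}")
print(device)
# The model would then be moved with model.to(device), which is the call
# where the tracebacks above were interrupted when running on the GPU.
```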
VickyCas commented 2 years ago
> 1. If you find that only the agents in the first launched screen do not move, that is OK.
> 2. Can you report your PyTorch version and GPU? I guess the crash may be caused by the GPU running out of memory. You can set --gpu-id to -1 to deploy the models on the CPU.

Thank you for your reply! The versions are below:

```
>>> print(torch.__version__)
1.2.0
```

```
$ nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-1b5eb693-4789-060a-f54f-84d6f11d2b44)
```

zfw1226 commented 2 years ago

Please try a newer version of PyTorch (e.g., 1.7 built with CUDA 11), as the RTX 3090 does not support older CUDA versions.
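A quick sanity check after upgrading (a sketch only; it assumes a CUDA-enabled PyTorch build is installed). The RTX 3090 is an Ampere card with compute capability 8.6, which the CUDA 10.x wheels shipped with torch 1.2.0 cannot target:

```python
import torch

print(torch.__version__)                     # expect 1.7.x or newer
print(torch.version.cuda)                    # expect 11.0 or newer for the RTX 3090
print(torch.cuda.is_available())             # should be True
print(torch.cuda.get_device_name(0))         # NVIDIA GeForce RTX 3090
print(torch.cuda.get_device_capability(0))   # (8, 6) -> Ampere / sm_86

# A tiny CUDA op; on an incompatible build this kind of call tends to
# hang or raise instead of returning promptly.
x = torch.randn(2, 2, device="cuda:0")
print(x @ x)
```

If `torch.cuda.is_available()` is False or the last line hangs, the installed wheel likely still lacks sm_86 support.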