zfw1226 / gym-unrealcv

Unreal environments for reinforcement learning
Apache License 2.0
367 stars, 70 forks

Errors on Towards Distraction-Robust Active Visual Tracking #26

Closed: VickyCas closed this issue 1 year ago

VickyCas commented 2 years ago

Hi @zfw1226, when I run your code https://github.com/zfw1226/active_tracking_rl/tree/distractor, I get the following errors:

```
python main.py --model simple-pos-act-lstm --tracker none --env UnrealTrackMulti-FlexibleRoomAdv-DiscreteColor-v1 --env-base UnrealTrackMulti-FlexibleRoomAdv-DiscreteColor-v1 --rnn-out 128 --seed 4 --seed-test 2 --train-mode -1 --test-eps 25 --norm-reward --aux reward --lr 0.001 --gpu-id 0
Running docker-free env, pid:1843381
Please wait for a while to launch env......
exec nohup /home/dell/tracker/gym-unrealcv/gym_unrealcv/envs/UnrealEnv/FlexibleRoom/trackingroom2/Binaries/Linux/trackingroom2
nohup: ignoring input and appending output to 'nohup.out'
127.0.0.1
INFO:init:192:Got connection confirm: b'connected to trackingroom2'
build env
share memory
2022-04-24 21:24:25,504 : lr: 0.001
2022-04-24 21:24:25,506 : early_done: False
[Errno 98] Address already in use
Port=9013
Running docker-free env, pid:1843459
Please wait for a while to launch env......
exec nohup /home/dell/tracker/gym-unrealcv/gym_unrealcv/envs/UnrealEnv/FlexibleRoom/trackingroom2/Binaries/Linux/trackingroom2
nohup: ignoring input and appending output to 'nohup.out'
127.0.0.1
INFO:init:192:Got connection confirm: b'connected to trackingroom2'
build env
[Errno 98] Address already in use
Port=9014
Running docker-free env, pid:1843533
Please wait for a while to launch env......
exec nohup /home/dell/tracker/gym-unrealcv/gym_unrealcv/envs/UnrealEnv/FlexibleRoom/trackingroom2/Binaries/Linux/trackingroom2
nohup: ignoring input and appending output to 'nohup.out'
127.0.0.1
INFO:init:192:Got connection confirm: b'connected to trackingroom2'
build env
[Errno 98] Address already in use
Port=9015
Running docker-free env, pid:1843609
Please wait for a while to launch env......
exec nohup /home/dell/tracker/gym-unrealcv/gym_unrealcv/envs/UnrealEnv/FlexibleRoom/trackingroom2/Binaries/Linux/trackingroom2
nohup: ignoring input and appending output to 'nohup.out'
127.0.0.1
INFO:init:192:Got connection confirm: b'connected to trackingroom2'
build env
[Errno 98] Address already in use
Port=9016
Running docker-free env, pid:1843689
Please wait for a while to launch env......
exec nohup /home/dell/tracker/gym-unrealcv/gym_unrealcv/envs/UnrealEnv/FlexibleRoom/trackingroom2/Binaries/Linux/trackingroom2
nohup: ignoring input and appending output to 'nohup.out'
127.0.0.1
INFO:init:192:Got connection confirm: b'connected to trackingroom2'
build env
[Errno 98] Address already in use
Port=9017
Running docker-free env, pid:1843764
Please wait for a while to launch env......
exec nohup /home/dell/tracker/gym-unrealcv/gym_unrealcv/envs/UnrealEnv/FlexibleRoom/trackingroom2/Binaries/Linux/trackingroom2
nohup: ignoring input and appending output to 'nohup.out'
127.0.0.1
INFO:init:192:Got connection confirm: b'connected to trackingroom2'
build env
Process Process-7:
Traceback (most recent call last):
  File "/home/dell/anaconda3/envs/act/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/dell/anaconda3/envs/act/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/dell/Active_tracking_rl/train.py", line 56, in train
    player.model = player.model.to(device)
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/module.py", line 432, in to
    return self._apply(convert)
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/module.py", line 208, in _apply
    module._apply(fn)
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/module.py", line 208, in _apply
    module._apply(fn)
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 124, in _apply
    self.flatten_parameters()
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 120, in flatten_parameters
    self.batch_first, bool(self.bidirectional))
KeyboardInterrupt
Process Process-6:
Traceback (most recent call last):
  File "/home/dell/anaconda3/envs/act/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/dell/anaconda3/envs/act/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/dell/Active_tracking_rl/train.py", line 56, in train
    player.model = player.model.to(device)
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/module.py", line 432, in to
    return self._apply(convert)
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/module.py", line 208, in _apply
    module._apply(fn)
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/module.py", line 208, in _apply
    module._apply(fn)
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 124, in _apply
    self.flatten_parameters()
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 120, in flatten_parameters
    self.batch_first, bool(self.bidirectional))
KeyboardInterrupt
Process Process-5:
Traceback (most recent call last):
  File "/home/dell/anaconda3/envs/act/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/dell/anaconda3/envs/act/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/dell/Active_tracking_rl/train.py", line 56, in train
    player.model = player.model.to(device)
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/module.py", line 432, in to
    return self._apply(convert)
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/module.py", line 208, in _apply
    module._apply(fn)
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/module.py", line 208, in _apply
    module._apply(fn)
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 124, in _apply
    self.flatten_parameters()
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 120, in flatten_parameters
    self.batch_first, bool(self.bidirectional))
KeyboardInterrupt
Process Process-4:
Traceback (most recent call last):
  File "/home/dell/anaconda3/envs/act/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/dell/anaconda3/envs/act/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/dell/Active_tracking_rl/train.py", line 56, in train
    player.model = player.model.to(device)
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/module.py", line 432, in to
    return self._apply(convert)
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/module.py", line 208, in _apply
    module._apply(fn)
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/module.py", line 208, in _apply
    module._apply(fn)
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 124, in _apply
    self.flatten_parameters()
  File "/home/dell/anaconda3/envs/act/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 120, in flatten_parameters
    self.batch_first, bool(self.bidirectional))
KeyboardInterrupt
2022-04-24 21:58:16,966 : Time 00h 33m 39s, ave eps reward [-1539.65 1534.09], ave eps length 500.0, reward step [-3.08 3.07], FPS 6.55
2022-04-24 22:32:15,107 : Time 01h 07m 38s, ave eps reward [-1464.43 1459.59], ave eps length 500.0, reward step [-2.93 2.92], FPS 6.58
```

At the same time, I find that the agent on the screen doesn't move. Can you give me some advice?
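For reference, the repeated `[Errno 98] Address already in use` lines mean another process is still bound to that port, so the launcher moves on to the next one. Below is a minimal, generic sketch for checking which of those ports are actually occupied (standard library only, not code from this repo; the 9013-9017 range is taken from the log above):

```python
import socket

def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))  # bind() raises OSError (errno 98) if the port is taken
            return True
        except OSError:
            return False

# Ports probed by the environment launcher in the log above
for port in range(9013, 9018):
    print(port, "free" if port_is_free(port) else "in use")
```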

zfw1226 commented 2 years ago
  1. If you find that only the agents in the first launched screen do not move, that is OK.
  2. Can you report your PyTorch version and GPU? I guess the crash may be caused by the GPU running out of memory. You can set --gpu-id to -1 to deploy the models on the CPU.
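A minimal sketch of the usual `--gpu-id` convention (this assumes the common argparse/`torch.device` pattern; the actual handling lives in the repo's `main.py`, so treat it only as an illustration):

```python
import argparse
import torch

parser = argparse.ArgumentParser()
# -1 keeps everything on the CPU; >= 0 selects that CUDA device
parser.add_argument("--gpu-id", type=int, default=-1)
args = parser.parse_args()

device = torch.device("cpu" if args.gpu_id < 0 else f"cuda:{args.gpu_id}")
print(device)
# The model would then be moved with model.to(device), which is the call
# where the tracebacks above were interrupted when running on the GPU.
```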
VickyCas commented 2 years ago
> 1. If you find that only the agents in the first launched screen do not move, that is OK.
> 2. Can you report your PyTorch version and GPU? I guess the crash may be caused by the GPU running out of memory. You can set --gpu-id to -1 to deploy the models on the CPU.

Thank you for your reply! The versions are below:

```
>>> print(torch.__version__)
1.2.0
```

```
$ nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-1b5eb693-4789-060a-f54f-84d6f11d2b44)
```

zfw1226 commented 2 years ago

Please try a newer version of PyTorch (e.g., 1.7 built with CUDA 11), as the RTX 3090 does not support older CUDA versions.
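A quick sanity check after upgrading (a sketch only; it assumes a CUDA-enabled PyTorch build is installed). The RTX 3090 is an Ampere card with compute capability 8.6, which the CUDA 10.x wheels shipped with torch 1.2.0 cannot target:

```python
import torch

print(torch.__version__)                     # expect 1.7.x or newer
print(torch.version.cuda)                    # expect 11.0 or newer for the RTX 3090
print(torch.cuda.is_available())             # should be True
print(torch.cuda.get_device_name(0))         # NVIDIA GeForce RTX 3090
print(torch.cuda.get_device_capability(0))   # (8, 6) -> Ampere / sm_86

# A tiny CUDA op; on an incompatible build this kind of call tends to
# hang or raise instead of returning promptly.
x = torch.randn(2, 2, device="cuda:0")
print(x @ x)
```

If `torch.cuda.is_available()` is False or the last line hangs, the installed wheel likely still lacks sm_86 support.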