lzccccc / SMOKE

SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation
MIT License

TypeError: zip argument #1 must support iteration #87

Open LordonCN opened 1 year ago

LordonCN commented 1 year ago

Hello. When I run `python tools/plain_train_net.py --config-file configs/train_val_bs16_normal_conv.yaml`, training works fine, but when I try to train on multiple GPUs, the following error occurs:

python tools/plain_train_net.py --config-file configs/train_val_bs16_normal_conv.yaml --num-gpus 2 --num-machines 1

` -02 20:15:24,729] smoke.data.datasets.kitti INFO: Initializing KITTI train set with 3712 files loaded
[2023-03-02 20:15:24,775] smoke.trainer INFO: Start training
Traceback (most recent call last):
  File "tools/plain_train_net.py", line 107, in
    args=(args,),
  File "/home/wangguojun//test/SMOKE/smoke/engine/launch.py", line 53, in launch
    daemon=False,
  File "/home/wangguojun/miniconda3/envs/smoke/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/wangguojun/miniconda3/envs/smoke/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
    while not context.join():
  File "/home/wangguojun/miniconda3/envs/smoke/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/wangguojun/miniconda3/envs/smoke/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, args)
  File "/home/wangguojun//test/SMOKE/smoke/engine/launch.py", line 88, in _distributed_worker
    main_func(args)
  File "/home/wangguojun//test/SMOKE/tools/plain_train_net.py", line 95, in main
    train(cfg, model, device, distributed)
  File "/hoe/wangguojun//test/SMOKE/tools/plain_train_net.py", line 57, in train
    tb_log
  File "/home/wangguojun//test/SMOKE/smoke/engine/trainer.py", line 73, in do_train
    for data, iteration in zip(data_loader, range(start_iter, max_iter)):
TypeError: zip argument #1 must support iteration

(smoke) wangguojun@pc:~//test/SMOKE$ Traceback (most recent call last):
  File "", line 1, in
  File "/home/wangguojun/miniconda3/envs/smoke/lib/python3.6/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "/home/wangguojun/miniconda3/envs/smoke/lib/python3.6/multiprocessing/spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated
/home/wangguojun/miniconda3/envs/smoke/lib/python3.6/multiprocessing/semaphore_tracker.py:143: UserWarning: semaphore_tracker: There appear to be 14 leaked semaphores to clean up at shutdown
  len(cache))
`
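For anyone reading the traceback: the last frame fails inside `zip(data_loader, range(start_iter, max_iter))`, which means the object bound to `data_loader` in the worker process does not support iteration (for example, it could be `None`). The thread does not say why that happens in the multi-GPU path. The following is only a minimal sketch of a guard that would surface the problem with a clearer message. The helper name `zip_with_iterations` is hypothetical and not part of SMOKE.

```python
def zip_with_iterations(data_loader, start_iter, max_iter):
    """Pair each batch with its iteration index, failing early if the loader is unusable.

    Hypothetical helper, not part of SMOKE; it wraps the same zip() call
    that appears in the last traceback frame.
    """
    try:
        batches = iter(data_loader)
    except TypeError:
        # Raised when data_loader is None or any other non-iterable object.
        raise TypeError(
            "data_loader of type {} is not iterable; check how it is built "
            "in the distributed case".format(type(data_loader).__name__)
        )
    return zip(batches, range(start_iter, max_iter))
```

Calling it with a list and the range 5 to 7 yields two (batch, iteration) pairs, while passing `None` raises a `TypeError` that names the offending type instead of the generic zip message.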

1gjjuser1 commented 1 month ago

You can try clearing the cache and stopping all Python processes, then restart training. That fixed it.
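One way to act on the "stop all Python processes" part is sketched below, as a rough illustration rather than a confirmed fix. It assumes a Linux machine where `ps -eo pid,args` is available, and it narrows the search to processes whose command line mentions the training script instead of stopping every Python process. The function names are hypothetical. The comment above does not say which cache is meant, so the sketch does not cover that step.

```python
import os
import signal
import subprocess


def matching_pids(ps_output, needle):
    """Return PIDs from `ps -eo pid,args` output whose command line contains needle."""
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0].isdigit() and needle in parts[1]:
            pids.append(int(parts[0]))
    return pids


def stop_leftover_trainers(needle="tools/plain_train_net.py"):
    """Send SIGTERM to every other process whose command line mentions needle.

    Assumption: leftover workers from an earlier run still carry the
    training script path in their command line.
    """
    out = subprocess.run(
        ["ps", "-eo", "pid,args"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    ).stdout
    for pid in matching_pids(out, needle):
        if pid != os.getpid():  # never signal the current process
            os.kill(pid, signal.SIGTERM)
```

Running `stop_leftover_trainers()` from a separate shell before relaunching would terminate only the matched processes. Checking the list returned by `matching_pids` first is the safer habit.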