2023-09-20 10:02:27,882 - mmdet3d - INFO - workflow: [('train', 1)], max: 6 epochs
2023-09-20 10:02:27,884 - mmdet3d - INFO - Checkpoints will be saved to /home/wangweilin/wwl/if/bev/bevfusion/runs/run-601961a9-b1280aba by HardDiskBackend.
Traceback (most recent call last):
  File "tools/train.py", line 87, in <module>
    main()
  File "tools/train.py", line 76, in main
    train_model(
  File "/home/wangweilin/wwl/if/bev/bevfusion/mmdet3d/apis/train.py", line 126, in train_model
    runner.run(data_loaders, [("train", 1)])
  File "/home/wangweilin/anaconda3/envs/bevfusion_mit/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 108, in run
    self.call_hook('before_run')
  File "/home/wangweilin/anaconda3/envs/bevfusion_mit/lib/python3.8/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/wangweilin/anaconda3/envs/bevfusion_mit/lib/python3.8/site-packages/mmcv/runner/dist_utils.py", line 94, in wrapper
    return func(*args, **kwargs)
  File "/home/wangweilin/anaconda3/envs/bevfusion_mit/lib/python3.8/site-packages/mmcv/runner/hooks/logger/tensorboard.py", line 35, in before_run
    from torch.utils.tensorboard import SummaryWriter
  File "/home/wangweilin/anaconda3/envs/bevfusion_mit/lib/python3.8/site-packages/torch/utils/tensorboard/__init__.py", line 4, in <module>
    LooseVersion = distutils.version.LooseVersion
AttributeError: module 'distutils' has no attribute 'version'
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
I get an error in the output when training.
cmd: torchpack dist-run -np 2 python tools/train.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from pretrained/lidar-only-det.pth
error: after_run: (VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardLoggerHook
mpirun detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was:
Process name: [[9487,1],0] Exit code: 1
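
The traceback ends at `distutils.version.LooseVersion`. In Python, importing a package does not by itself import its submodules, so an attribute such as `distutils.version` only exists once something has imported that submodule. The sketch below is a small diagnostic to run inside the same conda environment. It assumes the failure comes from the `version` submodule never having been loaded; that is not confirmed by the log. It also probes the plain `distutils` package rather than whatever object the installed torch actually imports, and the helper name `check_distutils_version` is made up for illustration.

```python
import importlib
import sys


def check_distutils_version():
    """Report whether distutils.version is reachable as an attribute.

    Returns a dict with three boolean keys:
      available   - distutils could be imported at all
      attr_before - distutils.version existed right after importing the package
      attr_after  - distutils.version exists after importing the submodule explicitly
    """
    result = {"available": False, "attr_before": False, "attr_after": False}
    try:
        distutils = importlib.import_module("distutils")
    except ImportError:
        # distutils is no longer part of the standard library from Python 3.12 on.
        return result
    result["available"] = True
    # False here means nothing has imported the submodule yet in this process.
    result["attr_before"] = hasattr(distutils, "version")
    try:
        # An explicit submodule import binds `version` on the parent package.
        importlib.import_module("distutils.version")
    except ImportError:
        return result
    result["attr_after"] = hasattr(distutils, "version")
    return result


if __name__ == "__main__":
    print(sys.version)
    print(check_distutils_version())
```

If `attr_before` is False and `attr_after` is True, the attribute is only missing until the submodule is imported, which would be consistent with the AttributeError above. What to change in the environment (for example the installed setuptools or torch version) is a separate decision that this check does not settle.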