The data is outdoor point clouds whose labels belong only to the "people" category. The raw data follows the S3DIS data format. I put the data in the SoftGroup/dataset/s3dis/ folder and ran:
cd SoftGroup/dataset/s3dis
bash prepare_data.sh
Then I ran:
./tools/dist_train.sh configs/softgroup/softgroup_s3dis_backbone_fold5.yaml 2 --skip_validate
This produced the following log and error:
2023-05-04 16:35:52,027 - INFO - Training
2023-05-04 16:36:23,085 - INFO - batch is truncated from size 6 to 4
2023-05-04 16:36:24,014 - INFO - batch is truncated from size 6 to 2
2023-05-04 16:36:24,214 - INFO - batch is truncated from size 6 to 2
2023-05-04 16:36:27,134 - INFO - batch is truncated from size 6 to 2
2023-05-04 16:36:28,194 - INFO - batch is truncated from size 6 to 3
2023-05-04 16:36:36,094 - INFO - batch is truncated from size 6 to 3
2023-05-04 16:36:50,126 - INFO - Reducer buckets have been rebuilt in this iteration.
2023-05-04 16:36:50,127 - INFO - Reducer buckets have been rebuilt in this iteration.
2023-05-04 16:36:54,860 - INFO - batch is truncated from size 6 to 4
2023-05-04 16:37:01,948 - INFO - batch is truncated from size 6 to 2
2023-05-04 16:37:03,597 - INFO - batch is truncated from size 6 to 2
2023-05-04 16:37:04,010 - INFO - batch is truncated from size 6 to 4
2023-05-04 16:37:04,346 - INFO - batch is truncated from size 6 to 1
2023-05-04 16:37:13,446 - INFO - batch is truncated from size 6 to 4
2023-05-04 16:37:14,611 - INFO - Epoch [1/5][10/1921] lr: 0.002, eta: 22:00:35, mem: 2664, data_time: 0.00, iter_time: 0.32, semantic_loss: 0.2802, offset_loss: 0.1478, loss: 0.4279
2023-05-04 16:37:28,037 - INFO - batch is truncated from size 6 to 1
2023-05-04 16:37:35,358 - INFO - batch is truncated from size 6 to 3
2023-05-04 16:37:35,713 - INFO - batch is truncated from size 6 to 4
2023-05-04 16:37:38,783 - INFO - batch is truncated from size 6 to 2
2023-05-04 16:37:45,560 - INFO - batch is truncated from size 6 to 3
2023-05-04 16:38:03,211 - INFO - batch is truncated from size 6 to 4
2023-05-04 16:38:11,835 - INFO - batch is truncated from size 6 to 1
2023-05-04 16:38:12,511 - INFO - batch is truncated from size 6 to 2
2023-05-04 16:38:15,425 - INFO - batch is truncated from size 6 to 5
2023-05-04 16:38:16,396 - INFO - batch is truncated from size 6 to 2
2023-05-04 16:38:21,079 - INFO - batch is truncated from size 6 to 5
2023-05-04 16:38:21,660 - INFO - Epoch [1/5][20/1921] lr: 0.002, eta: 19:55:10, mem: 2664, data_time: 0.00, iter_time: 0.27, semantic_loss: 0.0984, offset_loss: 0.1379, loss: 0.2363
2023-05-04 16:38:28,258 - INFO - batch is truncated from size 6 to 5
2023-05-04 16:38:48,085 - INFO - batch is truncated from size 6 to 2
2023-05-04 16:38:50,249 - INFO - batch is truncated from size 6 to 3
2023-05-04 16:38:51,320 - INFO - batch is truncated from size 6 to 4
2023-05-04 16:38:52,730 - INFO - batch is truncated from size 6 to 5
2023-05-04 16:38:53,705 - INFO - batch is truncated from size 6 to 2
2023-05-04 16:38:59,308 - INFO - Epoch [1/5][30/1921] lr: 0.002, eta: 16:36:12, mem: 2726, data_time: 0.00, iter_time: 0.17, semantic_loss: 0.0437, offset_loss: 0.1729, loss: 0.2166
2023-05-04 16:39:03,889 - INFO - batch is truncated from size 6 to 3
2023-05-04 16:39:23,734 - INFO - batch is truncated from size 6 to 3
2023-05-04 16:39:28,241 - INFO - batch is truncated from size 6 to 3
2023-05-04 16:39:32,227 - INFO - batch is truncated from size 6 to 4
2023-05-04 16:39:34,514 - INFO - batch is truncated from size 6 to 1
2023-05-04 16:39:34,959 - INFO - batch is truncated from size 6 to 3
2023-05-04 16:39:36,738 - INFO - batch is truncated from size 6 to 4
2023-05-04 16:40:02,695 - INFO - batch is truncated from size 6 to 2
2023-05-04 16:40:04,603 - INFO - batch is truncated from size 6 to 4
2023-05-04 16:40:08,027 - INFO - batch is truncated from size 6 to 4
2023-05-04 16:40:09,246 - INFO - batch is truncated from size 6 to 4
2023-05-04 16:40:09,419 - INFO - batch is truncated from size 6 to 4
2023-05-04 16:40:10,507 - INFO - Epoch [1/5][40/1921] lr: 0.002, eta: 17:10:08, mem: 2726, data_time: 0.00, iter_time: 0.15, semantic_loss: 0.0257, offset_loss: 0.0000, loss: 0.0257
2023-05-04 16:40:14,869 - INFO - batch is truncated from size 6 to 3
2023-05-04 16:40:35,827 - INFO - batch is truncated from size 6 to 5
2023-05-04 16:40:37,720 - INFO - batch is truncated from size 6 to 3
2023-05-04 16:40:43,784 - INFO - batch is truncated from size 6 to 4
2023-05-04 16:40:44,789 - INFO - batch is truncated from size 6 to 4
2023-05-04 16:40:45,510 - INFO - batch is truncated from size 6 to 3
2023-05-04 16:40:51,487 - INFO - batch is truncated from size 6 to 3
2023-05-04 16:41:07,518 - INFO - batch is truncated from size 6 to 4
2023-05-04 16:41:10,006 - INFO - batch is truncated from size 6 to 1
2023-05-04 16:41:19,536 - INFO - batch is truncated from size 6 to 1
2023-05-04 16:41:24,831 - INFO - batch is truncated from size 6 to 3
2023-05-04 16:41:28,583 - INFO - Epoch [1/5][50/1921] lr: 0.002, eta: 17:51:55, mem: 2726, data_time: 0.00, iter_time: 0.20, semantic_loss: 0.0183, offset_loss: 0.1210, loss: 0.1393
2023-05-04 16:41:29,044 - INFO - batch is truncated from size 6 to 3
2023-05-04 16:41:30,164 - INFO - batch is truncated from size 6 to 3
2023-05-04 16:41:41,683 - INFO - batch is truncated from size 6 to 2
2023-05-04 16:41:50,007 - INFO - batch is truncated from size 6 to 2
2023-05-04 16:41:52,864 - INFO - batch is truncated from size 6 to 3
2023-05-04 16:41:58,937 - INFO - batch is truncated from size 6 to 2
2023-05-04 16:42:05,597 - INFO - batch is truncated from size 6 to 2
2023-05-04 16:42:09,111 - INFO - batch is truncated from size 6 to 3
2023-05-04 16:42:13,219 - INFO - Epoch [1/5][60/1921] lr: 0.002, eta: 16:50:37, mem: 2726, data_time: 0.00, iter_time: 0.13, semantic_loss: 0.0136, offset_loss: 0.0000, loss: 0.0136
2023-05-04 16:42:16,768 - INFO - batch is truncated from size 6 to 2
2023-05-04 16:42:17,744 - INFO - batch is truncated from size 6 to 4
2023-05-04 16:42:29,310 - INFO - batch is truncated from size 6 to 2
2023-05-04 16:42:31,609 - INFO - batch is truncated from size 6 to 3
2023-05-04 16:42:41,886 - INFO - batch is truncated from size 6 to 2
2023-05-04 16:42:44,707 - INFO - batch is truncated from size 6 to 2
2023-05-04 16:42:46,316 - INFO - batch is truncated from size 6 to 4
2023-05-04 16:42:53,790 - INFO - batch is truncated from size 6 to 3
2023-05-04 16:42:59,427 - INFO - batch is truncated from size 6 to 3
2023-05-04 16:43:03,653 - INFO - batch is truncated from size 6 to 1
2023-05-04 16:43:14,362 - INFO - batch is truncated from size 6 to 2
2023-05-04 16:43:14,458 - INFO - batch is truncated from size 6 to 3
2023-05-04 16:43:22,210 - INFO - batch is truncated from size 6 to 2
2023-05-04 16:43:29,514 - INFO - Epoch [1/5][70/1921] lr: 0.002, eta: 17:18:35, mem: 2726, data_time: 0.00, iter_time: 0.15, semantic_loss: 0.0109, offset_loss: 0.0000, loss: 0.0109
2023-05-04 16:43:31,523 - INFO - batch is truncated from size 6 to 2
2023-05-04 16:43:40,498 - INFO - batch is truncated from size 6 to 5
2023-05-04 16:43:41,620 - INFO - batch is truncated from size 6 to 4
2023-05-04 16:43:55,737 - INFO - batch is truncated from size 6 to 3
Traceback (most recent call last):
File "./tools/train.py", line 206, in <module>
main()
File "./tools/train.py", line 199, in main
train(epoch, model, optimizer, scaler, train_loader, cfg, logger, writer)
File "./tools/train.py", line 44, in train
for i, batch in enumerate(train_loader, start=1):
File "/home/odrobot/anaconda3/envs/softgroup/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
data = self._next_data()
File "/home/odrobot/anaconda3/envs/softgroup/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1224, in _next_data
return self._process_data(data)
File "/home/odrobot/anaconda3/envs/softgroup/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
data.reraise()
File "/home/odrobot/anaconda3/envs/softgroup/lib/python3.7/site-packages/torch/_utils.py", line 457, in reraise
raise exception
AssertionError: Caught AssertionError in DataLoader worker process 1.
Original Traceback (most recent call last):
File "/home/odrobot/anaconda3/envs/softgroup/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/home/odrobot/anaconda3/envs/softgroup/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
return self.collate_fn(data)
File "/home/odrobot/workplace/zs_workplace/softgroup_outdoor/SoftGroup/softgroup/data/s3dis.py", line 82, in collate_fn
return super().collate_fn(batch)
File "/home/odrobot/workplace/zs_workplace/softgroup_outdoor/SoftGroup/softgroup/data/custom.py", line 222, in collate_fn
assert batch_id > 0, 'empty batch'
AssertionError: empty batch
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1180 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1179) of binary: /home/odrobot/anaconda3/envs/softgroup/bin/python
Traceback (most recent call last):
File "/home/odrobot/anaconda3/envs/softgroup/bin/torchrun", line 33, in <module>
sys.exit(load_entry_point('torch==1.11.0', 'console_scripts', 'torchrun')())
File "/home/odrobot/anaconda3/envs/softgroup/lib/python3.7/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
return f(*args, **kwargs)
File "/home/odrobot/anaconda3/envs/softgroup/lib/python3.7/site-packages/torch/distributed/run.py", line 724, in main
run(args)
File "/home/odrobot/anaconda3/envs/softgroup/lib/python3.7/site-packages/torch/distributed/run.py", line 718, in run
)(cmd_args)
File "/home/odrobot/anaconda3/envs/softgroup/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/odrobot/anaconda3/envs/softgroup/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 247, in launch_agent
failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
./tools/train.py FAILED
Failures:
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2023-05-04_16:44:04
host : odrobot
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 1179)
error_file:
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
I don't know where the problem is. Any help would be greatly appreciated. Thank you very much!
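For context on where the assertion fires: the "batch is truncated from size 6 to k" messages and the "empty batch" assertion both come from the collate step in softgroup/data/custom.py. Below is a much-simplified sketch of that truncation logic as I understand it — a hypothetical reconstruction for illustration only (the names, the point-budget rule, and the sample layout are my assumptions, not the actual SoftGroup code):

```python
# Hypothetical, simplified sketch of SoftGroup-style batch truncation.
# Samples are accumulated until a total point budget (max_npoint) is
# exhausted; whatever does not fit is dropped from the batch.
def collate_fn(batch, max_npoint=250000):
    kept = []
    total = 0
    for sample in batch:
        n = len(sample["coords"])
        if total + n > max_npoint:
            break  # "batch is truncated from size 6 to k"
        kept.append(sample)
        total += n
    # If even the first sample exceeds max_npoint (easy with large outdoor
    # scenes), nothing is kept and this assertion fires in the worker.
    assert len(kept) > 0, "empty batch"
    return kept
```

If this reading is right, a single outdoor scene that is much larger than an indoor S3DIS room can blow the point budget on its own, leaving a worker with zero usable samples. Raising max_npoint, splitting the outdoor scenes into smaller blocks during preprocessing, or reducing batch_size would be the obvious things to try.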
Before training I also changed configs/softgroup/softgroup_s3dis_backbone_fold5.yaml as follows:

model:
  channels: 32
  num_blocks: 7
  semantic_classes: 2
  instance_classes: 2
  sem2ins_classes: []
  semantic_only: True
  ignore_label: -100
  grouping_cfg:
    score_thr: 0.2
    radius: 0.04
    mean_active: 300
    class_numpoint_mean: [12210, 39796]
    npoint_thr: 0.05  # absolute if class_numpoint == -1, relative if class_numpoint != -1
    ignore_classes: []
  instance_voxel_cfg:
    scale: 20
    spatial_shape: 20
  train_cfg:
    max_proposal_num: 200
    pos_iou_thr: 0.5
  test_cfg:
    x4_split: True
    cls_score_thr: 0.001
    mask_score_thr: -0.5
    min_npoint: 100
    eval_tasks: ['semantic']
  fixed_modules: []

data:
  train:
    type: 's3dis'
    data_root: 'dataset/s3dis/preprocess'
    prefix: ['Area_1', 'Area_2', 'Area_3', 'Area_4']
    suffix: '_inst_nostuff.pth'
    repeat: 20
    training: True
    voxel_cfg:
      scale: 20
      spatial_shape: [128, 512]
      max_npoint: 250000
      min_npoint: 5000
  test:
    type: 's3dis'
    data_root: 'dataset/s3dis/preprocess'
    prefix: 'Area_5'
    suffix: '_inst_nostuff.pth'
    training: False
    voxel_cfg:
      scale: 20
      spatial_shape: [128, 512]
      max_npoint: 250000
      min_npoint: 5000

dataloader:
  train:
    batch_size: 6
    num_workers: 6
  test:
    batch_size: 1
    num_workers: 1

optimizer:
  type: 'Adam'
  lr: 0.002

fp16: False
epochs: 5
step_epoch: 0
save_freq: 2
pretrain: './hais_ckpt_spconv2.pth'
work_dir: ''
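To check whether the data actually fits the voxel_cfg.max_npoint budget in the config, I can scan the preprocessed scenes and count points. A sketch, assuming the preprocessed .pth files store a tuple whose first element is the xyz coordinate array (as in the standard S3DIS pipeline — that layout is an assumption, not verified here):

```python
import glob

import torch

MAX_NPOINT = 250000  # matches voxel_cfg.max_npoint in the config above

def oversized_scenes(pattern, limit=MAX_NPOINT):
    """Return (path, num_points) for each preprocessed scene above `limit`."""
    over = []
    for path in sorted(glob.glob(pattern)):
        xyz = torch.load(path)[0]  # assumed tuple layout: (xyz, rgb, sem, inst)
        if len(xyz) > limit:
            over.append((path, len(xyz)))
    return over

if __name__ == "__main__":
    for path, n in oversized_scenes("dataset/s3dis/preprocess/*_inst_nostuff.pth"):
        print(f"{path}: {n} points exceeds max_npoint={MAX_NPOINT}")
```

If many scenes show up here, the batch truncation in the log (and possibly the empty-batch failure) would be consistent with the outdoor scenes being far larger than indoor S3DIS rooms.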