mit-han-lab / spvnas

[ECCV 2020] Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution
http://spvnas.mit.edu/
MIT License

Error during executing evaluate.py #73

Closed Nireil closed 3 years ago

Nireil commented 3 years ago

When I run

torchpack dist-run -np 1 python evaluate.py configs/semantic_kitti/default.yaml --name SemanticKITTI_val_SPVNAS@65GMACs

I get the following error messages:

[2021-08-30 17:23:50.180] /usr/local/anaconda3/envs/torch/bin/python evaluate.py configs/semantic_kitti/default.yaml --name SemanticKITTI_val_SPVNAS@65GMACs
[2021-08-30 17:23:50.181] Experiment started: "runs/run-98ebafa2-a0dc3bdc".
workers_per_gpu: 8
data:
  num_classes: 19
  ignore_label: 255
  training_size: 19132
train:
  seed: 1588147245
  deterministic: False
dataset:
  name: semantic_kitti
  root: /dataset/semantic-kitti
  num_points: 80000
  voxel_size: 0.05
num_epochs: 15
batch_size: 2
criterion:
  name: cross_entropy
  ignore_index: 255
optimizer:
  name: sgd
  lr: 0.24
  weight_decay: 0.0001
  momentum: 0.9
  nesterov: True
scheduler:
  name: cosine_warmup
Traceback (most recent call last):
  File "evaluate.py", line 130, in <module>
    main()
  File "evaluate.py", line 62, in main
    model = spvnas_specialized(args.name)
  File "/home/pjy/spvnas/model_zoo.py", line 51, in spvnas_specialized
    if torch.cuda.is_available() else 'cpu')['model']
  File "/usr/local/anaconda3/envs/torch/lib/python3.6/site-packages/torch/serialization.py", line 587, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/usr/local/anaconda3/envs/torch/lib/python3.6/site-packages/torch/serialization.py", line 242, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: [enforce fail at inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory

Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.


mpirun detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was:

Process name: [[59182,1],0]
Exit code: 1

However, I can run torchpack dist-run -np [num_of_gpus] python train.py configs/semantic_kitti/spvcnn/cr0p5.yaml successfully, and I got a best mIoU of 59.466 on one GTX 1080Ti GPU.
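
For anyone hitting the same "failed finding central directory" message: the traceback shows it is raised inside torch.load while opening the pretrained checkpoint as a zip archive. One common cause (an assumption here, not something confirmed in this report) is a checkpoint file that was only partially downloaded. Below is a minimal sketch of a check using only the Python standard library; the function name checkpoint_status and its return labels are hypothetical, and the path of the downloaded checkpoint has to be supplied by the user.

```python
import zipfile

# Signature of the first local file header in a zip archive.
ZIP_MAGIC = b"PK\x03\x04"


def checkpoint_status(path):
    """Classify a checkpoint file (hypothetical helper, not part of spvnas).

    Returns one of:
      "complete_zip": starts like a zip and has an end-of-central-directory record
      "truncated":    starts like a zip but the central directory cannot be found
      "not_zip":      does not start with the zip signature (for example a
                      legacy non-zip checkpoint or a saved HTML error page)
    """
    with open(path, "rb") as f:
        starts_like_zip = f.read(4) == ZIP_MAGIC
    if not starts_like_zip:
        return "not_zip"
    # is_zipfile looks for the end-of-central-directory record near the
    # end of the file, which is missing when a download was cut short.
    if zipfile.is_zipfile(path):
        return "complete_zip"
    return "truncated"
```

If the result is "truncated", deleting the cached file and downloading it again would be the first thing to try. A "complete_zip" result only means the archive structure is intact, not that the weights match the model.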