Closed. yzbx closed this issue 1 year ago.
@yzbx Thank you very much for your feedback; we will look into it. I'll let you know if there is any progress.
@yzbx This happens because the per-GPU intermediate data in `self.results` is synchronized onto the CPU, and your evaluation dataset is relatively large, so CPU memory runs out. A simple workaround is to tighten the threshold parameters in `test_cfg`, for example:
```python
test_cfg=dict(
    multi_label=True,
    nms_pre=30000,
    score_thr=0.001,  # -> 0.1
    nms=dict(type='nms', iou_threshold=0.65),
    max_per_img=300)  # -> 100
```
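To see why these two knobs matter, here is a rough back-of-envelope sketch (pure Python; it assumes 6 float32 values per detection: 4 box coordinates, a score, and a label) of the worst-case size of the results gathered on the CPU:

```python
def worst_case_mb(n_images, max_per_img, floats_per_det=6, bytes_per_float=4):
    """Upper bound on the raw size of gathered detections, in MiB.

    Assumes each detection stores 4 box coordinates, a score, and a label
    as float32; the pickling done during the cross-rank sync adds further
    overhead on top of this.
    """
    return n_images * max_per_img * floats_per_det * bytes_per_float / 2**20

# 80,000 validation images, as reported for Objects365 v2 in this issue:
print(worst_case_mb(80_000, 300))  # ~549 MiB per rank with max_per_img=300
print(worst_case_mb(80_000, 100))  # ~183 MiB per rank with max_per_img=100
```

Raising `score_thr` has a similar effect when many images produce fewer than `max_per_img` detections above the threshold.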
@hhaAndroid thanks.
@yzbx There is also a related reference: https://github.com/ultralytics/yolov3/issues/796
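For context on why the failure surfaces at `broadcast_object_list`: it pickles the entire results list on every rank, so its memory footprint grows linearly with the number of detections kept per image. A small standalone sketch (here `fake_result` is a hypothetical stand-in for one image's detections, not MMYOLO's actual result type):

```python
import pickle

def fake_result(num_dets):
    """Hypothetical per-image result: plain lists of boxes, scores, labels."""
    return {
        "bboxes": [[0.0, 0.0, 1.0, 1.0] for _ in range(num_dets)],
        "scores": [0.5] * num_dets,
        "labels": [0] * num_dets,
    }

# broadcast_object_list serializes the whole list with pickle, so the
# payload scales with (num images) x (detections kept per image).
small = len(pickle.dumps([fake_result(100) for _ in range(50)]))
large = len(pickle.dumps([fake_result(300) for _ in range(50)]))
print(small, large)  # the 300-detection version is roughly 3x larger
```

This is why capping `max_per_img` (and raising `score_thr`) directly shrinks the data that has to survive the cross-rank sync.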
Prerequisite
🐞 Describe the bug
Training with

```shell
bash ./tools/dist_train.sh configs/yolov5/yolov5_s-v61_syncbn_8xb16-100e_object365v2.py 8
```

During evaluation on a large dataset such as Objects365 v2 (validation set = 80,000 images), after evaluation the line

```python
torch_dist.broadcast_object_list(data, src, group)
```

raises a RuntimeError.

Environment
sys.platform: linux
Python: 3.8.12 (default, Oct 12 2021, 13:49:34) [GCC 7.5.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0,1,2,3,4,5,6,7: NVIDIA GeForce GTX 1080 Ti
CUDA_HOME: None
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.11.0
PyTorch compiling details: PyTorch built with:
TorchVision: 0.12.0
OpenCV: 4.7.0
MMEngine: 0.5.0
MMCV: 2.0.0rc3
MMDetection: 3.0.0rc5
MMYOLO: 0.4.0+
Additional information