open-mmlab / mmyolo

OpenMMLab YOLO series toolbox and benchmark. Implements RTMDet, RTMDet-Rotated, YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOX, PPYOLOE, etc.
https://mmyolo.readthedocs.io/zh_CN/dev/
GNU General Public License v3.0
2.83k stars 523 forks

Training ppyoloe+ on custom datasets gets stuck on the GPU #758

Open wang002 opened 1 year ago

wang002 commented 1 year ago

Prerequisite

🐞 Describe the bug

On custom datasets, yolov5 and rtmdet can be trained; only ppyoloe+ can't be trained. My config file:

    _base_ = './configs/ppyoloe/ppyoloe_plus_l_fast_8xb8-80e_coco.py'
    data_root = '/data/'

    # Path of train annotation file
    train_ann_file = 'annotations/train.json'
    train_data_prefix = 'images/'  # Prefix of train image path

    # Path of val annotation file
    val_ann_file = 'annotations/val.json'
    val_data_prefix = 'images/'  # Prefix of val image path
    class_name = ("a","b")
    num_classes = len(class_name)  # Number of classes for classification

    # Batch size of a single GPU during training
    train_batch_size_per_gpu = 16

    # Worker to pre-fetch data for each single GPU during training
    train_num_workers = 10

    # persistent_workers must be False if num_workers is 0.
    persistent_workers = True
    metainfo = dict(
        classes=class_name,
        palette=[(220, 20, 60)]
    )
    model = dict(
        bbox_head=dict(head_module=dict(num_classes=num_classes)),
        train_cfg=dict(
            initial_assigner=dict(num_classes=num_classes),
            assigner=dict(num_classes=num_classes))
    )
    train_dataloader = dict(
        batch_size=train_batch_size_per_gpu,
        num_workers=train_num_workers,
        persistent_workers=persistent_workers,
        dataset=dict(
            data_root=data_root,
            ann_file=train_ann_file,
            metainfo=metainfo,
            data_prefix=dict(img=train_data_prefix),
            filter_cfg=dict(filter_empty_gt=True, min_size=32),
        )
    )

    val_dataloader = dict(
        dataset=dict(
            metainfo=metainfo,
            data_root=data_root,
            ann_file=val_ann_file,
            data_prefix=dict(img=val_data_prefix),
            test_mode=True,
        ))

    test_dataloader = val_dataloader

    val_evaluator = dict(ann_file=data_root + 'annotations/test.json')
    test_evaluator = val_evaluator
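Before digging into the hang itself, it can help to rule out simple mismatches between a config like the one above and the annotation file. The following is only a sketch of such a sanity check, not part of mmyolo: the function name `check_dataset_config` and its warning messages are hypothetical, and it assumes the annotation file is in standard COCO JSON layout (`categories`, `images`, `annotations`). It does not diagnose the reported hang.

```python
import json


def load_coco(path):
    """Read a COCO-style annotation file into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def check_dataset_config(coco, class_names, palette):
    """Return a list of warnings; an empty list means no mismatch was found.

    Hypothetical helper for illustration only, not an mmyolo API.
    """
    warnings = []

    # Every class listed in the config should appear in the annotation categories.
    ann_names = {c["name"] for c in coco.get("categories", [])}
    missing = [n for n in class_names if n not in ann_names]
    if missing:
        warnings.append(f"classes not found in annotation categories: {missing}")

    # A palette usually carries one color per class.
    if len(palette) != len(class_names):
        warnings.append(
            f"palette has {len(palette)} colors for {len(class_names)} classes")

    # With filter_empty_gt=True, images without annotations are dropped,
    # so a file with no annotated images would leave nothing to train on.
    image_ids = {img["id"] for img in coco.get("images", [])}
    annotated = {a["image_id"] for a in coco.get("annotations", [])}
    if not (image_ids & annotated):
        warnings.append("no image in the file has any annotation")

    return warnings


if __name__ == "__main__":
    # Tiny inline stand-in for annotations/train.json.
    demo = {
        "categories": [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}],
        "images": [{"id": 1, "file_name": "0001.jpg"}],
        "annotations": [{"id": 1, "image_id": 1, "category_id": 1}],
    }
    for message in check_dataset_config(demo, ("a", "b"), [(220, 20, 60)]):
        print("warning:", message)
```

To run it against a real file, pass `load_coco(data_root + train_ann_file)` as the first argument, using the same values as in the config.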

Environment

Package Version Editable project location

    accelerate 0.18.0
    addict 2.4.0
    aiofiles 23.1.0
    aiohttp 3.8.4
    aiosignal 1.3.1
    albumentations 1.3.0
    altair 4.2.2
    anyio 3.6.2
    asttokens 2.2.1
    async-timeout 4.0.2
    backcall 0.2.0
    certifi 2022.12.7
    charset-normalizer 3.1.0
    click 8.1.3
    cmake 3.26.3
    colorama 0.4.6
    coloredlogs 15.0.1
    comm 0.1.3
    conda-pack 0.6.0
    contourpy 1.0.6
    cycler 0.11.0
    debugpy 1.6.7
    decorator 5.1.1
    diffusers 0.15.1
    executing 1.2.0
    fastapi 0.95.1
    ffmpy 0.3.0
    filelock 3.8.2
    flatbuffers 23.3.3
    fonttools 4.38.0
    frozenlist 1.3.3
    fsspec 2023.4.0
    gdown 4.6.0
    gradio 3.27.0
    gradio_client 0.1.3
    h11 0.14.0
    httpcore 0.17.0
    httpx 0.24.0
    huggingface-hub 0.13.4
    humanfriendly 10.0
    idna 3.4
    imageio 2.28.0
    importlib-metadata 6.4.1
    importlib-resources 5.12.0
    ipykernel 6.22.0
    ipython 8.12.0
    jedi 0.18.2
    Jinja2 3.1.2
    joblib 1.2.0
    jsonschema 4.17.3
    jupyter_client 8.2.0
    jupyter_core 5.3.0
    kiwisolver 1.4.4
    labelImg 1.8.6
    lazy_loader 0.2
    linkify-it-py 2.0.0
    lit 16.0.1
    Markdown 3.4.3
    markdown-it-py 2.2.0
    MarkupSafe 2.1.2
    maskrcnn-benchmark 0.0.0
    matplotlib 3.6.3
    matplotlib-inline 0.1.6
    mdit-py-plugins 0.3.3
    mdurl 0.1.2
    mkl-fft 1.3.1
    mkl-random 1.2.2
    mkl-service 2.4.0
    mmcv 2.0.0
    mmdet 3.0.0
    mmengine 0.7.2
    mmpycocotools 12.0.3
    mmyolo 0.5.0
    model-index 0.1.11
    mpmath 1.3.0
    multidict 6.0.4
    nest-asyncio 1.5.6
    networkx 3.1
    numpy 1.24.3
    nvidia-cublas-cu11 11.10.3.66
    nvidia-cuda-cupti-cu11 11.7.101
    nvidia-cuda-nvrtc-cu11 11.7.99
    nvidia-cuda-runtime-cu11 11.7.99
    nvidia-cudnn-cu11 8.5.0.96
    nvidia-cufft-cu11 10.9.0.58
    nvidia-curand-cu11 10.2.10.91
    nvidia-cusolver-cu11 11.4.0.1
    nvidia-cusparse-cu11 11.7.4.91
    nvidia-nccl-cu11 2.14.3
    nvidia-nvtx-cu11 11.7.91
    nvidia-pyindex 1.0.9
    opencv-python 4.7.0.68
    openmim 0.3.7
    ordered-set 4.1.0
    orjson 3.8.10
    packaging 23.1
    pandas 2.0.1
    parso 0.8.3
    pickleshare 0.7.5
    Pillow 9.4.0
    pip 23.0.1
    pkgutil_resolve_name 1.3.10
    platformdirs 3.2.0
    prettytable 3.7.0
    prompt-toolkit 3.0.38
    psutil 5.9.5
    pure-eval 0.2.2
    pycocotools 2.0.6
    pydantic 1.10.7
    pydub 0.25.1
    Pygments 2.15.1
    pyparsing 3.0.9
    pyrsistent 0.19.3
    PySocks 1.7.1
    python-dateutil 2.8.2
    python-multipart 0.0.6
    pytz 2023.3
    PyWavelets 1.4.1
    PyYAML 6.0
    pyzmq 25.0.2
    qudida 0.0.4
    regex 2023.3.23
    requests 2.29.0
    rich 13.3.5
    safetensors 0.3.0
    scikit-image 0.20.0
    scikit-learn 1.2.2
    scipy 1.9.1
    semantic-version 2.10.0
    setuptools 66.0.0
    shapely 2.0.1
    six 1.16.0
    sniffio 1.3.0
    stack-data 0.6.2
    starlette 0.26.1
    sympy 1.11.1
    tabulate 0.9.0
    termcolor 2.3.0
    terminaltables 3.1.10
    threadpoolctl 3.1.0
    tifffile 2023.4.12
    timm 0.6.12
    tokenizers 0.13.3
    toolz 0.12.0
    torch 1.10.1
    torchaudio 0.10.0+rocm4.1
    torchvision 0.11.2
    tornado 6.3
    tqdm 4.64.1
    traitlets 5.9.0
    transformers 4.28.1
    triton 2.0.0
    typing_extensions 4.4.0
    tzdata 2023.3
    uc-micro-py 1.0.1
    urllib3 1.26.15
    uvicorn 0.21.1
    wcwidth 0.2.6
    websockets 11.0.2
    wheel 0.38.4
    yapf 0.32.0
    yarl 1.8.2
    zipp 3.15.0

Additional information

No response

hhaAndroid commented 1 year ago

@wang002 Can you run https://github.com/open-mmlab/mmyolo/blob/main/configs/ppyoloe/ppyoloe_plus_s_fast_1xb12-40e_cat.py properly? If so, please use that configuration as a reference for your changes.

Yuanyang-Zhu commented 1 year ago

> @wang002 Can you run https://github.com/open-mmlab/mmyolo/blob/main/configs/ppyoloe/ppyoloe_plus_s_fast_1xb12-40e_cat.py properly? If so, please refer to this configuration for changes

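I tested the provided ppyoloe+ config for the cat dataset, and it runs fine on a single GPU. Running the same cat config on multiple GPUs makes RAM usage blow up, and setting num_workers to 0 does not help either. All the other models work fine on multiple GPUs.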
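When trying num_workers set to 0 as described above, note that PyTorch's `DataLoader` rejects `persistent_workers=True` when there are no worker processes, which is what the comment in the original config refers to. The sketch below shows one way to keep the two settings consistent while experimenting. The helper name `dataloader_overrides` is hypothetical and this is not claimed to resolve the multi-GPU RAM growth.

```python
def dataloader_overrides(batch_size, num_workers):
    """Build train_dataloader overrides that stay valid for any worker count.

    Hypothetical helper for illustration; the resulting values can be
    copied into the config by hand.
    """
    return dict(
        batch_size=batch_size,
        num_workers=num_workers,
        # persistent_workers is only valid when worker processes exist.
        persistent_workers=num_workers > 0,
    )


if __name__ == "__main__":
    # Settings for the num_workers = 0 experiment.
    print(dataloader_overrides(batch_size=16, num_workers=0))
    # Settings matching the original report (10 workers per GPU).
    print(dataloader_overrides(batch_size=16, num_workers=10))
```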