blue-q opened 1 week ago
After I pip install ultralytics, when I run the training command, it keeps getting stuck at
Transferred 469/475 items from pretrained weights
DDP: debug command /home/qiuzx/miniconda3/envs/yolov8/bin/python -m torch.distributed.run --nproc_per_node 4 --master_port 48025 /home/qiuzx/.config/Ultralytics/DDP/_temp_71rgtm97139853097381840.py
WARNING:__main__:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
Ultralytics YOLOv8.2.9 🚀 Python-3.8.13 torch-1.12.1+cu113 CUDA:0 (NVIDIA GeForce RTX 4090, 24217MiB)
CUDA:1 (NVIDIA GeForce RTX 4090, 24217MiB)
CUDA:2 (NVIDIA GeForce RTX 4090, 24217MiB)
CUDA:3 (NVIDIA GeForce RTX 4090, 24217MiB)
WARNING ⚠️ Upgrade to torch>=2.0.0 for deterministic training.
Overriding model.yaml nc=80 with nc=2
Transferred 469/475 items from pretrained weights
Freezing layer 'model.22.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
AMP: checks passed ✅
and does not proceed any further. In addition, to make it easier for me to modify the code, I ran pip uninstall ultralytics, and multi-GPU training then keeps reporting this error:
File "/home/qiuzx/.config/Ultralytics/DDP/_temp_18uxnvwv139801431181536.py", line 6, in <module>
from ultralytics.models.yolo.detect.train import DetectionTrainer
ModuleNotFoundError: No module named 'ultralytics'
How can I solve these two problems?
@blue-q hello! It seems like you're encountering two separate issues here.
Training Getting Stuck at AMP Checks: If your training consistently stops after "AMP: checks passed ✅" without proceeding, this could be due to insufficient resources or a configuration oversight. First, ensure that there are no resource limitations or I/O bottlenecks. Also, check whether upgrading to torch>=2.0.0, as the warning suggests, improves the situation, since newer versions of torch have better support and optimizations for multi-GPU setups.
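If you suspect AMP itself is involved, you can also rule it out by disabling mixed precision entirely. A minimal sketch, assuming the standard amp training argument available in recent Ultralytics releases (coco128.yaml is just a placeholder dataset here):

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # pretrained checkpoint

# amp=False disables Automatic Mixed Precision (and its startup check),
# which helps isolate whether the hang is AMP-related
model.train(data="coco128.yaml", epochs=1, imgsz=640, amp=False)
```

If training proceeds with amp=False, the hang is more likely in the AMP/DDP interaction than in your dataset or model.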
Errors After Uninstalling the Ultralytics Library: When you uninstall the Ultralytics package, Python can no longer find the module in your environment, which leads to the ModuleNotFoundError. If you need to make code modifications frequently, consider working in a development environment where you clone the GitHub repository and run your modified code directly from source. This approach avoids the need to uninstall and reinstall the package: clone the repo, then run pip install -e . from within the repository directory to install it in editable mode.
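Once the editable install is in place, you can check that Python resolves the package from your clone rather than from site-packages (the path in the comment is illustrative):

```python
import ultralytics

# For an editable install this should point into the cloned repo,
# e.g. /home/qiuzx/ultralytics/ultralytics/__init__.py
print(ultralytics.__file__)
print(ultralytics.__version__)
```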
For both issues, ensuring that all dependencies are correctly installed and updating to the latest versions where possible often helps. If the problem persists, providing more specific logs or error messages could help in diagnosing the issue further!
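As a quick way to collect environment details worth sharing, recent Ultralytics versions export a checks helper (a hedged sketch; verify it exists in your installed version):

```python
from ultralytics import checks

# Prints the Ultralytics version plus Python, torch, CUDA, and system info
checks()
```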
Hi @glenn-jocher, my CUDA version is now 12.1, and I have reinstalled torch 2.1.2. My environment information is as follows:
Package Version Editable project location
------------------------ -------------------- -------------------------
certifi 2024.2.2
charset-normalizer 3.3.2
cmake 3.29.2
contourpy 1.1.1
cycler 0.12.1
filelock 3.14.0
fonttools 4.51.0
fsspec 2024.3.1
idna 3.7
importlib_resources 6.4.0
Jinja2 3.1.4
kiwisolver 1.4.5
lit 18.1.4
MarkupSafe 2.1.5
matplotlib 3.7.5
mpmath 1.3.0
networkx 3.1
numpy 1.24.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.18.1
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.1.105
opencv-python 4.9.0.80
packaging 24.0
pandas 2.0.3
pillow 10.3.0
pip 23.3.1
psutil 5.9.8
py-cpuinfo 9.0.0
pyparsing 3.1.2
python-dateutil 2.9.0.post0
pytz 2024.1
PyYAML 6.0.1
requests 2.31.0
scipy 1.10.1
seaborn 0.13.2
setuptools 68.2.2
six 1.16.0
sympy 1.12
thop 0.1.1.post2209072238
torch 2.1.2
torchaudio 2.1.2
torchvision 0.16.2
tqdm 4.66.4
triton 2.1.0
typing_extensions 4.11.0
tzdata 2024.1
ultralytics 8.1.44 /home/qiuzx/ultralytics
urllib3 2.2.1
wheel 0.43.0
zipp 3.18.1
My training command is model.train(data='/home/qiuzx/ultralytics/ultralytics/cfg/datasets/20240506_flame_smoke_class2.yaml', epochs=500, imgsz=640, batch=128, device=[0,1,2,3]), and it still gets stuck at
DDP: debug command /home/qiuzx/miniconda3/envs/yolov8/bin/python -m torch.distributed.run --nproc_per_node 4 --master_port 37947 /home/qiuzx/.config/Ultralytics/DDP/_temp_mak_nap2139734965595248.py
WARNING:__main__:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
Ultralytics YOLOv8.1.44 🚀 Python-3.8.13 torch-2.1.2+cu121 CUDA:0 (NVIDIA GeForce RTX 4090, 24217MiB)
CUDA:1 (NVIDIA GeForce RTX 4090, 24217MiB)
CUDA:2 (NVIDIA GeForce RTX 4090, 24217MiB)
CUDA:3 (NVIDIA GeForce RTX 4090, 24217MiB)
Overriding model.yaml nc=80 with nc=2
Transferred 469/475 items from pretrained weights
Freezing layer 'model.22.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
AMP: checks passed ✅
At the same time, I checked the GPU status with nvidia-smi and found that the utilization of all four GPUs was 100%. Could this be caused by torch.backends.cudnn.benchmark? After setting torch.backends.cudnn.enabled=False, I can train with two GPUs, but with four GPUs it still gets stuck at
AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
AMP: checks passed ✅
Hello! It sounds like you might be encountering an issue related to cuDNN benchmarking when using multiple GPUs. Disabling torch.backends.cudnn.benchmark can indeed help in some cases, as it turns off an autotuning step that generally improves performance but can cause hangs in specific situations, especially when the workload varies between batches.
As you noticed, setting torch.backends.cudnn.enabled = False helps when using two GPUs but doesn't solve the issue with four GPUs. It could be beneficial to ensure all GPUs synchronize properly. You might want to try:

```python
import torch

torch.backends.cudnn.benchmark = False  # disable cuDNN autotuning
torch.cuda.synchronize()  # block until all queued CUDA work has finished
```

before your training loop or right after the AMP check, to ensure all devices are in sync.
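For context, here is a minimal sketch of where this might sit in a launcher script, reusing the dataset path from your message above (note: whether flags set in the parent process carry over to the DDP worker processes that Ultralytics spawns is an assumption worth verifying):

```python
import torch
from ultralytics import YOLO

# Disable cuDNN autotuning before any CUDA work begins
torch.backends.cudnn.benchmark = False

model = YOLO("yolov8n.pt")
model.train(
    data="/home/qiuzx/ultralytics/ultralytics/cfg/datasets/20240506_flame_smoke_class2.yaml",
    epochs=500,
    imgsz=640,
    batch=128,
    device=[0, 1, 2, 3],  # multi-GPU: Ultralytics relaunches itself via torch.distributed.run
)
```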
If the issue persists, please provide more details about your specific setup or configurations that might be contributing to this behavior! Happy coding! 🚀
DDP: debug command /home/qiuzx/miniconda3/envs/yolov8/bin/python -m torch.distributed.run --nproc_per_node 4 --master_port 37947 /home/qiuzx/.config/Ultralytics/DDP/_temp_mak_nap2139734965595248.py WARNING:__main__:
Setting OMP_NUM_THREADS environment
Does this warning actually have any impact?
@TomZhongJie hello! It looks like you're asking about the impact of the OMP_NUM_THREADS environment variable during your DDP (Distributed Data Parallel) training with YOLOv8. Setting OMP_NUM_THREADS=1 is generally recommended to avoid overly aggressive thread usage by PyTorch, which can lead to inefficient CPU usage in multi-threaded environments, especially when using multiple GPUs. It can help stabilize your training process by ensuring that CPU-side parallelism doesn't become a bottleneck.
If you're experiencing particular issues or slowdowns, you might consider adjusting this setting to better fit your hardware capabilities, balancing between CPU threads and GPU workload. Here's how you can experiment with it:
```python
import os

# Set before importing torch or ultralytics so OpenMP picks it up;
# adjust the thread count as necessary for your machine
os.environ["OMP_NUM_THREADS"] = "4"
```
Add this to your script before importing any major libraries like PyTorch or starting the training process to see if it impacts performance. Happy experimenting! 🚀
Search before asking
Question
This is my environment information