ultralytics / yolov3

YOLOv3 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Is the Ultralytics YOLO V4 available? #1137

Closed adrianosantospb closed 4 years ago

adrianosantospb commented 4 years ago

Hello, guys.

I'd like to know whether an Ultralytics YOLOv4 version is available.

Tks!

glenn-jocher commented 4 years ago

@adrianosantospb yes, you can use this to train yolov4, though to be honest the performance is quite similar to existing yolov3-spp, and the memory consumption is about 3X higher. There are a few 'bag of specials' attributes that are not implemented in this repo, but they have a minor effect.

python train.py --cfg yolov4.cfg --weights ''

adrianosantospb commented 4 years ago

Great news! I will test it right now.

adrianosantospb commented 4 years ago

So, I downloaded the newest version, created a new env, and installed all the required libraries. I did all the configuration on my server, but when I tried to start a new training run I got this error:

assert torch.cuda.is_available(), 'CUDA unavailable, invalid device %s requested' % device  # check availability
AssertionError: CUDA unavailable, invalid device 0,1 requested

The strange thing is: if I use the previous version instead, training starts and the GPUs are recognized.

glenn-jocher commented 4 years ago

@adrianosantospb this just means you requested invalid CUDA devices. If you have one GPU you can use --device 0, for example, or --device cpu to run on the CPU.
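For context, the assertion in the traceback above comes from the repo's device-selection check. A minimal sketch of that kind of validation (the helper name `parse_device` and its exact behavior are illustrative, not the repo's actual `select_device` implementation):

```python
def parse_device(device: str, cuda_available: bool):
    """Validate a --device argument such as 'cpu', '0', or '0,1'.

    Mirrors the assertion in the traceback above: requesting GPU ids
    when CUDA is unavailable raises an AssertionError.
    """
    device = device.strip().lower()
    if device == "cpu":
        return "cpu"
    # GPU ids requested, e.g. '0' or '0,1'
    assert cuda_available, f"CUDA unavailable, invalid device {device} requested"
    return [int(i) for i in device.split(",")]
```

So `--device 0,1` only passes the check when `torch.cuda.is_available()` is True, which is why a broken CUDA install surfaces here even on a machine with two GPUs.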

adrianosantospb commented 4 years ago

I have GPUs (2 on this machine). But with the newest installation, torch.cuda.is_available() returns False. I have kept the previous version on the same server in a different env, and when I run a test with the previous version, it works.

glenn-jocher commented 4 years ago

@adrianosantospb you just have environment problems unrelated to this repo. You need to make sure that pytorch can find your gpu. I'll link you to some of our working environments here in case you'd like to use one.

Reproduce Our Environment

To access an up-to-date working environment (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled), consider a:

adrianosantospb commented 4 years ago

Just so you know, I have solved the problem. I have CUDA 10.1, and there was an incompatibility between the PyTorch version and CUDA. So I solved it with this installation:

pip install torch==1.5.0+cu101 torchvision==0.6.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
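For reference, the `+cu101` suffix pins the wheel to a specific CUDA toolkit build. A tiny sketch of building such a requirement string from a version pairing (the table below contains only the two pairings that appear in this thread, not an exhaustive compatibility matrix, and `wheel_spec` is an illustrative helper, not a pip feature):

```python
# Pairings taken from the pip commands in this thread; not exhaustive.
TORCH_CUDA_WHEELS = {
    "1.5.0": "cu101",  # CUDA 10.1
    "1.4.0": "cu100",  # CUDA 10.0
}

def wheel_spec(torch_version: str) -> str:
    """Build the pip requirement string for a CUDA-specific torch wheel."""
    cuda_tag = TORCH_CUDA_WHEELS[torch_version]
    return f"torch=={torch_version}+{cuda_tag}"
```

Installing a wheel built against a different CUDA version than the one on the machine is exactly the kind of mismatch that makes `torch.cuda.is_available()` return False.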

glenn-jocher commented 4 years ago

ah, great!

feixiangdekaka commented 4 years ago

Just so you know, I have solved the problem. I have CUDA 10.1, and there was an incompatibility between the PyTorch version and CUDA. So I solved it with this installation:

pip install torch==1.5.0+cu101 torchvision==0.6.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html

Waiting for your results.

adrianosantospb commented 4 years ago

@adrianosantospb yes, you can use this to train yolov4, though to be honest, the performance is quite similar to existing yolov3-spp, and the memory consumption is about 3X higher. There are a few 'bag of specials' attributes that are not implemented in this repo, but they have a minor effect.

python train.py --cfg yolov4.cfg --weights ''

Hey @glenn-jocher, do you have an idea of the minimum amount of RAM needed to train a model with YOLOv4? I have seen the same behavior you described, with memory consumption about 3X higher. I tried the test on a server with 64 GB and 2 RTX 2070 GPUs and it didn't work. Now I'm using another one, but it was only possible to run with a batch size of 8 and a single GPU.

When I try to use 2 GPU, I get this error message:

Traceback (most recent call last):
  File "train.py", line 412, in <module>
    train()  # train normally
  File "train.py", line 174, in train
    model = torch.nn.parallel.DistributedDataParallel(model, find_unused_parameters=True)
  File "/home/monster/miniconda3/envs/ultralyticsv4/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 287, in __init__
    self._ddp_init_helper()
  File "/home/monster/miniconda3/envs/ultralyticsv4/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 380, in _ddp_init_helper
    expect_sparse_gradient)
RuntimeError: Model replicas must have an equal number of parameters.

Both GPUs are RTX 2070s.

To me it looks like a PyTorch bug... but I'm not sure yet.

adrianosantospb commented 4 years ago

Well... using the following command, the GPU problem is solved:

pip install torch==1.4.0+cu100 torchvision==0.5.0+cu100 -f https://download.pytorch.org/whl/torch_stable.html

I could run with 2 GPUs. I think PyTorch 1.5 has a bug.

adrianosantospb commented 4 years ago

Hey @glenn-jocher, I have seen that you are using the single-process multi-device (SPMD) mode of DDP instead of DataParallel. Is there a reason? According to the PyTorch team, SPMD will be removed in future releases.

model = torch.nn.parallel.DistributedDataParallel(model, find_unused_parameters=True)

There is a bug in the DistributedDataParallel approach in PyTorch 1.5. It was solved in the previous version, but the fix wasn't replicated in v1.5.0.

I think you should use: model = nn.DataParallel(model)

glenn-jocher commented 4 years ago

@adrianosantospb if you want to train yolov4, I'd highly recommend using yolov4-relu.cfg, which is simply yolov4 with mish activations replaced by relu. It trains at similar speeds to yolov3 with similar memory requirements.
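For context on what yolov4-relu swaps out: Mish is defined as x · tanh(softplus(x)), a smooth activation that is costlier to compute than ReLU's simple max(0, x). A plain-Python sketch of the two functions (scalar versions for illustration; the actual repo applies them as layer activations):

```python
import math

def softplus(x: float) -> float:
    """softplus(x) = ln(1 + e^x), a smooth approximation of max(0, x)."""
    return math.log1p(math.exp(x))

def mish(x: float) -> float:
    # Mish: x * tanh(softplus(x)) -- smooth and non-monotonic,
    # but needs an exp, a log, and a tanh per element.
    return x * math.tanh(softplus(x))

def relu(x: float) -> float:
    # ReLU: max(0, x) -- the cheap drop-in used by yolov4-relu.cfg.
    return max(0.0, x)
```

For large positive inputs the two nearly coincide (mish(x) ≈ x), so the swap mostly changes behavior around and below zero while cutting the per-element cost.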

About your distributed training question: we replaced nn.DataParallel(model) with the current implementation in train.py, but left it in test.py. This improved multi-GPU speed substantially. If it is being deprecated we may need to switch back. Could you try testing the switch and report training speeds before and after for comparison?
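A minimal harness for the before/after comparison requested here could just time a fixed number of training iterations and report images per second. This is an illustrative sketch, not part of train.py; `step_fn` stands in for one forward/backward pass under whichever parallel wrapper is being measured:

```python
import time

def imgs_per_sec(step_fn, batch_size: int, n_iters: int = 50) -> float:
    """Time n_iters calls of a training-step callable, return images/sec.

    Run once with the DistributedDataParallel model and once with the
    nn.DataParallel model (same data, same batch size) to compare.
    """
    start = time.perf_counter()
    for _ in range(n_iters):
        step_fn()  # one training step on one batch
    elapsed = time.perf_counter() - start
    return n_iters * batch_size / elapsed
```

Keeping the batch size and iteration count identical across the two runs is what makes the resulting images/sec numbers directly comparable.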

adrianosantospb commented 4 years ago

@glenn-jocher, I'm using the relu version. And about your request: for sure, I'll be glad to help.

glenn-jocher commented 4 years ago

@adrianosantospb awesome! Maybe we've come full circle on multi-GPU here, because we made the switch to torch.nn.parallel.DistributedDataParallel() about a year ago, since back then it performed much better than torch.nn.DataParallel(). I'm still a bit confused about the differences; it could be that the PyTorch team is trying to simplify things by merging the two now, which would be nice. We do most of our training on single-GPU, so any updates from multi-GPU experts are welcome!

github-actions[bot] commented 4 years ago

This issue is stale because it has been open 30 days with no activity. Remove Stale label or comment or this will be closed in 5 days.

justsolo-smith commented 4 years ago

@adrianosantospb if you want to train yolov4, I'd highly recommend using yolov4-relu.cfg, which is simply yolov4 with mish activations replaced by relu. It trains at similar speeds to yolov3 with similar memory requirements.

About your distributed training question: we replaced nn.DataParallel(model) with the current implementation in train.py, but left it in test.py. This improved multi-GPU speed substantially. If it is being deprecated we may need to switch back. Could you try testing the switch and report training speeds before and after for comparison?

Can you provide the speed and precision comparison result between the yolov4-relu version and the yolov4 version? Is there a big difference?

justsolo-smith commented 4 years ago

@glenn-jocher, I'm using the relu version. And about your question, for sure, I'll be glad to help you.

Can you provide the speed and precision comparison result between the Yolov4-Relu version and the Yolov4 version? Is there a big difference?

glenn-jocher commented 4 years ago

Ultralytics has open-sourced YOLOv5 at https://github.com/ultralytics/yolov5, featuring faster, lighter and more accurate object detection. YOLOv5 is recommended for all new projects.

Pretrained Checkpoints

| Model | AP (val) | AP (test) | AP50 | Speed (GPU) | FPS (GPU) | Params | FLOPS |
| --- | --- | --- | --- | --- | --- | --- | --- |
| YOLOv5s | 36.6 | 36.6 | 55.8 | 2.1 ms | 476 | 7.5M | 13.2B |
| YOLOv5m | 43.4 | 43.4 | 62.4 | 3.0 ms | 333 | 21.8M | 39.4B |
| YOLOv5l | 46.6 | 46.7 | 65.4 | 3.9 ms | 256 | 47.8M | 88.1B |
| YOLOv5x | 48.4 | 48.4 | 66.9 | 6.1 ms | 164 | 89.0M | 166.4B |
| YOLOv3-SPP | 45.6 | 45.5 | 65.2 | 4.5 ms | 222 | 63.0M | 118.0B |

For more information and to get started with YOLOv5 please visit https://github.com/ultralytics/yolov5. Thank you!