Open epifanio opened 6 years ago
Set batch=1 subdivisions=1
https://github.com/pjreddie/darknet/blob/680d3bde1924c8ee2d1c1dea54d3e56a05ca9a26/cfg/yolov3.cfg#L6-L7
@AlexeyAB , thanks for this advice. I was getting close ... I applied the changes that you suggested and I rerun the example. This time the process works until the memory limit of my GPU is reached.
This the output from the darknet
example:
epinux@epinux:~/dev/darknet$ ./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg
layer filters size input output
0 conv 32 3 x 3 / 1 608 x 608 x 3 -> 608 x 608 x 32 0.639 BFLOPs
1 conv 64 3 x 3 / 2 608 x 608 x 32 -> 304 x 304 x 64 3.407 BFLOPs
2 conv 32 1 x 1 / 1 304 x 304 x 64 -> 304 x 304 x 32 0.379 BFLOPs
3 conv 64 3 x 3 / 1 304 x 304 x 32 -> 304 x 304 x 64 3.407 BFLOPs
4 res 1 304 x 304 x 64 -> 304 x 304 x 64
5 conv 128 3 x 3 / 2 304 x 304 x 64 -> 152 x 152 x 128 3.407 BFLOPs
6 conv 64 1 x 1 / 1 152 x 152 x 128 -> 152 x 152 x 64 0.379 BFLOPs
7 conv 128 3 x 3 / 1 152 x 152 x 64 -> 152 x 152 x 128 3.407 BFLOPs
8 res 5 152 x 152 x 128 -> 152 x 152 x 128
9 conv 64 1 x 1 / 1 152 x 152 x 128 -> 152 x 152 x 64 0.379 BFLOPs
10 conv 128 3 x 3 / 1 152 x 152 x 64 -> 152 x 152 x 128 3.407 BFLOPs
11 res 8 152 x 152 x 128 -> 152 x 152 x 128
12 conv 256 3 x 3 / 2 152 x 152 x 128 -> 76 x 76 x 256 3.407 BFLOPs
13 conv 128 1 x 1 / 1 76 x 76 x 256 -> 76 x 76 x 128 0.379 BFLOPs
14 conv 256 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 256 3.407 BFLOPs
15 res 12 76 x 76 x 256 -> 76 x 76 x 256
16 conv 128 1 x 1 / 1 76 x 76 x 256 -> 76 x 76 x 128 0.379 BFLOPs
17 conv 256 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 256 3.407 BFLOPs
18 res 15 76 x 76 x 256 -> 76 x 76 x 256
19 conv 128 1 x 1 / 1 76 x 76 x 256 -> 76 x 76 x 128 0.379 BFLOPs
20 conv 256 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 256 3.407 BFLOPs
21 res 18 76 x 76 x 256 -> 76 x 76 x 256
22 conv 128 1 x 1 / 1 76 x 76 x 256 -> 76 x 76 x 128 0.379 BFLOPs
23 conv 256 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 256 3.407 BFLOPs
24 res 21 76 x 76 x 256 -> 76 x 76 x 256
25 conv 128 1 x 1 / 1 76 x 76 x 256 -> 76 x 76 x 128 0.379 BFLOPs
26 conv 256 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 256 3.407 BFLOPs
27 res 24 76 x 76 x 256 -> 76 x 76 x 256
28 conv 128 1 x 1 / 1 76 x 76 x 256 -> 76 x 76 x 128 0.379 BFLOPs
29 conv 256 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 256 3.407 BFLOPs
30 res 27 76 x 76 x 256 -> 76 x 76 x 256
31 conv 128 1 x 1 / 1 76 x 76 x 256 -> 76 x 76 x 128 0.379 BFLOPs
32 conv 256 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 256 3.407 BFLOPs
33 res 30 76 x 76 x 256 -> 76 x 76 x 256
34 conv 128 1 x 1 / 1 76 x 76 x 256 -> 76 x 76 x 128 0.379 BFLOPs
35 conv 256 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 256 3.407 BFLOPs
36 res 33 76 x 76 x 256 -> 76 x 76 x 256
37 conv 512 3 x 3 / 2 76 x 76 x 256 -> 38 x 38 x 512 3.407 BFLOPs
38 conv 256 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 256 0.379 BFLOPs
39 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512 3.407 BFLOPs
40 res 37 38 x 38 x 512 -> 38 x 38 x 512
41 conv 256 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 256 0.379 BFLOPs
42 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512 3.407 BFLOPs
43 res 40 38 x 38 x 512 -> 38 x 38 x 512
44 conv 256 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 256 0.379 BFLOPs
45 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512 3.407 BFLOPs
46 res 43 38 x 38 x 512 -> 38 x 38 x 512
47 conv 256 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 256 0.379 BFLOPs
48 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512 3.407 BFLOPs
49 res 46 38 x 38 x 512 -> 38 x 38 x 512
50 conv 256 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 256 0.379 BFLOPs
51 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512 3.407 BFLOPs
52 res 49 38 x 38 x 512 -> 38 x 38 x 512
53 conv 256 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 256 0.379 BFLOPs
54 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512 3.407 BFLOPs
55 res 52 38 x 38 x 512 -> 38 x 38 x 512
56 conv 256 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 256 0.379 BFLOPs
57 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512 3.407 BFLOPs
58 res 55 38 x 38 x 512 -> 38 x 38 x 512
59 conv 256 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 256 0.379 BFLOPs
60 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512 3.407 BFLOPs
61 res 58 38 x 38 x 512 -> 38 x 38 x 512
62 conv 1024 3 x 3 / 2 38 x 38 x 512 -> 19 x 19 x1024 3.407 BFLOPs
63 conv 512 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 512 0.379 BFLOPs
64 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024 3.407 BFLOPs
65 res 62 19 x 19 x1024 -> 19 x 19 x1024
66 conv 512 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 512 0.379 BFLOPs
67 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024 3.407 BFLOPs
68 res 65 19 x 19 x1024 -> 19 x 19 x1024
69 conv 512 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 512 0.379 BFLOPs
70 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024 3.407 BFLOPs
71 res 68 19 x 19 x1024 -> 19 x 19 x1024
72 conv 512 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 512 0.379 BFLOPs
73 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024 3.407 BFLOPs
74 res 71 19 x 19 x1024 -> 19 x 19 x1024
75 conv 512 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 512 0.379 BFLOPs
76 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024 3.407 BFLOPs
77 conv 512 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 512 0.379 BFLOPs
78 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024 3.407 BFLOPs
79 conv 512 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 512 0.379 BFLOPs
80 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024 3.407 BFLOPs
81 conv 255 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 255 0.189 BFLOPs
82 yolo
83 route 79
84 conv 256 1 x 1 / 1 19 x 19 x 512 -> 19 x 19 x 256 0.095 BFLOPs
85 upsample 2x 19 x 19 x 256 -> 38 x 38 x 256
86 route 85 61
87 conv 256 1 x 1 / 1 38 x 38 x 768 -> 38 x 38 x 256 0.568 BFLOPs
88 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512 3.407 BFLOPs
89 conv 256 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 256 0.379 BFLOPs
90 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512 3.407 BFLOPs
91 conv 256 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 256 0.379 BFLOPs
92 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512 3.407 BFLOPs
93 conv 255 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 255 0.377 BFLOPs
94 yolo
95 route 91
96 conv 128 1 x 1 / 1 38 x 38 x 256 -> 38 x 38 x 128 0.095 BFLOPs
97 upsample 2x 38 x 38 x 128 -> 76 x 76 x 128
98 route 97 36
99 conv 128 1 x 1 / 1 76 x 76 x 384 -> 76 x 76 x 128 0.568 BFLOPs
100 CUDA Error: out of memory
darknet: ./src/cuda.c:36: check_error: Assertion `0' failed.
Aborted (core dumped)
nvidia-smi
i got the following output, just before the out-of-memory error:epinux@epinux:~/Downloads$ nvidia-smi
Tue Sep 11 22:07:07 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.37 Driver Version: 396.37 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 950 Off | 00000000:04:00.0 Off | N/A |
| 0% 61C P0 24W / 75W | 1917MiB / 2002MiB | 3% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 5898 C ./darknet 1906MiB |
+-----------------------------------------------------------------------------+
Anything else that I can try (before buying a newer GPU -:)
@epifanio
Anything else that I can try (before buying a newer GPU -:)
Pre-order new GPU )
You can try to set width=416 height=416
(or any values multiple of 32) in your cfg-file.
That did the trick! Thanks!!!
I also got it running on a TX2 (which needed only the batch=1
and subdivisions=1
changes. seems the TX2 is performing better than the GTX950! I also tried on a laptop with a quadro p620, whic also required to tune width
and height
.
This is for a research at school, they provided a cheap but "newer card" GTX 1050ti
(4 Gb ram) it completed the job but still I had to tune the batch
and subdivisions
parameters, the only combination that worked was the one you suggested (1,1).
@AlexeyAB can you please point me on where to find explanation/docs for the all the parameters used in the cfg
file? especially the batch
and subdivision
ones.
This is for a research at school, they provided a cheap but "newer card" GTX 1050ti (4 Gb ram) it completed the job but still I had to tune the batch and subdivisions parameters, the only combination that worked was the one you suggested (1,1).
If you use this GitHub repository, then use batch=64 subdivisions=64, it's optimal for both: training and detection.
can you please point me on where to find explanation/docs for the all the parameters used in the cfg file? especially the batch and subdivision ones.
Nowhere ) May be I will write it.
Alexey, I'd be happy to help write it if you fill in the gaps and correct my mistakes. Email me @gmail.com
@AlexeyAB @epifanio Thanks.. saved my day. I need higher GPU.
I'm trying to run several example on my old workstation, specs:
CPU
GPU
Every example that I tried (nightmare excluded) ended up with the following error:
The same example works fine using only the
CPU
by adding the-nogpu
flag:I'm on Ubuntu 16.04 with:
I've installed OpenCV-3.4.0
and the folllowinf is my
darknet Makefile
:Have you any clue if this is a software or hardware issue? I may have some sw libraries conflicts or perhaps the hardware is too obsolete?