prabindh / darknet

Convolutional Neural Networks

Darknet-cpp with Yolo9000 #1

Closed: prabindh closed this issue 7 years ago

prabindh commented 7 years ago

Copying from Willie Maddox's comment in the forum: -!topic/darknet/4Hb159aZBbA

The C++ port of darknet is broken for the detector demo with the new yolo9000. I posted the following in an earlier thread:

I am also getting segfaults with ./darknet-cpp detector demo cfg/ cfg/yolo9000.cfg yolo9000.weights. I located the error: it occurs in the call to free( at the bottom of image.c, which is called in the while loop by free_image(disp) in demo.c.

I pulled the latest version of darknet and did a fresh make clean; make. I verified that the defaults in point to the correct files and that all the 9k.* files are present. As a sanity check, I also ran ./darknet-cpp detector demo cfg/ cfg/yolo.cfg yolo.weights, and it ran with no errors. The plain C version of darknet also works fine with yolo9000. I put a breakpoint at free( and verified that the memory pointed to by was a float (usually a value between 0.5 and 0.6, depending on the run) in each of the detector demo runs listed above.

But for some reason, yolo9000 crashes when trying to free
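A crash inside free() at a spot unrelated to the real bug is a common sign that some earlier code wrote past the end of a heap block and damaged the allocator's bookkeeping. One way to catch such a write closer to its source is a canary: a few bytes with a known pattern placed after each allocation and checked when the block is freed. The helpers below (guarded_malloc, guarded_free) are a minimal sketch of that idea; they are hypothetical debugging aids, not part of darknet. Tools such as Valgrind (Memcheck) or AddressSanitizer do the same job far more thoroughly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical debugging helpers (not part of darknet). Each block is laid
 * out as [header][user bytes][canary], and the canary is verified on free. */
#define CANARY_SIZE 8
static const unsigned char CANARY[CANARY_SIZE] =
    {0xDE, 0xAD, 0xBE, 0xEF, 0xDE, 0xAD, 0xBE, 0xEF};

/* The union keeps the user pointer suitably aligned for any type. */
typedef union {
    size_t size;        /* number of user-visible bytes */
    max_align_t align;
} guard_header;

void *guarded_malloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(guard_header) - CANARY_SIZE)
        return NULL;
    unsigned char *raw = malloc(sizeof(guard_header) + size + CANARY_SIZE);
    if (!raw)
        return NULL;
    ((guard_header *)raw)->size = size;
    memcpy(raw + sizeof(guard_header) + size, CANARY, CANARY_SIZE);
    return raw + sizeof(guard_header);
}

/* Frees the block; returns 1 if the canary was intact, 0 if something
 * wrote past the end of the user-visible bytes. */
int guarded_free(void *ptr)
{
    if (!ptr)
        return 1;
    unsigned char *raw = (unsigned char *)ptr - sizeof(guard_header);
    size_t size = ((guard_header *)raw)->size;
    int intact =
        memcmp(raw + sizeof(guard_header) + size, CANARY, CANARY_SIZE) == 0;
    if (!intact)
        fprintf(stderr, "guarded_free: overrun past %zu bytes\n", size);
    free(raw);
    return intact;
}
```

Swapping a suspect buffer's malloc/free pair for these helpers turns a silent one-element overrun (for example, a loop that runs i <= n over n floats) into an explicit report at free time, which narrows the search to the code that fills that buffer.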

prabindh commented 7 years ago

I tried to replicate, and below is what I see:

./darknet-cpp detector demo cfg/ cfg/yolo9000.cfg yolo9000.weights

WillieMaddox commented 7 years ago

Yeah, that is strange. The error occurs when I have GPU+CUDNN+OPENCV enabled. I have not tried it yet with GPU+CUDNN disabled. I'll test that on Monday and let you know how it goes.

prabindh commented 7 years ago

I see. I might also hit it with the GPU enabled, just after a longer run. With the CPU build, I hit it at the same spot very early, so I am looking at that first. If I fix it there, I think the GPU mode will be fixed as well.

prabindh commented 7 years ago

This is now fixed in my trials. There is a critical bug in mainline darknet that is fixed in tag v3.76. Please check tag v3.76 and post an update once you have tried it at your end.

prabindh commented 7 years ago

There are many fixes, but the critical one is at line 359 of src/region_layer.c (a write that exceeds the bounds of the probs array). This is what causes the heap corruption in question. The other fixes are less critical; I do not believe they are causing the current issue.
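The thread does not show line 359 or say which index overflows. Purely as a hypothetical illustration, suppose the overflow is an off-by-one on the class dimension: code writes an extra per-box value at column classes of a row that was allocated with only classes entries. The sketch below (prob_table and its functions are invented names, not darknet's API) records each row's real width so that such a write is rejected and reported instead of silently corrupting the heap.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical illustration, not darknet's code: one row of class
 * probabilities per candidate box, with the row width stored explicitly. */
typedef struct {
    float **rows;
    int n_boxes;
    int cols;     /* number of floats actually allocated per row */
} prob_table;

prob_table prob_table_alloc(int n_boxes, int cols)
{
    prob_table t = {NULL, n_boxes, cols};
    t.rows = calloc((size_t)n_boxes, sizeof(float *));
    for (int i = 0; t.rows && i < n_boxes; ++i)
        t.rows[i] = calloc((size_t)cols, sizeof(float));
    return t;
}

/* Writes probs[box][cls] only if it lies inside the allocation;
 * returns 1 on success, 0 (with a message) if the write is rejected. */
int prob_table_set(prob_table *t, int box, int cls, float p)
{
    if (box < 0 || box >= t->n_boxes || cls < 0 || cls >= t->cols) {
        fprintf(stderr, "probs[%d][%d] out of bounds (%d x %d)\n",
                box, cls, t->n_boxes, t->cols);
        return 0;
    }
    if (!t->rows || !t->rows[box])
        return 0; /* allocation failed earlier */
    t->rows[box][cls] = p;
    return 1;
}

void prob_table_free(prob_table *t)
{
    for (int i = 0; t->rows && i < t->n_boxes; ++i)
        free(t->rows[i]);
    free(t->rows);
    t->rows = NULL;
}
```

Under this assumption there are two ways out: stop writing the extra column, or allocate each row with classes + 1 entries so the write becomes legal. Which one is correct depends on whether anything downstream reads that extra value, which is the same question the thread leaves open about whether the offending line is needed at all.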

prabindh commented 7 years ago

For now, I have commented out the offending line, pending further investigation into whether that line is really needed.

WillieMaddox commented 7 years ago

I pulled your changes and it seems to be working. It is still slow, but so is the plain C version. Thanks for taking a look.

prabindh commented 7 years ago

So this leaves us with a buffer overflow bug in the mainline code. I hope Joseph reads the thread you commented in.

prabindh commented 7 years ago

Hello WillieMaddox, if this issue does not recur, could you please close it? I will track it further in the mainline issues.

WillieMaddox commented 7 years ago

It seems to be working fine now. I don't have the option to close the issue.

