prabindh / darknet

Convolutional Neural Networks
http://pjreddie.com/darknet/

Darknet-cpp with Yolo9000 #1

Closed prabindh closed 7 years ago

prabindh commented 7 years ago

Copying from Willie Maddox's comment in the forum: https://groups.google.com/forum/#!topic/darknet/4Hb159aZBbA

The C++ port of darknet is broken for detector demo on the new yolo9000. I posted the following in an earlier thread.

I am also getting segfaults for ./darknet-cpp detector demo cfg/combine9k.data cfg/yolo9000.cfg yolo9000.weights. I located the error: it occurs when calling free(m.data) at the bottom of image.c (which is called in the while loop by free_image(disp) in demo.c).

I pulled the latest version of darknet and did a fresh make clean; make. I verified that the defaults in combine9k.data point to the correct files and that all the 9k.* files are present. I also ran ./darknet-cpp detector demo cfg/coco.data cfg/yolo.cfg yolo.weights as a sanity check, and it ran with no errors. The straight C version of darknet also works fine with yolo9000. I put a breakpoint at free(m.data) and verified that the memory pointed to by m.data held a float value (usually between 0.5 and 0.6, depending on the run) in each of the detector demo runs listed above.

But for some reason, yolo9000 crashes when trying to free m.data.

prabindh commented 7 years ago

I tried to replicate, and below is what I see:

./darknet-cpp detector demo cfg/combine9k.data cfg/yolo9000.cfg yolo9000.weights

WillieMaddox commented 7 years ago

Yeah, that is strange. The error occurs when I have GPU+CUDNN+OPENCV enabled. I have not tried it yet with GPU+CUDNN disabled. I'll test that on Monday and let you know how it goes.

prabindh commented 7 years ago

I see. Possibly I would hit it after a long time with GPU enabled too. With CPU, I hit it at the same spot very early, so I am looking at that first. If I fix it there, I think the GPU mode will also be fixed.

prabindh commented 7 years ago

This is fixed now in my trials. There is a critical bug in darknet (mainline) that is fixed in tag v3.76: https://github.com/prabindh/darknet/releases/tag/v3.76. Once you have checked it at your end, please update.

prabindh commented 7 years ago

There are many fixes, but the critical one is at line 359 of src/region_layer.c (a write that exceeds the bounds of the probs array). This is what causes the heap corruption in question. The others are less critical, and I do not believe they are causing the current issue.
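To illustrate the kind of bug described above, here is a minimal sketch of a bounds-checked write into one row of a per-box probability array. It is not the code from src/region_layer.c: the helper names set_prob_checked and count_accepted_writes, and the assumption that each row holds exactly `classes` floats, are hypothetical and chosen only for illustration. An unchecked write one slot past the end of a heap row may go unnoticed at the time and only surface later, for example when an unrelated free() is called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not part of darknet): write value into row[j] only
 * when j lies inside a row of `classes` floats.
 * Returns 1 if the write happened, 0 if it was rejected. */
static int set_prob_checked(float *row, int classes, int j, float value)
{
    if (row == NULL || j < 0 || j >= classes) {
        return 0;  /* would touch memory outside the allocation */
    }
    row[j] = value;
    return 1;
}

/* Allocates one row (classes must be > 0), attempts one in-bounds and one
 * out-of-bounds write, and returns how many writes were accepted.
 * Returns -1 if the allocation fails. */
static int count_accepted_writes(int classes)
{
    float *row = calloc((size_t)classes, sizeof(float));
    int accepted = 0;
    if (row == NULL) {
        return -1;
    }

    /* Last valid slot: accepted. */
    accepted += set_prob_checked(row, classes, classes - 1, 0.5f);
    /* One past the end: rejected instead of corrupting the heap. */
    accepted += set_prob_checked(row, classes, classes, 0.9f);

    assert(row[classes - 1] == 0.5f);
    free(row);  /* safe, since nothing was written outside the row */
    return accepted;
}
```

Calling count_accepted_writes(4) from a main function returns 1: the write to index 3 is accepted and the write to index 4 is rejected. Whether a guard like this or a larger allocation is the right fix for the real code depends on whether the extra slot is actually needed.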

prabindh commented 7 years ago

https://github.com/prabindh/darknet/commit/ce90edebe35aef462f62bbb0098836480e67bf67

prabindh commented 7 years ago

For now, I have commented out the offending line, pending further investigation into whether that line is really needed.

WillieMaddox commented 7 years ago

I pulled your changes and it seems to be working. Still slow, but so is the straight C version. Thanks for taking a look.

prabindh commented 7 years ago

So this leaves us with a buffer overflow bug in the mainline code. I hope Joseph reads the thread you commented in.

prabindh commented 7 years ago

Hello WillieMaddox, if this issue does not recur, could you kindly close it? I will track it further in the mainline issues.

WillieMaddox commented 7 years ago

Seems to be working fine now. I don't have the option to close the issue.

prabindh commented 7 years ago

Closing.