thtrieu / darkflow

Translate darknet to tensorflow. Load trained weights, retrain/fine-tune using tensorflow, export constant graph def to mobile devices
GNU General Public License v3.0

Can't import weights and cfg file from Darknet #325

Open zenineasa opened 7 years ago

zenineasa commented 7 years ago

I had used Darknet to train a model earlier. Trying to use the same .cfg and .weights files for detection in darkflow doesn't seem to work. I get the following error: AssertionError: expect 268263452 bytes, found 268263456

Anything that I might be doing wrong?

jubjamie commented 7 years ago

Those byte counts are suspiciously close. Nonetheless, can you confirm what happens when you download the cfg and weights from here: https://pjreddie.com/darknet/yolo/

zenineasa commented 7 years ago

Actually, I have been using darknet for the past few days, following pjreddie's website. I collected a few images, annotated them, and trained on them with darknet. It was working pretty well there.

I just wanted to try darkflow out, so I copied the trained weights and the cfg file I had created for darknet over to darkflow; when I tried running it, I got this error.

Aren't the cfg and the weights configuration in darknet and darkflow mutually compatible?

jubjamie commented 7 years ago

Yeah, I think they are. I'm not sure about newly trained models, but they should be. The fact that your byte count is off by 4 seems suspicious, like something didn't quite save right or something else is not quite right.

Is your new model that you're trying to load based off of a yolo cfg or is it a brand new one?

zenineasa commented 7 years ago

Based on darknet19_448.conv.23

jubjamie commented 7 years ago

Not familiar enough with darknet, sorry. I presume you have no issues using one of the yolo cfgs/weights from the website?

zenineasa commented 7 years ago

Well, when I copied yolo.cfg and yolo.weights from darknet over to darkflow, it worked fine. But when I renamed yolo.cfg and yolo.weights to yolo1.cfg and yolo1.weights respectively and tried to run those, I got another AssertionError...

AssertionError: labels.txt and cfg/yolo1.cfg indicate inconsistent class numbers.

I know that yolo has 80 classes and therefore requires 80 labels. So I added a few dummy entries so that labels.txt has 80 labels, and then it worked fine. Is there something hardcoded for yolo.cfg? Where should I look for it?
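
In case it helps others hitting the labels error: darkflow appears to special-case the stock model names (yolo, tiny-yolo-voc, etc.) with built-in label lists, so once the cfg is renamed it falls back to labels.txt, whose line count must then match the classes= value in the cfg's [region] section. A rough sketch of that consistency check (the file names and parsing below are illustrative, not darkflow's actual code):

```python
import re

def read_classes_from_cfg(cfg_path):
    """Return the last classes= value in a darknet cfg (the [region] section)."""
    with open(cfg_path) as f:
        matches = re.findall(r"^classes\s*=\s*(\d+)", f.read(), flags=re.MULTILINE)
    return int(matches[-1]) if matches else None

def check_labels(cfg_path="cfg/yolo1.cfg", labels_path="labels.txt"):
    """Raise if the number of labels does not match classes= in the cfg."""
    with open(labels_path) as f:
        labels = [line.strip() for line in f if line.strip()]
    classes = read_classes_from_cfg(cfg_path)
    if classes != len(labels):
        raise AssertionError(
            f"{labels_path} and {cfg_path} indicate inconsistent class numbers "
            f"({len(labels)} labels vs classes={classes})")
```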

jubjamie commented 7 years ago

Not really sure. Have you adjusted the configs as suggested here? It requires you to specify the classes, which could cause an issue when trying to train.

zenineasa commented 7 years ago

Yes, it's exactly the same as in Darknet.

Kowasaki commented 6 years ago

I have the exact same off by 4 bytes error using darknet19_448.conv.23 to train in darknet before porting to darkflow! Did anyone ever figure out what the problem might be?

Benjamin-Vencill commented 6 years ago

I'm also encountering this error! I trained a brand new model in darknet yesterday (using the pre-trained darknet19_448.conv.23) and tried to load it in darkflow using the output .weights file from darknet, and I'm off by 4 bytes as well. I'm working with a two-class model, so my config looks like:

[convolutional]
filters=35

[region]
classes=2

as per the recommendation. This yields:

AssertionError: expect 202335260 bytes, found 202335264

I've tried several iterations of adjusting the configs (changing the class and filter numbers for the last layer) to no avail. I suspected that the "off by 4 bytes" was due to a dimensional mismatch on the last layer, something like darkflow expecting 3 classes but getting only 2 in the output layer. So I tried modifying my .cfg file like so:

[convolutional]
filters=40

[region]
classes=3

and this yields an over-read: AssertionError: Over-read ../darknet/new_obj.weights

I would love any insight into this problem! Thanks!
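
As a side note on the cfg values above: for a YOLOv2 [region] layer, the filters count of the preceding convolutional layer follows filters = num * (classes + coords + 1), with the default num = 5 anchors and coords = 4 box parameters, which is where 35 (two classes) and 40 (three classes) come from. The off-by-4 error turns out to be unrelated to the class count (see the header discussion further down); a quick check of the formula:

```python
def region_filters(classes, num=5, coords=4):
    """Filters needed in the conv layer before [region]: one box per anchor,
    each carrying `coords` box parameters, 1 objectness score, and the class scores."""
    return num * (classes + coords + 1)

assert region_filters(2) == 35    # two-class model above
assert region_filters(3) == 40    # three-class experiment above
assert region_filters(80) == 425  # stock yolo.cfg (COCO)
```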

minhnhat93 commented 6 years ago

Exactly the same issue as @Benjamin-Vencill, both when using the trained yolo.weights and yolo.cfg from the darknet website and after fine-tuning with darknet. Off by 4 bytes. Does anyone have any idea how the weights are saved/loaded in darknet and darkflow?

minhnhat93 commented 6 years ago

Update: I don't know exactly what is happening, but after printing the index of the layers being loaded, I found that all the weights in the newly trained darknet model were shifted right by 4 bytes compared to the older models in darkflow. Changing this line: https://github.com/thtrieu/darkflow/blob/479c83e14559fd5eceb9a9f612503b29a67fac5c/darkflow/utils/loader.py#L121 to self.offset = 20 solved my problem, and I was able to use my newly trained darknet model with darkflow. It's pretty weird, though, because the darknet source still appears to reserve only 16 bytes for the extra data at the beginning of the file: https://github.com/pjreddie/darknet/blob/d8c5cfd6c6c7dca460c64521358a0d772e5e8d52/src/parser.c#L906 Can someone who is an expert on this shed some light on this behaviour?
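
For context on the 16 vs 20: a darknet .weights file starts with a small header (major, minor, and revision version numbers, then a counter of images seen during training), followed by the raw float32 weights. Newer darknet builds write the seen counter as a 64-bit value instead of a 32-bit one, which grows the header from 16 to 20 bytes and shifts every weight after it by 4 bytes. A diagnostic sketch for inspecting that header (this is not darkflow's loader; the version rule is an assumption based on darknet's parser.c):

```python
import struct

def inspect_weights_header(path):
    """Print a darknet .weights header and report whether it is 16 or 20 bytes."""
    with open(path, "rb") as f:
        major, minor, revision = struct.unpack("<3i", f.read(12))
        # Roughly: darknet versions >= 0.2 store `seen` as a 64-bit integer,
        # older builds used a 32-bit int.
        if major * 10 + minor >= 2:
            (seen,) = struct.unpack("<Q", f.read(8))
            header_size = 20
        else:
            (seen,) = struct.unpack("<i", f.read(4))
            header_size = 16
    print(f"version {major}.{minor}.{revision}, seen {seen} images, "
          f"{header_size}-byte header")
    return header_size

# Example call (path is illustrative):
# inspect_weights_header("bin/yolo1.weights")
```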

zinkcious commented 6 years ago

Exactly the same problem as above. Solved it using @minhnhat93's answer. I would still like to know why I have to change the offset to 20 when importing the darknet model I trained on my own dataset, because if I import the official cfg "tiny-yolo-voc.cfg" and official weights "tiny-yolo-voc.weights" into darkflow, the offset of 16 works fine.

I think it might be a bug in darknet.

Thanks a lot! @minhnhat93

zenineasa commented 6 years ago

Actually, the offset corresponds to the size of the header at the start of the weights file; it holds the version numbers and a few other values. I had read that elsewhere, but I can't remember where exactly.

Benjamin-Vencill commented 6 years ago

I used the solution @minhnhat93 provided and it works now! Nice work, thanks!

zinkcious commented 6 years ago

I find that although no error is reported when importing weights from darknet using the method supplied by @minhnhat93 (changing the offset to 20), the detection results are a little different from the results in darknet, as shown in the pictures below. The left picture is the result from darknet and the right picture is the result from darkflow (both using the same weights and cfg): https://github.com/zinkcious/machine-learning-Udacity/blob/master/65_cmp.jpg https://github.com/zinkcious/machine-learning-Udacity/blob/master/01_cmp.png

Does anyone know what the problem is with importing weights from darknet?

zenineasa commented 6 years ago

@zinkcious That could be because of the detection threshold, couldn't it?

zinkcious commented 6 years ago

I don't think so. Isn't the thresh defined in the second-to-last line of the cfg file? In both cases it says thresh = .6, @zenineasa, as shown here: https://github.com/zinkcious/machine-learning-Udacity/blob/master/cmp_code.png

zinkcious commented 6 years ago

Do you have a similar problem to mine? @zenineasa @minhnhat93 @Benjamin-Vencill

zenineasa commented 6 years ago

I have kind of moved on to writing my own implementation using Keras. There are a few repositories on GitHub where developers have tried to do the same.

zinkcious commented 6 years ago

I have new findings. I trained tiny-yolo-voc.cfg on the VOC dataset and got "tiny-yolo-voc_100.weights", whose file size is 63471560 bytes. When I look at the file size of tiny-yolo-voc.weights downloaded from the official website, it is 63471556 bytes, 4 bytes smaller than the weights I trained. I don't understand why that is.
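
That 4-byte difference is consistent with the header-size change discussed above: a file written by a newer darknet build carries the 20-byte header, the official download carries the older 16-byte one, and the weight payload itself is the same size. In numbers:

```python
trained = 63471560   # tiny-yolo-voc_100.weights, trained locally with newer darknet
official = 63471556  # tiny-yolo-voc.weights from the official website
assert trained - official == 4 == 20 - 16  # the gap is exactly the extra header bytes
```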

minhnhat93 commented 6 years ago

@zinkcious Yeah, I just checked. I'm having a similar problem too. The objectness scores of the detections from darkflow and darknet after the fix are different... Even weirder, I can now load the .weights file from darknet but not the .backup file, even though the two file formats are the same...

zinkcious commented 6 years ago

Can anyone who is familiar with the darknet and yolo source code answer this question?...

imaami commented 6 years ago

Yeah, I predicted these things would happen when I noticed darknet uses sizeof() to calculate its binary file format layout. The exact change that caused this is here:

https://github.com/pjreddie/darknet/commit/1467621453e1c6932841a4992e6dffe0d0d8de24#diff-bfbbcdf73459e9ea8fb4afa8455ce74dL909

There's an issue about this bug, but unfortunately there's no fix yet (I've been meaning to write a patch, been busy with other things): https://github.com/pjreddie/darknet/issues/78
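
Given that explanation, a more robust alternative to hardcoding self.offset = 16 or 20 in darkflow would be to derive the offset from the version numbers at the start of the weights file, the same way darknet's own load_weights_upto does. A sketch of that idea (not a tested patch against darkflow, just the logic):

```python
import struct

def darknet_header_size(weights_path):
    """Return 20 if the file's `seen` field is 64-bit, else 16, mirroring the
    version check in darknet's src/parser.c (load_weights_upto)."""
    with open(weights_path, "rb") as f:
        major, minor, _revision = struct.unpack("<3i", f.read(12))
    seen_is_64bit = major * 10 + minor >= 2 and major < 1000 and minor < 1000
    return 12 + (8 if seen_is_64bit else 4)

# In darkflow/utils/loader.py one could then set, for example:
#     self.offset = darknet_header_size(self.path)
# instead of a hardcoded 16 or 20.
```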

saltedfishpan commented 5 years ago

@minhnhat93 It worked!!! Thank you very much!!! Genius!!