convergence and detection issue

hashirali2604 commented 6 years ago

I am training darkflow for custom object detection, i.e. dental instruments, with 11 classes right now. At first I gathered images from Google using a Chrome extension; those images are all 225x225 pixels, and I annotated them manually, so I had about 6,800 images and annotations. When training on this dataset, the loss and moving ave loss converged to around 3 and would not go any lower. I thought changing the learning rate would let it decrease further, but unfortunately that didn't work. I then decided to change my dataset, as I thought maybe it wasn't good enough, so I built a dataset of my own instruments by recording videos and breaking them down into frames, which gave me around 16,000 images and annotations. Now, when training at a learning rate of 1e-05, the loss starts from 111/114 and converges to negative values of loss and ave loss in just 2 epochs. Then I changed the learning rate to 1e-07 and it converged at 71575 iterations with a loss and ave loss of 0.5, but on evaluation it shows no detections. So I decreased the threshold to 0.001 (I guessed it might help), and then some bounding boxes showed up, but not properly on the instruments. I changed it again and again but didn't get what I wanted. Anyone, please help me if you know anything about this. I am very new to these problems.

ashleyjsands commented 6 years ago

Hi @hashirali2604,

I personally am struggling to get darkflow to correctly label my own dataset. But here are some ideas that may or may not be helpful:

You are focusing on lowering the loss, which is only a proxy signal for performance. Squeezing out some extra loss improvement doesn't guarantee that your model will actually perform better. I recommend that you use mAP (https://github.com/Cartucho/mAP) to evaluate the performance of your model on a validation set (if you aren't already). I would personally go back and train your model again and, every 5000 iterations, use mAP to evaluate its training and validation performance to get a better idea of how it's going.
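To make the mAP idea concrete, here is a minimal sketch of average precision (AP) for a single class. It is not the code from the Cartucho/mAP repo, just a simplified illustration. The function names, the `(xmin, ymin, xmax, ymax)` box format and the 0.5 IoU threshold are assumptions made for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_thresh=0.5):
    """AP for one class (hypothetical helper, not a darkflow API).

    detections:   list of (image_id, confidence, box)
    ground_truth: dict mapping image_id -> list of boxes
    """
    total_gt = sum(len(boxes) for boxes in ground_truth.values())
    if total_gt == 0:
        return 0.0
    # Each ground-truth box may be matched by at most one detection.
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    tp = fp = 0
    precisions, recalls = [], []
    # Walk through detections from most to least confident.
    for img, _conf, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_j, best_iou = -1, 0.0
        for j, gt in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0 and best_iou >= iou_thresh and not matched[img][best_j]:
            matched[img][best_j] = True
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / total_gt)
    # Make precision non-increasing as recall grows (precision envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the precision-recall envelope.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

mAP is then the mean of this value over all classes. Computing it on both the training and validation sets at each checkpoint would show whether a falling loss actually translates into better detections.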
I have personally tried breaking up a video into its frames and training a different model (not darkflow) on the frames, and it actually decreased its performance. Now this doesn't suggest that such a technique is bad. It could mean that my model wasn't generalising at all, and the video frame images just made that more apparent. Or it may be that I misapplied the technique. So once again, I recommend that you train your model without the video frames and use mAP to evaluate it, then train the model again using the video frames as well and evaluate it again. Now compare the results and see if the video frames actually improve the performance of the model. Note: it is extremely important not to introduce data bias when you use video frames. Because a video frame is generally very similar to the frames adjacent to it within the video, you can introduce a data bias if you spread the frames from one video across your training, validation and test datasets. So, when splitting your datasets, you should keep all frames from each video together in one dataset only. I recommend that you put your video frames only into your training set; that way you won't have any bias when measuring validation or test performance.
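As a rough sketch of the "all frames from one video go into one dataset" rule, the split can be done per video rather than per frame. The function name, the `(video_id, frame_path)` input format and the 20% validation fraction are assumptions for illustration, not anything darkflow provides.

```python
import random
from collections import defaultdict


def split_by_video(frames, val_fraction=0.2, seed=0):
    """Split frames so that no video contributes to both sets.

    frames: list of (video_id, frame_path) pairs (hypothetical format).
    Returns (train_paths, val_paths).
    """
    by_video = defaultdict(list)
    for video_id, path in frames:
        by_video[video_id].append(path)

    # Shuffle whole videos, not individual frames.
    video_ids = sorted(by_video)
    random.Random(seed).shuffle(video_ids)

    # Hold out at least one video when there is more than one to choose from.
    n_val = max(1, round(len(video_ids) * val_fraction)) if len(video_ids) > 1 else 0
    val_ids = set(video_ids[:n_val])

    train = [p for v in video_ids if v not in val_ids for p in by_video[v]]
    val = [p for v in video_ids if v in val_ids for p in by_video[v]]
    return train, val
```

If you follow the stricter suggestion above and keep video frames in the training set only, you would skip this split for the video data and build the validation and test sets from other images.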

I haven't tried any of these ideas for darkflow yet. So please let me know if they work or don't work, because it seems that we are struggling with similar issues and that we can help one another succeed with getting darkflow to perform well.

I also recommend that you close your other related issues, just to keep this GitHub repo free of abandoned issues. This will allow us to focus the conversation on improving darkflow performance in one issue, which will be beneficial for future programmers coming here with the same problem.

Regards, Ashley

hashirali2604 commented 6 years ago

@ashleyjsands thank you very much for your help and support, I will let you know once I have tried your ideas.

alishibli97 commented 5 years ago

@hashirali2604 did you solve your problem yet?

hashirali2604 commented 5 years ago

@alishibli97 nah, the problem still persists. I don't have much time for it, so I started working on SSD-MobileNet using the TensorFlow Object Detection API.

alishibli97 commented 5 years ago

@hashirali2604 Hey, so basically I trained more and did more iterations (now in the 5th iteration). OK, the loss is decreasing slowly, but in general it is going down. It is reaching 0.something or 1.something, so I think it is only about more training and iterations. In case you find another solution, please share! And thanks!

thtrieu / darkflow

convergence and detection issue #911