Git clone a clean copy of the repo and run one of the custom tutorials. If your results match ours, then the problem is in your data.
https://docs.ultralytics.com/yolov5/tutorials/train_custom_data
python3 train.py --data data/coco_10img.data
You should see this. The 10-image example only takes about 5 minutes on a GCP V100 VM instance.
from utils import utils; utils.plot_results()
I guess I am overfitting; the labels look fine when I open them with the open-source project labelImg.
I had 21 classes, 450 images and ~50 objects per image
Before you do any training, an obvious first step is to run a tutorial and make sure your results match.
If you are overfitting, your mAP on the train set should be great, right? Have you at least checked that?
In any case, 99% of the time when people can't get results, it's because they didn't format their data correctly or they've modified the default repository.
I've been trying to run it, but I can't get all the files right on Windows; I can't run the .sh file to set everything up.
I double-checked my annotation data and everything matches the YOLO annotation format. I thought my problem was the learning rate or the augmentation, and I tried several things tonight to make it work, without any luck.
Would you mind trying my data to see if you can get something out of it? It would be appreciated, and maybe you'll be able to add guidance to the custom tutorial.
After searching the official Darknet repo, I think this may have something to do with the anchors. I probably need to change them for my custom data.
If your target sizes are different enough from the default anchors then yes, you will want to vary the anchor dimensions. We used k-means to do this with the xView data: https://github.com/ultralytics/xview-yolov3
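As an illustration only (not the actual xview-yolov3 script), a minimal k-means anchor computation over YOLO-format label files might look like the sketch below; the directory path and function name are hypothetical:

```python
# Hypothetical sketch: estimate anchors by k-means over label widths/heights.
# Assumes YOLO-format *.txt labels (class xc yc w h, normalized 0-1) and a fixed image size.
import glob
import numpy as np
from scipy.cluster.vq import kmeans

def compute_anchors(label_dir, img_size=416, n_anchors=9):
    wh = []
    for path in glob.glob(f"{label_dir}/*.txt"):
        labels = np.loadtxt(path, ndmin=2)          # rows: class, xc, yc, w, h
        if labels.size:
            wh.append(labels[:, 3:5] * img_size)    # scale widths/heights to pixels
    wh = np.concatenate(wh, axis=0)
    centroids, _ = kmeans(wh, n_anchors)            # cluster centers = anchor (w, h) pairs
    return centroids[np.argsort(centroids.prod(1))] # sort small to large by area

if __name__ == "__main__":
    print(compute_anchors("data/labels"))           # paste the result into your .cfg
```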
You can run under linux using your GCP quickstart: https://docs.ultralytics.com/yolov5/environments/google_cloud_quickstart_tutorial/
Also, to make sure your targets are in the right format, you can plot the training data by using the plotting script in train.py.
About your anchors, I'd be very surprised if the smallest or largest anchors weren't covering part of your training data. They span from 10 to 370 pixels wide in a 416-pixel image. Changing anchors is done to improve results, not to bring the mAP from zero to something else. I still think there must be a problem elsewhere.
@Ownmarc good news, maybe. I was posting a comment on a different issue when I realized we had inadvertently introduced a bug in the master branch related to the wh loss computation. This was fixed in our test branch but not in master. I fixed this and also hardcoded plotting of the first train and test batches. When you train normally now, two files will appear in your yolov3/ directory: train_batch0.jpg and test_batch0.jpg.
You should git pull to incorporate the wh bug fix and retrain, viewing the two images to make sure the boxes are correctly aligned. I will add this tidbit to the tutorials as well; it should go far in helping people make sure their training and testing data is well formatted.
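If you want to do the same sanity check by hand on a single image/label pair, a rough sketch could look like this (hypothetical helper and paths, assuming YOLO-normalized "class x_center y_center width height" rows):

```python
# Hypothetical spot-check: overlay YOLO-format labels on an image to verify alignment.
from PIL import Image, ImageDraw

def show_labels(img_path, label_path):
    img = Image.open(img_path)
    draw = ImageDraw.Draw(img)
    w, h = img.size
    with open(label_path) as f:
        for line in f:
            c, xc, yc, bw, bh = map(float, line.split())
            x1, y1 = (xc - bw / 2) * w, (yc - bh / 2) * h   # denormalize to pixel corners
            x2, y2 = (xc + bw / 2) * w, (yc + bh / 2) * h
            draw.rectangle([x1, y1, x2, y2], outline="red", width=2)
            draw.text((x1, y1), str(int(c)))                # class index as a label
    img.show()

show_labels("data/images/example.jpg", "data/labels/example.txt")  # hypothetical paths
```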
Just checked your commit; it makes a lot of sense, since my training was getting worse the more I trained and it looked like the loss on the YOLO layers wasn't computing correctly. I'll keep you updated!
train batch, everything looks normal:
Yep! Thanks a lot, mAP is showing and increasing! I think we can close this.
Hmm, it must have been that wh bug. Phew, we have to be careful here when we adjust the code. OK, glad to hear it's all working now!! I hope other people aren't running into the same problem. I'll probably leave this open for a few days just in case anyone goes searching.
If anyone has training problems on custom data, please git pull the latest commit and try again, as a bug was present around the first week of April that has now been resolved!
@Ownmarc hey wait a second, your screenshot is showing Recall > 1 for several categories, which is a statistical impossibility. The high recall seems to be feeding to the mAP as well, causing it to increase above 1 for the same categories.
We validated our mAP against pycocotools and darknet very well, and now it matches to 1%. I just recomputed for another issue: https://github.com/ultralytics/yolov3/issues/199#issuecomment-481216891
Do you know what might be causing this?
It seems to count 1 object that is predicted with 2 bboxes almost on top of each other as 2 good predictions, when there is, in fact, only 1 object!
This is at a 0.7 confidence threshold; see this cannon having 2 bboxes. They are probably counted as 2 good detections.
Here we can see the cannon class at 1.01:
Hmm. This is surely the finest test.py result I've ever seen.
It's pretty common to get two boxes for one object; that should just give you a P of 0.5 and an R of 1.0 for that instance.
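Spelling out that arithmetic: with one ground-truth object and two overlapping predictions above the IoU threshold, only one prediction can count as a true positive and the other becomes a false positive, for example:

```python
# One ground-truth cannon, two overlapping predictions: one TP, one FP.
tp, fp, n_targets = 1, 1, 1
precision = tp / (tp + fp)   # 0.5
recall = tp / n_targets      # 1.0, and never above 1 if each target is counted once
print(precision, recall)
```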
Somehow your list of TPs is longer than your list of target objects, which should not be possible. In any case, it looks like the issue mellowed out eventually. I scanned the test.py code but didn't see anything out of the ordinary. Since this doesn't occur on COCO data, I'll just forget about it for now.
@glenn-jocher, let me know if you want my dataset to test it!
@Ownmarc maybe if you put it all in a Google Drive folder I can check it out when I have more free time! It would certainly be interesting to see what's causing the > 1 recalls.
Do you think you could have duplicate rows in your labels file? Is it still there at the default test parameters, i.e. nms_thres 0.5?
Yes, I didn't change anything from the master repo other than the init.py I need in the utils folder (for Windows), the font size of the plotting, and setting the visible GPU in the train script.
No duplicates; they were all made using a script from XML files which were hand-annotated and checked using other scripts to make sure there was nothing impossible (like 8 gold_mines, since a player can only have 7 maximum).
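For reference, that kind of hand-annotated-XML-to-YOLO conversion typically looks roughly like the sketch below (hypothetical tag names and helper, not the exact script used here):

```python
# Hypothetical VOC-style-XML -> YOLO txt conversion sketch.
import xml.etree.ElementTree as ET

def voc_to_yolo(xml_path, class_names):
    root = ET.parse(xml_path).getroot()
    W = float(root.find("size/width").text)
    H = float(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        cls = class_names.index(obj.find("name").text)
        b = obj.find("bndbox")
        x1, y1 = float(b.find("xmin").text), float(b.find("ymin").text)
        x2, y2 = float(b.find("xmax").text), float(b.find("ymax").text)
        xc, yc = (x1 + x2) / 2 / W, (y1 + y2) / 2 / H   # normalized box center
        w, h = (x2 - x1) / W, (y2 - y1) / H             # normalized box size
        lines.append(f"{cls} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
    return lines
```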
I have been training using darkflow with the exact same dataset and this was not happening. Maybe this can help you (from the Darkflow repo):
import numpy as np

class BoundBox:
    def __init__(self, classes):
        self.x, self.y = float(), float()      # box center
        self.w, self.h = float(), float()      # box width and height
        self.c = float()                       # objectness / confidence
        self.class_num = classes
        self.probs = np.zeros((classes,))

def overlap(x1, w1, x2, w2):
    # 1-D overlap of two intervals given their centers and widths
    l1 = x1 - w1 / 2.
    l2 = x2 - w2 / 2.
    left = max(l1, l2)
    r1 = x1 + w1 / 2.
    r2 = x2 + w2 / 2.
    right = min(r1, r2)
    return right - left

def box_intersection(a, b):
    w = overlap(a.x, a.w, b.x, b.w)
    h = overlap(a.y, a.h, b.y, b.h)
    if w < 0 or h < 0:
        return 0
    return w * h

def box_union(a, b):
    i = box_intersection(a, b)
    return a.w * a.h + b.w * b.h - i

def box_iou(a, b):
    return box_intersection(a, b) / box_union(a, b)

def prob_compare(box):
    return box.probs[box.class_num]

def prob_compare2(boxa, boxb):
    # comparison by a .pi attribute assumed to be set elsewhere in darkflow
    if boxa.pi < boxb.pi:
        return 1
    elif boxa.pi == boxb.pi:
        return 0
    else:
        return -1
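As a quick sanity check of the IoU code above (my own hypothetical example, not from darkflow): two identical boxes give IoU 1.0, while a heavily overlapping pair gives something below 1:

```python
a = BoundBox(1); a.x, a.y, a.w, a.h = 100, 100, 40, 40
b = BoundBox(1); b.x, b.y, b.w, b.h = 110, 100, 40, 40
print(box_iou(a, a))   # 1.0
print(box_iou(a, b))   # 0.6, overlapping but not identical
```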
Test loss seems to be good; here is the result.txt in case it helps (21 classes):
I was running into the same issue, but even after pulling the repo again this morning, my mAP is still zero. Even weirder, my wh loss becomes inf after a while.
I am training with transfer learning on a custom dataset with 1 class.
@vivian-wong see https://github.com/ultralytics/yolov3/issues/168 to control divergent width-height (wh) losses.
Working now! Thank you!
Congrats!
@glenn-jocher, was the mAP over 1.0 fixed, or should we open a new issue?
@Ownmarc > 1 recall is likely still an open issue, as I have not worked on it due to an inability to rapidly reproduce it. Another user mentioned it as well. The darkflow IoU code is nice to see, but their code only operates on one box at a time, whereas ours is vectorized for speed (it computes many IoUs simultaneously). In any case, I don't think IoU is the problem.
If you raise a new issue specifically about the > 1 recall, make sure you supply all the elements to reproduce the issue, i.e. a google drive folder with the trained model, the .data and .cfg files, and the *.txt file pointing to the training images and labels folders, and of course the folders themselves. This would be the most useful.
Good morning everyone,
Hi @Ownmarc, can you share your weights, cfg, .data, and .names files from your "clash of clans" detector? I really want to test it, maybe make a YouTube video.
Best regards, Antoine
@Ownmarc mAP > 1.0 has been fixed now.
@glenn-jocher cool, what was the issue? Thanks
@Ownmarc just a bug in the test.py mAP calculation. Non-exclusive target-anchor combinations were being allowed, which caused some targets to count as multiple TPs.
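In other words (a rough illustration with hypothetical names, not the actual test.py code), each ground-truth box should be claimable by at most one detection when counting TPs:

```python
# Rough illustration of exclusive matching: each ground-truth box
# may be claimed by at most one detection, so TPs can never exceed len(targets).
def count_tps(detections, targets, iou_fn, iou_thres=0.5):
    """detections sorted by confidence; targets is a list of ground-truth boxes."""
    matched = set()
    tps = 0
    for det in detections:
        best_iou, best_j = 0.0, -1
        for j, tgt in enumerate(targets):
            if j in matched:
                continue                      # already claimed, can't become a second TP
            iou = iou_fn(det, tgt)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_iou >= iou_thres:
            matched.add(best_j)
            tps += 1
    return tps
```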
Can anyone help me? I performed transfer learning on YOLOv3 and it's detecting objects, but not at the right place. I can't upload the result because I'm using my company's network.
@JalajK hello, thank you for your interest in our work! This issue seems to lack the minimum requirements for a proper response, or is insufficiently detailed for us to help you. Please note that most technical problems are due to:
- **Your changes.** If your issue is not reproducible against the original `git clone` version of this repository we can not debug it. Before going further run this code and ensure your issue persists:
sudo rm -rf yolov3 # remove existing
git clone https://github.com/ultralytics/yolov3 && cd yolov3 # clone latest
python3 detect.py # verify detection
python3 train.py # verify training (a few batches only)
# CODE TO REPRODUCE YOUR ISSUE HERE
Examine train_batch0.jpg and test_batch0.jpg for a sanity check of training and testing data. If none of these apply to you, we suggest you close this issue and raise a new one using the Bug Report template, providing screenshots and minimum viable code to reproduce your issue. Thank you!
In the log.txt output, where is the mAP information?
@Leprechault run `from utils import utils; utils.plot_results()` to plot your mAP.
Thanks, @glenn-jocher!!! It works, but I also have a conceptual question about the way that mAP is calculated. In the output file (log.txt), after each iteration I have e.g. "1: 799.219543, 799.219543 avg, 0.000000 rate, 654.661284 seconds, 24 images", and I recognize the six variables as iteration, total loss, average loss, rate, time and number of images, but I don't know where the mAP percentage comes from.
@Leprechault I don't know which log file you're referring to.
mAP is computed in a standard manner, i.e. as the area under a PR curve.
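Roughly, per class it works like the simplified sketch below (detections sorted by confidence; the function name and trapezoidal integration are illustrative, not the repo's exact compute_ap):

```python
# Simplified sketch of AP as the area under the precision-recall curve for one class.
import numpy as np

def average_precision(tp, conf, n_targets):
    order = np.argsort(-np.asarray(conf))           # highest confidence first
    tp = np.asarray(tp, dtype=float)[order]
    fp = 1.0 - tp
    tp_cum, fp_cum = np.cumsum(tp), np.cumsum(fp)
    recall = tp_cum / max(n_targets, 1)
    precision = tp_cum / (tp_cum + fp_cum)
    # trapezoidal area under the PR curve, starting from (recall=0, precision=1)
    r = np.concatenate(([0.0], recall))
    p = np.concatenate(([1.0], precision))
    return np.sum((r[1:] - r[:-1]) * (p[1:] + p[:-1]) / 2)

# e.g. 3 detections, the 2 most confident are TPs, 2 ground-truth objects:
print(average_precision([1, 1, 0], [0.9, 0.8, 0.7], n_targets=2))  # 1.0
```

mAP is then just the mean of these per-class AP values.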
Okay, starting from the pre-trained weights, I did transfer learning on the CNR parking dataset, which has only one class (car), and formatted the data according to the YOLO model. In the cfg file, I changed all 3 YOLO layers according to this dataset, along with the number of filters. After all this, I'm getting this kind of output. Help me out, guys.
The bounding boxes are all shifted by the same particular distance; I'm not getting it.
@JalajK your labels look incorrect. You need to check your train_batch0.jpg and test_batch0.jpg produced when training starts.
Thanks @glenn-jocher, I was searching for my goal in the wrong file. Now I'm trying to get the mAP results into a txt file using `./darknet detector map obj.data obj.cfg backup/obj_100.weights -map | tee result_mAP.txt`, but it doesn't work (no output txt file is created). Any ideas?
@Leprechault I suggest you raise the issue on the relevant repo.
When I use the command ./darknet detector map, I am validating the model, right? But why is the setting in obj.data like this, i.e. why is the validation data equal to the test data? I need an answer please:
classes = 2
train = data/train.txt
valid = data/test.txt
names = data/obj.names
backup = backup/
👋 Hello, thank you for your interest in our work! This issue seems to lack the minimum requirements for a proper response, or is insufficiently detailed for us to help you. Please note that most technical problems are due to:
- **Your changes.** If your issue is not reproducible against the latest `git clone` version of this repo we can not debug it. Before going further run this code and verify your issue persists:
$ git clone https://github.com/ultralytics/yolov5 yolov5_new # clone latest
$ cd yolov5_new
$ python detect.py # verify detection
- **Your custom data.** If your issue is not reproducible in one of our 3 common datasets ([COCO](https://github.com/ultralytics/yolov5/blob/master/data/coco.yaml), [COCO128](https://github.com/ultralytics/yolov5/blob/master/data/coco128.yaml), or [VOC](https://github.com/ultralytics/yolov5/blob/master/data/voc.yaml)) we can not debug it. Visit our [Custom Training Tutorial](https://docs.ultralytics.com/yolov5/tutorials/train_custom_data) for guidelines on training your custom data. Examine `train_batch0.jpg` and `test_batch0.jpg` for a sanity check of your labels and images.
- **Your environment.** If your issue is not reproducible in one of the verified environments below we can not debug it. If you are running YOLOv5 locally, verify your environment meets all of the [requirements.txt](https://github.com/ultralytics/yolov5/blob/master/requirements.txt) dependencies specified below. If in doubt, download Python 3.8.0 from https://www.python.org/, create a new [venv](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/), and install requirements.
If none of these apply to you, we suggest you close this issue and raise a new one using the 🐛 **Bug Report template**, providing screenshots and **minimum viable code to reproduce your issue**. Thank you!
## Requirements
Python 3.8 or later with all [requirements.txt](https://github.com/ultralytics/yolov5/blob/master/requirements.txt) dependencies installed, including `torch>=1.7`. To install run:
```bash
$ pip install -r requirements.txt
```
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled).
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are passing. These tests evaluate proper operation of basic YOLOv5 functionality, including training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu.
I think there is something wrong with the scaling of the bounding boxes. My mAP is always zero even though my training seems to be going well. Also, when I use detect.py, the bounding boxes are in the right places, but they are really small.
I didn't touch anything in util.py, and my .txt files for the images are right.