ultralytics / yolov3

YOLOv3 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

mAP and detection not working #197

Closed Ownmarc closed 5 years ago

Ownmarc commented 5 years ago

I think there is something wrong with the scaling of the bounding boxes. My mAP is always zero even though my training seems to be going well. Also, when I use detect.py, the bounding boxes are in the right places but are really small.

I didn't touch anything in util.py, and my .txt label files for the images are correct.

glenn-jocher commented 5 years ago

Git clone a clean copy of the repo and run one of the custom tutorials. If your results match ours then it's your data.

https://docs.ultralytics.com/yolov5/tutorials/train_custom_data python3 train.py --data data/coco_10img.data

You should see results like the plot below. The 10-image example only takes about 5 minutes on a GCP V100 VM instance. To plot results: `from utils import utils; utils.plot_results()`

[results plot]

Ownmarc commented 5 years ago

[results plot]

Guess I am overfitting; the labels look fine when I open them with the open-source labelImg tool.

I had 21 classes, 450 images and ~50 objects per image

glenn-jocher commented 5 years ago

Before you do any training, an obvious first step is to run a tutorial and make sure your results match.

If you are overfitting, your mAP on the training set should be great, right? Have you checked that at least?

In any case, 99% of the time when people can't get results, it's because they didn't format their data correctly or they modified the default repository.

Ownmarc commented 5 years ago

I've been trying to run it, but I can't get all the files right on Windows; I can't run the .sh file to set everything up.

I double-checked my annotation data and everything matches the YOLO annotation format. I thought my problem was the learning rate or the augmentation, and I tried several things tonight to make it work, without any luck.

Would you mind trying my data to see if you can get something out of it? It would be appreciated, and maybe you'll be able to add guidance to the custom tutorial.

Ownmarc commented 5 years ago

After searching the official Darknet repo, I think this may have something to do with the anchors. I probably need to change them for my custom data.

glenn-jocher commented 5 years ago

If your target sizes are different enough from the default anchors then yes, you will want to vary the anchor dimensions. We used kmeans to do this with the xView data: https://github.com/ultralytics/xview-yolov3
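A minimal sketch of that kind of anchor estimation (not the exact xView script; the label directory, anchor count, and image size below are placeholders), assuming YOLO-format labels (class, x_center, y_center, width, height, normalized 0-1):

```python
import glob
import numpy as np
from scipy.cluster.vq import kmeans

def estimate_anchors(label_dir="data/labels", n_anchors=9, img_size=416):
    wh = []
    for path in glob.glob(f"{label_dir}/*.txt"):
        labels = np.loadtxt(path, ndmin=2)        # rows: class, xc, yc, w, h (normalized)
        if labels.size:
            wh.append(labels[:, 3:5] * img_size)  # scale normalized wh to pixels
    wh = np.concatenate(wh, axis=0)
    centroids, _ = kmeans(wh.astype(float), n_anchors)    # k-means on (w, h) pairs
    return centroids[np.argsort(centroids.prod(axis=1))]  # sort anchors by area

print(estimate_anchors())
```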

You can run under linux using your GCP quickstart: https://docs.ultralytics.com/yolov5/environments/google_cloud_quickstart_tutorial/

glenn-jocher commented 5 years ago

Also, to make sure your targets are in the right format, you can plot the training data by using the plotting script in train.py.

About your anchors, I'd be very surprised if the smallest or largest anchors weren't covering part of your training data. They span from 10 to 370 pixels wide in a 416-pixel image. Changing anchors is done to improve results, not to bring the mAP from zero to something else. I still think there must be a problem elsewhere.
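For reference, the default YOLOv3 anchors (width, height in pixels at 416x416), as listed in the standard yolov3.cfg:

```python
# Default YOLOv3 anchors (w, h) in pixels at 416x416, from the standard yolov3.cfg
default_anchors = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
                   (59, 119), (116, 90), (156, 198), (373, 326)]
```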

https://github.com/ultralytics/yolov3/blob/11366774e2a821dfcc281ee800b68141d989344f/train.py#L129-L139

[batch_0 plot]
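If you want a standalone sanity check of a single image and its labels, a minimal sketch along the same lines (not the repo's plotting code; the file paths below are placeholders), assuming YOLO-format labels:

```python
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import numpy as np
from PIL import Image

img = np.array(Image.open("data/images/example.jpg"))
labels = np.loadtxt("data/labels/example.txt", ndmin=2)  # class, xc, yc, w, h (normalized)
h, w = img.shape[:2]

fig, ax = plt.subplots()
ax.imshow(img)
for cls, xc, yc, bw, bh in labels:
    # convert normalized center-format boxes to pixel corner format
    x0, y0 = (xc - bw / 2) * w, (yc - bh / 2) * h
    ax.add_patch(patches.Rectangle((x0, y0), bw * w, bh * h, fill=False, edgecolor="red"))
    ax.text(x0, y0, str(int(cls)), color="red")
plt.savefig("label_check.jpg")
```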

glenn-jocher commented 5 years ago

@Ownmarc good news, maybe. I was posting a comment on a different issue when I realized we had inadvertently introduced a bug in the master branch related to the wh loss computation. This was fixed in our test branch but not in master. I fixed this and also hardcoded plotting of the first train and test batches. When you train normally now, two files will appear in your yolov3/ directory, train_batch0.jpg and test_batch0.jpg.

You should git pull to incorporate the wh bug fix and retrain, viewing the two images to make sure the boxes look correctly aligned. I will add this tidbit to the tutorials as well; it should go a long way toward helping people make sure their training and testing data are well formatted.

Ownmarc commented 5 years ago

Just checked your commit; it makes a lot of sense, since my training was getting worse the more I trained and it looked like the loss on the YOLO layers wasn't computing correctly. I'll keep you updated!

Ownmarc commented 5 years ago

Train batch, everything looks normal: [train_batch0.jpg]

Ownmarc commented 5 years ago

Yep! Thanks a lot, mAP is showing and increasing! I think we can close this.

[screenshot]

glenn-jocher commented 5 years ago

Hmm, it must have been that wh bug. Phew, we have to be careful here when we adjust the code. Ok, glad to hear it's all working now!! I hope other people aren't running into the same problem. I'll probably leave this open for a few days just in case anyone goes searching.

If anyone has training problems on custom data, please git pull the latest commit and try again, as a bug was present around the first week of April that has now been resolved!

glenn-jocher commented 5 years ago

@Ownmarc hey, wait a second, your screenshot is showing Recall > 1 for several categories, which is a statistical impossibility. The high recall seems to be feeding into the mAP as well, causing it to rise above 1 for the same categories.

We validated our mAP against pycocotools and darknet, and it matches to within 1%. I just recomputed it for another issue: https://github.com/ultralytics/yolov3/issues/199#issuecomment-481216891

Do you know what might be causing this?

Ownmarc commented 5 years ago

It seems to count one object that is predicted with two bboxes almost on top of each other as two good predictions, when there is, in fact, only one object!

Ownmarc commented 5 years ago

This is at a 0.7 confidence threshold; see this cannon with two bboxes. They are probably counted as two good detections.

[screenshot]

Here we can see the cannon class at 1.01:

[screenshot]

glenn-jocher commented 5 years ago

Hmm. This is surely the finest test.py result I've ever seen.

It's pretty common to get two boxes for one object; that should just give you a P of 0.5 and an R of 1.0 for that instance.

Somehow your list of TPs is longer than the list of target objects, which should not be possible. In any case, it looks like the issue mellowed out eventually. I scanned the test.py code but didn't see anything out of the ordinary. Since this doesn't occur on COCO data I'll just forget about it for now.

Ownmarc commented 5 years ago

@glenn-jocher, let me know if you want my dataset to test it!

glenn-jocher commented 5 years ago

@Ownmarc maybe if you put it all in a Google Drive folder I can check it out when I have more free time! It would certainly be interesting to see what's causing the > 1 recalls.

Do you think you could have duplicate rows in your labels file? Is it still there at the default test parameters, i.e. nms_thres 0.5?

Ownmarc commented 5 years ago

Yes, I didn't change anything from the master repo other than the __init__.py I need in the utils folder (for Windows), the plotting font size, and setting the visible GPU in the train script.

No duplicates; they were all generated by a script from XML files that were hand-annotated, and checked with other scripts to make sure there was nothing impossible (like 8 gold_mines, since a player can only have a maximum of 7).

I have been training with darkflow on the exact same dataset and this was not happening. Maybe this can help you (from the Darkflow repo):

```python
import numpy as np


class BoundBox:
    def __init__(self, classes):
        self.x, self.y = float(), float()   # box center
        self.w, self.h = float(), float()   # box width and height
        self.c = float()                    # objectness / confidence
        self.class_num = classes
        self.probs = np.zeros((classes,))   # per-class probabilities


def overlap(x1, w1, x2, w2):
    # 1-D overlap of two intervals given their centers and widths
    l1 = x1 - w1 / 2.
    l2 = x2 - w2 / 2.
    left = max(l1, l2)
    r1 = x1 + w1 / 2.
    r2 = x2 + w2 / 2.
    right = min(r1, r2)
    return right - left


def box_intersection(a, b):
    w = overlap(a.x, a.w, b.x, b.w)
    h = overlap(a.y, a.h, b.y, b.h)
    if w < 0 or h < 0:
        return 0
    return w * h


def box_union(a, b):
    i = box_intersection(a, b)
    return a.w * a.h + b.w * b.h - i


def box_iou(a, b):
    return box_intersection(a, b) / box_union(a, b)


def prob_compare(box):
    return box.probs[box.class_num]


def prob_compare2(boxa, boxb):
    # compares a `pi` attribute assigned elsewhere in darkflow
    if boxa.pi < boxb.pi:
        return 1
    elif boxa.pi == boxb.pi:
        return 0
    else:
        return -1
```

Ownmarc commented 5 years ago

Test loss seems to be good; here is the result.txt if it can help (21 classes):

[results screenshot]

vivian-wong commented 5 years ago

I was running into the same issue, and even after pulling the repo again this morning, I still couldn't get mAP above zero. Even weirder, my wh loss becomes inf after a while.

[training log screenshot]

I am training with transfer learning on a custom dataset with 1 class.

glenn-jocher commented 5 years ago

@vivian-wong see https://github.com/ultralytics/yolov3/issues/168 to control divergent width-height (wh) losses.

vivian-wong commented 5 years ago

Working now! Thank you!

shadyatscu commented 5 years ago

> Working now! Thank you!

Congrats!

Ownmarc commented 5 years ago

@glenn-jocher, was the mAP over 1.0 issue fixed, or should we open a new issue?

glenn-jocher commented 5 years ago

@Ownmarc > 1 recall is likely still an open issue, as I have not worked on it due to an inability to rapidly reproduce it. Another user mentioned it as well. The darkflow IoU code is nice to see, but it only operates on one box at a time, whereas ours is vectorized for speed (it computes many IoUs simultaneously). In any case, I don't think IoU is the problem.
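For comparison, a minimal sketch of a vectorized IoU (not the repo's exact function; boxes assumed in corner x1, y1, x2, y2 format):

```python
import torch

def bbox_iou(box1, boxes2, eps=1e-9):
    # box1: (4,) tensor, boxes2: (N, 4) tensor; returns (N,) IoUs in one shot
    inter_x1 = torch.max(box1[0], boxes2[:, 0])
    inter_y1 = torch.max(box1[1], boxes2[:, 1])
    inter_x2 = torch.min(box1[2], boxes2[:, 2])
    inter_y2 = torch.min(box1[3], boxes2[:, 3])
    inter = (inter_x2 - inter_x1).clamp(0) * (inter_y2 - inter_y1).clamp(0)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
    return inter / (area1 + area2 - inter + eps)
```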

If you raise a new issue specifically about the > 1 recall, make sure you supply all the elements needed to reproduce it, i.e. a Google Drive folder with the trained model, the .data and .cfg files, and the *.txt file pointing to the training images and labels folders, and of course the folders themselves. This would be the most useful.

AntoineGerardeaux commented 5 years ago

Good morning everyone,

Hi @Ownmarc, can you share your weights, cfg, .data, and .names files from your "Clash of Clans" detector? I really want to test it, and maybe make a YouTube video.

Best regards, Antoine

glenn-jocher commented 5 years ago

@Ownmarc mAP > 1.0 has been fixed now.

Ownmarc commented 5 years ago

@glenn-jocher cool, what was the issue? Thanks

glenn-jocher commented 5 years ago

@Ownmarc just a bug in the test.py mAP calculation. Non-exclusive target-anchor combinations were being allowed, which caused some targets to count as multiple TPs.
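For anyone curious, a minimal sketch of exclusive matching (not the actual test.py code), where each ground-truth box can be claimed by at most one detection so duplicate boxes count as false positives; `iou_fn` stands in for any vectorized IoU like the sketch earlier in the thread:

```python
import torch

def match_detections(det_boxes, det_conf, gt_boxes, iou_fn, iou_thres=0.5):
    # det_boxes: (N, 4), det_conf: (N,), gt_boxes: (M, 4)
    tp = torch.zeros(len(det_boxes), dtype=torch.bool)
    claimed = set()
    for i in det_conf.argsort(descending=True):       # highest-confidence detections first
        ious = iou_fn(det_boxes[i], gt_boxes)          # IoU against every target at once
        j = int(ious.argmax())
        if ious[j] > iou_thres and j not in claimed:   # target not yet claimed
            tp[i] = True
            claimed.add(j)                             # each target counted only once
    return tp
```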

JalajK commented 4 years ago

Can anyone help me? I performed transfer learning on YOLOv3 and it's detecting objects, but not in the right place. I can't upload the result because I'm using my company's network.

glenn-jocher commented 4 years ago

@JalajK hello, thank you for your interest in our work! This issue seems to lack the minimum requirements for a proper response, or is insufficiently detailed for us to help you. Please note that most technical problems are due to:

If none of these apply to you, we suggest you close this issue and raise a new one using the Bug Report template, providing screenshots and minimum viable code to reproduce your issue. Thank you!

Leprechault commented 4 years ago

> Git clone a clean copy of the repo and run one of the custom tutorials. If your results match ours then it's your data.
>
> https://docs.ultralytics.com/yolov5/tutorials/train_custom_data python3 train.py --data data/coco_10img.data
>
> You should see this. The 10-image example only takes about 5 minutes on a GCP V100 VM instance. `from utils import utils; utils.plot_results()`

In the log.txt output, where is the information about mAP?

glenn-jocher commented 4 years ago

@Leprechault run `from utils import utils; utils.plot_results()` to plot your mAP.

Leprechault commented 4 years ago

Thanks, @glenn-jocher!!! It works, but I also have a conceptual question about the way mAP is calculated. In the output file (log.txt), after each iteration I have e.g. "1: 799.219543, 799.219543 avg, 0.000000 rate, 654.661284 seconds, 24 images", and I recognize the six variables as iteration, total loss, loss error, rate, time, and number of images, but I don't know where the percentage mAP comes from.

glenn-jocher commented 4 years ago

@Leprechault I don't know what log file you refer to.

mAP is computed in a standard manner, i.e. area under a PR curve.
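As a rough illustration (not the repo's exact implementation), assuming numpy arrays where `tp` holds a 0/1 flag per detection for one class, `conf` the matching confidences, and `n_gt` the number of ground-truth objects of that class:

```python
import numpy as np

def average_precision(tp, conf, n_gt):
    order = np.argsort(-conf)                   # rank detections by descending confidence
    tp = np.asarray(tp, dtype=float)[order]
    fp = 1.0 - tp
    recall = np.cumsum(tp) / (n_gt + 1e-16)     # cumulative recall
    precision = np.cumsum(tp) / (np.cumsum(tp) + np.cumsum(fp))
    return np.trapz(precision, recall)          # area under the PR curve

# mAP is then the mean of the per-class APs.
```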

JalajK commented 4 years ago

Okay, I did transfer learning from the pre-trained weights on the CNR parking dataset, which has only one class (car), and formatted the data according to the YOLO model. In the cfg file, I changed all 3 yolo layers according to this dataset, plus the number of filters. After all this, I'm getting this kind of output. Help me out, guys. [IMG-20200316-WA0007]

The bounding boxes are all shifted by the same distance; I don't get it.

glenn-jocher commented 4 years ago

@JalajK your labels look incorrect. You need to check your train_batch0.jpg and test_batch0.jpg produced when training starts.

Leprechault commented 4 years ago

> @Leprechault I don't know what log file you refer to.
>
> mAP is computed in a standard manner, i.e. area under a PR curve.

Thanks @glenn-jocher, I was searching for my goal in the wrong file. Now I'm trying to get the mAP results into a txt file using `./darknet detector map obj.data obj.cfg backup/obj_100.weights -map | tee result_mAP.txt`, but it doesn't work (no output txt file is created). Any ideas?

glenn-jocher commented 4 years ago

@Leprechault suggest you raise the issue on the relevant repo.

Fetulhak commented 3 years ago

`./darknet detector map`

When I use this command, I am validating the model, right?

But why is the setting like this in obj.data? I mean, why is the validation data equal to the test data? I need an answer, please.

```
classes = 2
train = data/train.txt
valid = data/test.txt
names = data/obj.names
backup = backup/
```

glenn-jocher commented 3 years ago

👋 Hello, thank you for your interest in our work! This issue seems to lack the minimum requirements for a proper response, or is insufficiently detailed for us to help you. Please note that most technical problems are due to:

```
CODE TO REPRODUCE YOUR ISSUE HERE
```


- **Your custom data.** If your issue is not reproducible in one of our 3 common datasets ([COCO](https://github.com/ultralytics/yolov5/blob/master/data/coco.yaml), [COCO128](https://github.com/ultralytics/yolov5/blob/master/data/coco128.yaml), or [VOC](https://github.com/ultralytics/yolov5/blob/master/data/voc.yaml)) we can not debug it. Visit our [Custom Training Tutorial](https://docs.ultralytics.com/yolov5/tutorials/train_custom_data) for guidelines on training your custom data. Examine `train_batch0.jpg` and `test_batch0.jpg` for a sanity check of your labels and images.

- **Your environment.** If your issue is not reproducible in one of the verified environments below we can not debug it. If you are running YOLOv5 locally, verify your environment meets all of the [requirements.txt](https://github.com/ultralytics/yolov5/blob/master/requirements.txt) dependencies specified below. If in doubt, download Python 3.8.0 from https://www.python.org/, create a new [venv](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/), and install requirements.

If none of these apply to you, we suggest you close this issue and raise a new one using the 🐛 **Bug Report template**, providing screenshots and **minimum viable code to reproduce your issue**. Thank you!

## Requirements

Python 3.8 or later with all [requirements.txt](https://github.com/ultralytics/yolov5/blob/master/requirements.txt) dependencies installed, including `torch>=1.7`. To install run:
```bash
$ pip install -r requirements.txt
```

## Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

## Status

[CI CPU testing badge]

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are passing. These tests evaluate proper operation of basic YOLOv5 functionality, including training (train.py), testing (test.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu.