zylo117 / Yet-Another-EfficientDet-Pytorch

A PyTorch re-implementation of the official EfficientDet, with SOTA real-time performance and pretrained weights.
GNU Lesser General Public License v3.0

Another newbie question: how to select anchors for custom dataset #27

Closed Cli98 closed 4 years ago

Cli98 commented 4 years ago

Hi author @zylo117 ,

Thank you for your help with my previous question. Yet I still need help with selecting anchors. How do I do this for a custom dataset if I find the detection performance is not satisfying?

I know some practices, such as anchor selection in the YOLO family. Do you have more resources to help me on this topic?

Thank you again,

zylo117 commented 4 years ago

Kmeans, just like YOLO's, but the calculation is not provided in this repo. Implementing kmeans should be easy.

Then change these lines of project.yml.

anchors_scales: '[2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)]'
anchors_ratios: '[(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]'

These two arrays are multiplied with each other, giving a 3x3x2 array, i.e. 9 anchors, each with a width scaling factor and a height scaling factor.
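For instance, here is a rough illustration of how the two lists expand into 9 anchors (a sketch only, not the repo's actual anchor-generation code):

import itertools

anchors_scales = [2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)]
anchors_ratios = [(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]

# every (scale, ratio) combination gives one anchor's width/height scaling factors
for scale, (rw, rh) in itertools.product(anchors_scales, anchors_ratios):
    print(round(scale * rw, 2), round(scale * rh, 2))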

Examples,

This is what happens when you set anchors_ratios to [(1., 5.), (1., 6.), (1., 10.)]: (example inference image: img_inferred_d0_this_repo_0)

The first parameter is the width scaling factor and the other is the height scaling factor.

Here is what you need to do.

  1. Find the 9 anchor ratios with kmeans.
  2. Decompose them into two 3x1 arrays: one is anchors_scales, the other is anchors_ratios_raw.
  3. Decompose anchors_ratios_raw into a 3x2 array (width/height factors) and a 3x1 array (base factors). In this case, it's [[1,1], [2,1], [1,2]] and [1, 0.7, 0.7] (see the sketch below).
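A minimal sketch of one way to go from 9 kmeans (width, height) clusters to the two yml fields (the cluster values below are made up, and the sqrt-based split is only one possible reading of the steps above):

import numpy as np

# hypothetical (width, height) clusters from kmeans, in units of the base anchor size
clusters = np.array([
    [0.8, 0.8], [1.0, 1.0], [1.3, 1.3],
    [1.5, 0.8], [1.9, 1.0], [2.4, 1.3],
    [0.8, 1.5], [1.0, 1.9], [1.3, 2.4],
])

sizes = np.sqrt(clusters[:, 0] * clusters[:, 1])  # geometric-mean size per cluster
aspects = clusters[:, 0] / clusters[:, 1]         # width / height per cluster

# pick 3 representative sizes and 3 representative aspect ratios
anchors_scales = np.sort(sizes)[[1, 4, 7]]
ratios = np.sort(aspects)[[1, 4, 7]]

# express each aspect ratio r as (sqrt(r), 1/sqrt(r)) so the pair keeps the area
# roughly constant, matching the style of '[(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]'
anchors_ratios = [(round(float(np.sqrt(r)), 2), round(float(1 / np.sqrt(r)), 2)) for r in ratios]

print('anchors_scales:', [round(float(s), 2) for s in anchors_scales])
print('anchors_ratios:', anchors_ratios)
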
Cli98 commented 4 years ago

Thanks for your kind explanation.

phucnhs commented 4 years ago

Sorry, I have a simple question: after clustering, how can I decompose the result into the usual two sets of numbers, so that I get the anchor widths and heights of my dataset and their aspect ratios? Sorry, I've never done it before.

I am having the same problem; have you solved it yet?

yulei1234 commented 4 years ago

Sorry, I have a simple question: after clustering, how can I decompose the result into the usual two sets of numbers, so that I get the anchor widths and heights of my dataset and their aspect ratios? Sorry, I've never done it before.

I am having the same problem; have you solved it yet?

Sorry, I don't know how to do it yet.

XiaoLaoDi commented 4 years ago

Kmeans, just like YOLO's, but the calculation is not provided in this repo. Implementing kmeans should be easy.

Then change these lines of project.yml.

anchors_scales: '[2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)]'
anchors_ratios: '[(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]'

These two arrays are multiplied with each other, giving a 3x3x2 array, i.e. 9 anchors, each with a width scaling factor and a height scaling factor.

Examples,

This is what happens when you set anchors_ratios to [(1., 5.), (1., 6.), (1., 10.)]: (example inference image: img_inferred_d0_this_repo_0)

The first parameter is the width scaling factor and the other is the height scaling factor.

Here is what you need to do.

  1. Find the 9 anchor ratios with kmeans.
  2. Decompose them into two 3x1 arrays: one is anchors_scales, the other is anchors_ratios_raw.
  3. Decompose anchors_ratios_raw into a 3x2 array (width/height factors) and a 3x1 array (base factors). In this case, it's [[1,1], [2,1], [1,2]] and [1, 0.7, 0.7].

@zylo117 Great job. Could you explain the decomposition in steps 2 and 3 in detail, or with a concrete example? I'm confused by that.

rm2886 commented 4 years ago

Kmeans, just like YOLO's, but the calculation is not provided in this repo. Implementing kmeans should be easy.

Then change these lines of project.yml.

anchors_scales: '[2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)]'
anchors_ratios: '[(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]'

These two arrays are multiplied with each other, giving a 3x3x2 array, i.e. 9 anchors, each with a width scaling factor and a height scaling factor.

Examples,

This is what happens when you set anchors_ratios to [(1., 5.), (1., 6.), (1., 10.)]: (example inference image: img_inferred_d0_this_repo_0)

The first parameter is the width scaling factor and the other is the height scaling factor.

Here is what you need to do.

  1. Find the 9 anchor ratios with kmeans.
  2. Decompose them into two 3x1 arrays: one is anchors_scales, the other is anchors_ratios_raw.
  3. Decompose anchors_ratios_raw into a 3x2 array (width/height factors) and a 3x1 array (base factors). In this case, it's [[1,1], [2,1], [1,2]] and [1, 0.7, 0.7].

@zylo117 It's a nice explanation, but could you please help me with the decomposition part in steps 2 and 3? I have calculated 9 anchors using kmeans: 11.23,0.33, 11.80,6.40, 12.63,3.48, 12.71,2.29, 12.71,0.23, 12.73,0.69, 12.73,1.55, 12.74,1.05, 12.81,0.46. Now I don't know how to use these anchors and decompose them into anchor ratios and anchor scales.

Thanks

lucasjinreal commented 4 years ago

@zylo117 Can you elaborate on how to decompose? Say we have these ratios generated by kmeans:

 [0.24, 0.24, 0.24, 1.18, 1.18, 1.22]
Cli98 commented 4 years ago

@zylo117 Can you elaborate on how to decompose? Say we have these ratios generated by kmeans:

 [0.24, 0.24, 0.24, 1.18, 1.18, 1.22]

@jinfagang This looks quite weird. Normally we set 3 x 3 groups, but you set 6 groups, and the first 3 have the same ratio. How did you get those ratios?

lucasjinreal commented 4 years ago

I am running kmeans and getting boxes and ratios like this, with 6 clusters set for kmeans (9 for yolo perhaps?):

Accuracy: 67.96%
Boxes:
 [[0.0109375  0.00925926]
 [0.009375   0.03888889]
 [0.00520833 0.0212963 ]
 [0.01927083 0.08148148]
 [0.0453125  0.03703704]
 [0.02083333 0.01759259]]
Ratios:
 [0.24, 0.24, 0.24, 1.18, 1.18, 1.22]

How do I make it match this repo's format?

lucasjinreal commented 4 years ago

This is the code for kmeans:

import glob
import xml.etree.ElementTree as ET

import numpy as np

def iou(box, clusters):
    """
    Calculates the Intersection over Union (IoU) between a box and k clusters.
    :param box: tuple or array, shifted to the origin (i. e. width and height)
    :param clusters: numpy array of shape (k, 2) where k is the number of clusters
    :return: numpy array of shape (k, 0) where k is the number of clusters
    """
    x = np.minimum(clusters[:, 0], box[0])
    y = np.minimum(clusters[:, 1], box[1])
    if np.count_nonzero(x == 0) > 0 or np.count_nonzero(y == 0) > 0:
        raise ValueError("Box has no area")

    intersection = x * y
    box_area = box[0] * box[1]
    cluster_area = clusters[:, 0] * clusters[:, 1]

    iou_ = intersection / (box_area + cluster_area - intersection)

    return iou_

def avg_iou(boxes, clusters):
    """
    Calculates the average Intersection over Union (IoU) between a numpy array of boxes and k clusters.
    :param boxes: numpy array of shape (r, 2), where r is the number of rows
    :param clusters: numpy array of shape (k, 2) where k is the number of clusters
    :return: average IoU as a single float
    """
    return np.mean([np.max(iou(boxes[i], clusters)) for i in range(boxes.shape[0])])

def translate_boxes(boxes):
    """
    Translates all the boxes to the origin.
    :param boxes: numpy array of shape (r, 4)
    :return: numpy array of shape (r, 2)
    """
    new_boxes = boxes.copy()
    for row in range(new_boxes.shape[0]):
        new_boxes[row][2] = np.abs(new_boxes[row][2] - new_boxes[row][0])
        new_boxes[row][3] = np.abs(new_boxes[row][3] - new_boxes[row][1])
    return np.delete(new_boxes, [0, 1], axis=1)

def kmeans(boxes, k, dist=np.median):
    """
    Calculates k-means clustering with the Intersection over Union (IoU) metric.
    :param boxes: numpy array of shape (r, 2), where r is the number of rows
    :param k: number of clusters
    :param dist: distance function
    :return: numpy array of shape (k, 2)
    """
    rows = boxes.shape[0]

    distances = np.empty((rows, k))
    last_clusters = np.zeros((rows,))

    np.random.seed()

    # the Forgy method will fail if the whole array contains the same rows
    clusters = boxes[np.random.choice(rows, k, replace=False)]

    while True:
        for row in range(rows):
            distances[row] = 1 - iou(boxes[row], clusters)

        nearest_clusters = np.argmin(distances, axis=1)

        if (last_clusters == nearest_clusters).all():
            break

        for cluster in range(k):
            clusters[cluster] = dist(boxes[nearest_clusters == cluster], axis=0)

        last_clusters = nearest_clusters

    return clusters

ANNOTATIONS_PATH = "Annotations"
CLUSTERS = 6

def load_dataset(path):
    dataset = []
    for xml_file in glob.glob("{}/*xml".format(path)):
        tree = ET.parse(xml_file)

        height = int(float(tree.findtext("./size/height")))
        width = int(float(tree.findtext("./size/width")))

        for obj in tree.iter("object"):
            xmin = int(float(obj.findtext("bndbox/xmin"))) / width
            ymin = int(float(obj.findtext("bndbox/ymin"))) / height
            xmax = int(float(obj.findtext("bndbox/xmax"))) / width
            ymax = int(float(obj.findtext("bndbox/ymax"))) / height

            dataset.append([xmax - xmin, ymax - ymin])

    return np.array(dataset)

data = load_dataset(ANNOTATIONS_PATH)
out = kmeans(data, k=CLUSTERS)
print("Accuracy: {:.2f}%".format(avg_iou(data, out) * 100))
print("Boxes:\n {}".format(out))

ratios = np.around(out[:, 0] / out[:, 1], decimals=2).tolist()
print("Ratios:\n {}".format(sorted(ratios)))

This loads annotations from a VOC-format dataset.

Cli98 commented 4 years ago

I am running kmeans and getting boxes and ratios like this, with 6 clusters set for kmeans (9 for yolo perhaps?):

Accuracy: 67.96%
Boxes:
 [[0.0109375  0.00925926]
 [0.009375   0.03888889]
 [0.00520833 0.0212963 ]
 [0.01927083 0.08148148]
 [0.0453125  0.03703704]
 [0.02083333 0.01759259]]
Ratios:
 [0.24, 0.24, 0.24, 1.18, 1.18, 1.22]

How do I make it match this repo's format?

That's not the same purpose.

lucasjinreal commented 4 years ago

@Cli98 What do you mean by 'not the same purpose'? Then how do I get the anchor settings for my dataset?

markgao-916 commented 4 years ago

I am running kmeans and getting boxes and ratios like this, with 6 clusters set for kmeans (9 for yolo perhaps?):

Accuracy: 67.96%
Boxes:
 [[0.0109375  0.00925926]
 [0.009375   0.03888889]
 [0.00520833 0.0212963 ]
 [0.01927083 0.08148148]
 [0.0453125  0.03703704]
 [0.02083333 0.01759259]]
Ratios:
 [0.24, 0.24, 0.24, 1.18, 1.18, 1.22]

How do I make it match this repo's format?

Based on the ratios you obtained, taking the reciprocal gives you the ratios in the author's code. This is just my own idea, since the anchor area stays unchanged; the author's is 0.7 * 1.4 = 0.98.

whut2962575697 commented 4 years ago

I am running kmeans and getting boxes and ratios like this, with 6 clusters set for kmeans (9 for yolo perhaps?):

Accuracy: 67.96%
Boxes:
 [[0.0109375  0.00925926]
 [0.009375   0.03888889]
 [0.00520833 0.0212963 ]
 [0.01927083 0.08148148]
 [0.0453125  0.03703704]
 [0.02083333 0.01759259]]
Ratios:
 [0.24, 0.24, 0.24, 1.18, 1.18, 1.22]

How do I make it match this repo's format?

Based on the ratios you obtained, taking the reciprocal gives you the ratios in the author's code. This is just my own idea, since the anchor area stays unchanged; the author's is 0.7 * 1.4 = 0.98.

Hello, do you know how to make it match this repo now?

markgao-916 commented 4 years ago

Taking the first ratio, 0.24, as an example: the two values would be 0.24 and 1/0.24. The scale then needs to be tuned according to the actual ground truth. I don't know whether there are other ways to compute it; I didn't fully understand what the author said either. (I just wanted to write in Chinese rather than English.)
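For concreteness, a small sketch of the constant-area idea (my own illustration): to keep the product of the two factors close to 1 while the anchor's width/height ratio is r, the pair works out to (sqrt(r), 1/sqrt(r)) rather than (r, 1/r); the default (0.7, 1.4) is roughly (sqrt(0.5), 1/sqrt(0.5)).

import math

for r in [0.24, 1.18, 1.22, 0.5]:
    w, h = math.sqrt(r), 1 / math.sqrt(r)
    # product stays 1 while the width/height ratio stays r
    print(round(w, 2), round(h, 2), 'product:', round(w * h, 2), 'w/h:', round(w / h, 2))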

Cli98 commented 4 years ago

Taking the first ratio, 0.24, as an example: the two values would be 0.24 and 1/0.24. The scale then needs to be tuned according to the actual ground truth. I don't know whether there are other ways to compute it; I didn't fully understand what the author said either. (I just wanted to write in Chinese rather than English.)

@markgao-916 You need to look at the source code. This has little to do with the product being 1; at least the code you quoted does not use that.

markgao-916 commented 4 years ago

@Cli98 When setting anchors, suppose that at one position the first 1:1 box is easy to determine. If the other two boxes' aspect ratios are then set to 1:2 and 2:1, are the areas of those three boxes equal?

Cli98 commented 4 years ago

@Cli98 When setting anchors, suppose that at one position the first 1:1 box is easy to determine. If the other two boxes' aspect ratios are then set to 1:2 and 2:1, are the areas of those three boxes equal?

A newcomer seeing that explanation would surely say they are not equal. Equal areas only hold depending on how you handle the product of base_size and ratio. What you describe is of course one option, where base_size is fixed and anchor_ratio is tuned. But there is not just one way to choose anchors; for example, the repo you are using now does not force constant area, although I admit mainstream frameworks do have the area-of-1 requirement, e.g. Faster R-CNN with FPN.

markgao-916 commented 4 years ago

@Cli98 My way of setting it is to take the reciprocal, so I believe the area must be 1. As for the final box size, I saw in the source code that the scale can adjust the box size.

Cli98 commented 4 years ago

@Cli98 My way of setting it is to take the reciprocal, so I believe the area must be 1. As for the final box size, I saw in the source code that the scale can adjust the box size.

Your understanding is fine as far as the Faster R-CNN paper goes, but the code you quoted does not do it that way; I am just pointing that out. If you insist on believing it anyway, that is up to you.

markgao-916 commented 4 years ago

@Cli98 My way of setting it is to take the reciprocal, so I believe the area must be 1. As for the final box size, I saw in the source code that the scale can adjust the box size.

Your understanding is fine as far as the Faster R-CNN paper goes, but the code you quoted does not do it that way; I am just pointing that out. If you insist on believing it anyway, that is up to you.

I did say it was just a thought of my own; I only wanted to discuss.

jhonjam commented 3 years ago

I am running kmeans and getting boxes and ratios like this, with 6 clusters set for kmeans (9 for yolo perhaps?):

Accuracy: 67.96%
Boxes:
 [[0.0109375  0.00925926]
 [0.009375   0.03888889]
 [0.00520833 0.0212963 ]
 [0.01927083 0.08148148]
 [0.0453125  0.03703704]
 [0.02083333 0.01759259]]
Ratios:
 [0.24, 0.24, 0.24, 1.18, 1.18, 1.22]

How do I make it match this repo's format?

Hi, were you able to find the anchors_scales and anchors_ratios of your dataset? I have the same problem.

jhonjam commented 3 years ago

Sorry, I have a simple question: after clustering, how can I decompose the result into the usual two sets of numbers, so that I get the anchor widths and heights of my dataset and their aspect ratios? Sorry, I've never done it before.

I am having the same problem; have you solved it yet?

Hi, were you able to find the anchors_scales and anchors_ratios of your dataset? I have the same problem.

I have the following:

Boxes:
 [[0.021875   0.06944444]
 [0.05898437 0.05555556]
 [0.0421875  0.07777778]
 [0.021875   0.03611111]
 [0.3828125  0.4       ]
 [0.06328125 0.09861111]
 [0.10195312 0.13472222]
 [0.16484375 0.20694444]
 [0.034375   0.05      ]]
Ratios:
 [0.31, 0.54, 0.61, 0.64, 0.69, 0.76, 0.8, 0.96, 1.06]

zylo117 commented 3 years ago

@jhonjam Or you can try this one. That repo really helps me a lot. https://github.com/mnslarcher/kmeans-anchors-ratios

jhonjam commented 3 years ago

@zylo117 Thanks for your response. I ran that code and found the anchors_ratios; however, the results when I run !python coco_eval.py -c 0 -p .... are really bad. I trained the model for more than 400 epochs and the mAP is very low. I am using images with a resolution of 1280x720 and the EfficientDet-D0 model. I am not sure what mistake I am making, or whether for this resolution I have to use a deeper model. Also, I understand that the resolution of the images is reduced to 512x512 in the training stage, just as YOLOv3 does at 416x416.

Below I present some results:

(screenshot of evaluation results)

Cli98 commented 3 years ago

@jhonjam Or you can try this one. That repo really helps me a lot. https://github.com/mnslarcher/kmeans-anchors-ratios

@zylo117 That one basically builds on my code and does some vectorization, lol. Maybe also mention my repo next time?

zylo117 commented 3 years ago

@jhonjam So you've done validation in 10 seconds? I assume you have only a few images to train on? But that's not realistic for training effdet.

jhonjamttt commented 3 years ago

@zylo117 Hello, I have followed all the recommendations that you have given. I found the anchor ratios with https://github.com/mnslarcher/kmeans-anchors-ratios; however, the results are very bad. I am using 5000 images to detect 6 classes.

I was looking at the resolution of the images used at https://github.com/zylo117/Yet-Another-EfficientDet-Pytorch/tree/master/tutorial; all the examples there use 512x512-pixel images. So the big question for me is: to find the right anchor ratios for my dataset, do I have to lower the resolution of my images? By default my images are 1280x720 pixels and I want to use EfficientDet-D0, but the labels were made at that resolution.

1) What should I do: lower the resolution of the images and then find the anchor ratios? 2) Is it possible to find the anchor ratios using images at 1280x720 resolution to train EfficientDet-D0?

zylo117 commented 3 years ago

@jhonjamttt Unless the ratio or scale of your boxes is very different from the COCO dataset, you don't need to manually modify the anchors. Increase your lr by 10 times to see if the loss drops.

jhonjamttt commented 3 years ago

@zylo117 Using the script, the anchor ratios are as follows:

(image of the computed anchor ratios)

but the results are still bad. I did the training with the COCO anchor ratios, but those results were also very bad. I am using a learning rate of 1e-3; I don't understand why my results are so bad.

I would like to know whether my problem is related to the total number of labels or to the resolution of my images.

zylo117 commented 3 years ago

@jhonjamttt if you can share your dataset, I might take a look into it

jhonjamttt commented 3 years ago

@zylo117 This is the link to my dataset: https://drive.google.com/drive/folders/1v6Oph92YttQQh7M12Euh5k_y5RYBhYQn?usp=sharing

Tell me if you have any problem opening the link. Thanks.

Cli98 commented 3 years ago

@zylo117 This is the link to my dataset: https://drive.google.com/drive/folders/1v6Oph92YttQQh7M12Euh5k_y5RYBhYQn?usp=sharing

Tell me if you have any problem opening the link. Thanks.

@jhonjamttt Can you try my repo to generate the anchors based on your dataset? Here is the link https://github.com/Cli98/anchor_computation_tool

Hopefully this can help you.

jhonjamttt commented 3 years ago

@Cli98 Hi, thanks for your repository. However, I have a question: why do you use L (large)? I don't know how to determine whether my bounding boxes are small, medium or large.


Cli98 commented 3 years ago

@Cli98 Hi, thanks for your repository. However, I have a question: why do you use L (large)? I don't know how to determine whether my bounding boxes are small, medium or large.

@jhonjamttt Yes, s, m, l correspond to the scale of the objects in the dataset.

For example, you may consider a "pedestrian" small-scale but a "bus" large-scale.

The s, m, l scales follow the COCO dataset definition, i.e. thresholds of 32 and 96 pixels (small < 32x32, medium 32x32 to 96x96, large > 96x96 in area).
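For reference, a minimal sketch of how ground-truth boxes could be split into these three groups before clustering (thresholds assumed from the COCO definition, areas in square pixels):

def coco_scale_group(width_px, height_px):
    # COCO convention: small < 32*32, medium < 96*96, large otherwise
    area = width_px * height_px
    if area < 32 ** 2:
        return 'small'
    if area < 96 ** 2:
        return 'medium'
    return 'large'

print(coco_scale_group(20, 25), coco_scale_group(60, 70), coco_scale_group(150, 120))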

jhonjamttt commented 3 years ago

@Cli98 Thanks, I understand what you are saying, but my dataset is made up of objects of varying sizes; that is, I have very small objects, medium objects and very large objects. On the other hand, the code that @zylo117 suggested (https://github.com/mnslarcher/kmeans-anchors-ratios) generates the anchor ratios based on the size of the bounding boxes, without the need to specify the size of the objects.

Cli98 commented 3 years ago

@Cli98 Thanks, I understand what you are saying, but my dataset is made up of objects of varying sizes; that is, I have very small objects, medium objects and very large objects. On the other hand, the code that @zylo117 suggested (https://github.com/mnslarcher/kmeans-anchors-ratios) generates the anchor ratios based on the size of the bounding boxes, without the need to specify the size of the objects.

@jhonjamttt That's why your result is bad. For bounding boxes with such varied distributions, you need to group them by scale and run kmeans per group instead. mnslarcher's tool doesn't support this. Anyway, it's your choice.

jhonjamttt commented 3 years ago

@Cli98 Yes, the bounding boxes are of different sizes; however, with the same dataset, YOLOv3, YOLOv4 and CenterNet get very good and similar results, so what happens with EfficientDet is very strange.

Cli98 commented 3 years ago

@Cli98 Yes, the bounding boxes are of different sizes; however, with the same dataset, YOLOv3, YOLOv4 and CenterNet get very good and similar results, so what happens with EfficientDet is very strange.

@jhonjamttt Also with the same anchors on those models (YOLOv3, YOLOv4 and CenterNet)?

jhonjamttt commented 3 years ago

@jhonjamttt if you can share your dataset, I might take a look into it

@zylo117 Hello, did you find anything in my dataset?

zylo117 commented 3 years ago

Since you didn't give me any val set, I can only say it performs well on the train set, around 0.4 mAP. It's not converged yet; it should be much better.

zylo117 commented 3 years ago

Almost 0.5 now. I'll leave it to you. Here is the notebook: https://colab.research.google.com/drive/1lvEaFFAja--smRD41IlADrB09fHGzcLA?usp=sharing

zylo117 commented 3 years ago

The weights are in your shared folder. BTW, it has almost nothing to do with the anchors, only with the training method.

jhonjamttt commented 3 years ago

@zylo117 Hello, thank you very much for your collaboration. I reached an mAP of 46; however, it is still very low. Could you make available the anchor ratios you used, or the .yml file? What learning rate did you use?

zylo117 commented 3 years ago

@jhonjamttt they are all in the notebook

jhonjamttt commented 3 years ago

@zylo117 Yes, but I don't have access.

zylo117 commented 3 years ago

@zylo117 Yes, but I don't have access.

It should be OK now.

Cli98 commented 3 years ago

@zylo117 Yes, but I don't have access.

It should be OK now.

@zylo117 I just wonder why the AP for small objects is 0. Any thoughts?

zylo117 commented 3 years ago

Most likely the number of small objects is too small, or the small objects are really too small (less than 32 pixels), so the model can't learn much from them.
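A quick way to check this (a sketch with a made-up boxes_px list of (width, height) pairs in pixels) is to count how many ground-truth boxes fall under the COCO "small" threshold:

# fraction of ground-truth boxes that are COCO-small (area < 32 * 32 px)
boxes_px = [(12, 18), (40, 60), (25, 20), (110, 90)]
small = sum(1 for w, h in boxes_px if w * h < 32 ** 2)
print('{:.0%} of boxes are COCO-small'.format(small / len(boxes_px)))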

jhonjamttt commented 3 years ago

@zylo117 I was looking at other papers that have used your repository, and I found this article https://www.mdpi.com/2072-4292/12/15/2501/htm, where they get very low results using EfficientDet-D0. This is really weird, because in your examples (https://github.com/zylo117/Yet-Another-EfficientDet-Pytorch/tree/master/tutorial) you get a good mAP.