rodrigo2019 / keras_yolo2


Is the anchor generation correct? #17

Open stephandooper opened 4 years ago

stephandooper commented 4 years ago

Whilst browsing through the gen_anchors.py file I noticed a discrepancy between the anchors calculated in gen_anchors.py and the corresponding width/height computation in preprocessing.py. An example:

Assume that all of the images are of the same width and height, and that they match the desired width/height specified in the config.json.

Also, for simplicity, assume that our images are all 300x300 and that the network outputs a shape of 10x10 (the grid).

Finally, assume that we are tracking a specific object (bounding box) with xmin = 0, xmax = 90, ymin = 0, ymax = 90.

Then in gen_anchors.py we can see the following code:

    input_size = (config['model']['input_size_h'], config['model']['input_size_w'], 3)
    feature_extractor = import_feature_extractor(config['model']['backend'], input_size)
    grid_w = config['model']['input_size_w']/feature_extractor.get_output_shape()[1]
    grid_h = config['model']['input_size_h']/feature_extractor.get_output_shape()[0]

    # run k_mean to find the anchors
    annotation_dims = []
    for image in train_imgs:
        cell_w = image['width']/grid_w
        cell_h = image['height']/grid_h

        for obj in image['object']:
            relative_w = (float(obj['xmax']) - float(obj['xmin']))/cell_w
            relatice_h = (float(obj["ymax"]) - float(obj['ymin']))/cell_h
            annotation_dims.append(tuple(map(float, (relative_w, relatice_h))))

With our assumptions, grid_h = grid_w = 300 / 10 = 30 pixels, i.e. a grid cell equals 30 pixels. Then cell_w = cell_h = 300 / 30 = 10, i.e. we are back to a 'grid'. Finally, relative_w/h = (90 - 0) / 10 = 9 -> "how many grid blocks wide/tall is the box?", which is the final unit in which the anchors are later clustered and returned.
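For concreteness, a minimal sketch of that arithmetic in plain Python (just the numbers from the assumptions above, not repo code):

    image_w = 300                         # assumed image/input width
    output_w = 10                         # assumed feature-extractor output width (grid cells)
    xmin, xmax = 0, 90                    # assumed box

    grid_w = image_w / output_w           # 30 -> size of one grid cell in pixels
    cell_w = image_w / grid_w             # 10 -> "back to a grid"
    relative_w = (xmax - xmin) / cell_w   # (90 - 0) / 10 = 9.0 "grid blocks"
    print(relative_w)                     # 9.0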

In preprocessing.py, we can find the following calculation for the width and height:

center_w = (obj['xmax'] - obj['xmin']) / (float(self._config['IMAGE_W']) / self._config['GRID_W'])
center_h = (obj['ymax'] - obj['ymin']) / (float(self._config['IMAGE_H']) / self._config['GRID_H'])

xmin and xmax are the same as before, IMAGE_W, IMAGE_H = 300, 300, and GRID_W/GRID_H are obtained in frontend.py directly from the feature extractor, i.e.:

self._grid_h, self._grid_w = self._feature_extractor.get_output_shape()

which in itself was assumed to be a 10x10 grid. Using these values, we can obtain the following values for the centers:

center_w/h = (90 - 0) / (300 / 10) = 90 / 30 = 3 -> "how many blocks of size 30 do we need to fit the width?"

As we can see, the precomputed anchor dimensions and the widths/heights calculated in preprocessing.py do not correspond, since 3 != 9, due to a difference in the way they are measured.
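The mismatch can be reproduced in a few lines of plain Python (again using the assumed numbers, not repo code):

    image_w, n_cells_w = 300, 10
    xmin, xmax = 0, 90

    anchors_w = (xmax - xmin) / (image_w / (image_w / n_cells_w))   # gen_anchors.py style -> 9.0
    preproc_w = (xmax - xmin) / (float(image_w) / n_cells_w)        # preprocessing.py style -> 3.0
    print(anchors_w, preproc_w)                                     # 9.0 vs 3.0: the units disagree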

My question is whether I am mistaken somewhere, or if there is potentially a bug in the code?

rodrigo2019 commented 4 years ago

Could you open a PR to fix this issue? :)

rodrigo2019 commented 4 years ago

Also take a look here. I checked that in the current method (master branch) the objects are sometimes overwritten. In the dev branch I fixed it, but I'm not sure if it is working correctly.

This case is related to the anchors too.

stephandooper commented 4 years ago

I will try to make a pull request at the end of this week. It should not be too hard, but I will have to extend it to the case where the input images are of variable size.

I have also taken a look at the dev branch you linked to, but I do not quite understand what you mean when you say the objects are sometimes overwritten, could you elaborate?

rodrigo2019 commented 4 years ago

@stephandooper Because the batch generator chooses the best anchor position for that object, but sometimes another object has already filled that position in the YOLO output. In these cases I think the next object should go in the second-best position for it.
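Roughly what I mean, as a sketch only (this is not the actual batch-generator code; anchor_ious and occupied are hypothetical names for the per-object anchor IoUs and the slots already assigned in that grid cell):

    import numpy as np

    def pick_anchor(anchor_ious, occupied):
        # anchor_ious: IoU of the object's box with each anchor (1D array)
        # occupied:    True where another object already claimed that anchor slot
        # returns the index of the best free anchor, or None if every slot is taken
        for idx in np.argsort(anchor_ious)[::-1]:   # best IoU first
            if not occupied[idx]:
                return idx
        return None

    # the best anchor (index 2) is already taken, so this object falls back to index 0
    print(pick_anchor(np.array([0.6, 0.3, 0.8]), np.array([False, False, True])))   # -> 0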

sbillin commented 4 years ago

I believe the problem in gen_anchors.py is that grid_w should be the number of grid cells rather than the size of a single grid cell.

The change below fixes the issue and computes the anchor size in terms of grid units, matching the computation in preprocessing.py. I also added float(...) to the cell_w line so the result is consistent between Python 2 and Python 3.

    input_size = (config['model']['input_size_h'], config['model']['input_size_w'], 3)
    feature_extractor = import_feature_extractor(config['model']['backend'], input_size)
    # grid_w should be the number of grid cells, not the size of a single grid cell
    #grid_w = config['model']['input_size_w']/feature_extractor.get_output_shape()[1]
    #grid_h = config['model']['input_size_h']/feature_extractor.get_output_shape()[0]
    grid_w = feature_extractor.get_output_shape()[1]
    grid_h = feature_extractor.get_output_shape()[0]

    # run k_mean to find the anchors
    annotation_dims = []
    for image in train_imgs:
        cell_w = float(image['width'])/grid_w
        cell_h = float(image['height'])/grid_h

        for obj in image['object']:
            relative_w = (float(obj['xmax']) - float(obj['xmin']))/cell_w
            relatice_h = (float(obj["ymax"]) - float(obj['ymin']))/cell_h
            annotation_dims.append(tuple(map(float, (relative_w, relatice_h))))
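With this change, the toy example from the original post gives the same value in both places (a quick sanity check with the assumed numbers, not repo code):

    image_w, n_cells_w = 300, 10
    xmin, xmax = 0, 90

    cell_w = float(image_w) / n_cells_w                       # 30 pixels per cell (fixed gen_anchors.py)
    relative_w = (xmax - xmin) / cell_w                       # 3.0
    center_w = (xmax - xmin) / (float(image_w) / n_cells_w)   # 3.0 (preprocessing.py)
    assert relative_w == center_w                             # both are now in grid units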

ghost commented 3 years ago

> Also take a look here. I checked that in the current method (master branch) the objects are sometimes overwritten. In the dev branch I fixed it, but I'm not sure if it is working correctly.
>
> This case is related to the anchors too.

Hi brother, whenever I try to run this code, I get "Insert a comment about this training". What should I give as input here?