mystic123 / tensorflow-yolo-v3

Implementation of YOLO v3 object detector in Tensorflow (TF-Slim)
https://medium.com/@pawekapica_31302/implementing-yolo-v3-in-tensorflow-tf-slim-c3c55ff59dbe
Apache License 2.0

Should the resized image keep aspect ratio? #30

Closed i3oi3o closed 5 years ago

i3oi3o commented 5 years ago

Please correct me if I'm wrong, but this line does not preserve the aspect ratio when resizing: https://github.com/mystic123/tensorflow-yolo-v3/blob/fb9f5439d90ecff255b008e9c9d9e3b8ac4813da/demo.py#L49

Here is darknet's letterbox function. Running "./darknet detect" appears to call test_detector() in detector.c, which also uses the letterbox function. The padding value is 128 (uint) or 0.5 (float).

image letterbox_image(image im, int w, int h)
{
    int new_w = im.w;
    int new_h = im.h;
    if (((float)w/im.w) < ((float)h/im.h)) {
        new_w = w;
        new_h = (im.h * w)/im.w;
    } else {
        new_h = h;
        new_w = (im.w * h)/im.h;
    }
    image resized = resize_image(im, new_w, new_h);
    image boxed = make_image(w, h, im.c);
    fill_image(boxed, .5);
    //int i;
    //for(i = 0; i < boxed.w*boxed.h*boxed.c; ++i) boxed.data[i] = 0;
    embed_image(resized, boxed, (w-new_w)/2, (h-new_h)/2); 
    free_image(resized);
    return boxed;
}
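For the Python side, the geometry that letterbox_image computes can be sketched as below. This is only an illustration of the arithmetic, not the code in demo.py or in the later pull request; the function name `letterbox_geometry` and its return layout are made up for this example. It returns the resized dimensions and the offsets at which the resized image would be pasted onto the padded canvas.

```python
def letterbox_geometry(im_w, im_h, w, h):
    """Return (new_w, new_h, dx, dy) for fitting an im_w x im_h image
    into a w x h canvas while keeping the aspect ratio.

    Hypothetical helper that mirrors the arithmetic of letterbox_image.
    """
    # Scale by the smaller of the two ratios so the whole image fits.
    if w / im_w < h / im_h:
        new_w = w
        new_h = (im_h * w) // im_w  # integer division, like C int math
    else:
        new_h = h
        new_w = (im_w * h) // im_h
    # Offsets that centre the resized image on the canvas.
    dx = (w - new_w) // 2
    dy = (h - new_h) // 2
    return new_w, new_h, dx, dy


if __name__ == "__main__":
    # A 640x480 image placed in a 416x416 network input.
    print(letterbox_geometry(640, 480, 416, 416))  # (416, 312, 0, 52)
```

The remaining area (here 52 rows above and below) is what gets filled with the padding value.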

This is the plain resize function, which uses bilinear interpolation and stretches the image to exactly w x h. It is used in validate_detector_recall() in detector.c.

image resize_image(image im, int w, int h)
{
    image resized = make_image(w, h, im.c);   
    image part = make_image(w, im.h, im.c);
    int r, c, k;
    float w_scale = (float)(im.w - 1) / (w - 1);
    float h_scale = (float)(im.h - 1) / (h - 1);
    for(k = 0; k < im.c; ++k){
        for(r = 0; r < im.h; ++r){
            for(c = 0; c < w; ++c){
                float val = 0;
                if(c == w-1 || im.w == 1){
                    val = get_pixel(im, im.w-1, r, k);
                } else {
                    float sx = c*w_scale;
                    int ix = (int) sx;
                    float dx = sx - ix;
                    val = (1 - dx) * get_pixel(im, ix, r, k) + dx * get_pixel(im, ix+1, r, k);
                }
                set_pixel(part, c, r, k, val);
            }
        }
    }
    for(k = 0; k < im.c; ++k){
        for(r = 0; r < h; ++r){
            float sy = r*h_scale;
            int iy = (int) sy;
            float dy = sy - iy;
            for(c = 0; c < w; ++c){
                float val = (1-dy) * get_pixel(part, c, iy, k);
                set_pixel(resized, c, r, k, val);
            }
            if(r == h-1 || im.h == 1) continue;
            for(c = 0; c < w; ++c){
                float val = dy * get_pixel(part, c, iy+1, k);
                add_pixel(resized, c, r, k, val);
            }
        }
    }

    free_image(part);
    return resized;
}
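To make the two-pass structure easier to follow, here is a rough pure-Python sketch of the same idea for a single-channel image stored as a list of rows. It is not a line-for-line port: the name `resize_bilinear` is hypothetical, the neighbour indices are clamped, the 1-pixel output case is guarded against division by zero, and the last row is copied directly instead of using the accumulate-then-skip trick from the C code.

```python
def resize_bilinear(img, w, h):
    """Resize a single-channel image (list of rows) to w x h using
    corner-aligned bilinear interpolation. Hypothetical sketch."""
    im_h, im_w = len(img), len(img[0])
    # Corner-aligned scales; guard 1-pixel outputs against dividing by zero.
    w_scale = (im_w - 1) / (w - 1) if w > 1 else 0.0
    h_scale = (im_h - 1) / (h - 1) if h > 1 else 0.0

    # Pass 1: interpolate along x, giving an im_h x w intermediate image.
    part = []
    for row in img:
        out = []
        for c in range(w):
            if c == w - 1 or im_w == 1:
                out.append(float(row[im_w - 1]))
            else:
                sx = c * w_scale
                ix = int(sx)
                dx = sx - ix
                right = row[min(ix + 1, im_w - 1)]
                out.append((1 - dx) * row[ix] + dx * right)
        part.append(out)

    # Pass 2: interpolate along y, giving the final h x w image.
    resized = []
    for r in range(h):
        if r == h - 1 or im_h == 1:
            resized.append(list(part[im_h - 1]))
            continue
        sy = r * h_scale
        iy = int(sy)
        dy = sy - iy
        upper, lower = part[iy], part[min(iy + 1, im_h - 1)]
        resized.append([(1 - dy) * a + dy * b for a, b in zip(upper, lower)])
    return resized
```

For example, resizing [[0, 1], [2, 3]] to 3 x 3 fills in the midpoints between the four corner values.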

Now the question: should the resized image keep its aspect ratio? That should improve accuracy, shouldn't it?

mystic123 commented 5 years ago

Yeah, you are right. Could you please fix this and submit a PR?

i3oi3o commented 5 years ago

Sure, I will try to submit a PR next weekend. By the way, my PC's GPU is very old, so you will need to test it manually. I don't want to use my workplace PC.

sagarkar10 commented 5 years ago

The letterbox function works, but the detections are drawn out of place. Where should I map the box positions back to the original image, @i3oi3o?
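One possible way to undo the letterbox for drawing is sketched below. It assumes boxes are given as (x0, y0, x1, y1) in pixels of the letterboxed network input, which may not match what the repository actually produces; the function name `letterbox_to_original` is hypothetical. The idea is to subtract the padding offset and then divide out the scale on each axis.

```python
def letterbox_to_original(box, im_w, im_h, w, h):
    """Map (x0, y0, x1, y1) from the w x h letterboxed input back to
    pixel coordinates in the original im_w x im_h image.

    Hypothetical helper; the box layout is an assumption.
    """
    # Recompute the geometry used when the image was letterboxed.
    if w / im_w < h / im_h:
        new_w, new_h = w, (im_h * w) // im_w
    else:
        new_w, new_h = (im_w * h) // im_h, h
    dx, dy = (w - new_w) // 2, (h - new_h) // 2

    x0, y0, x1, y1 = box
    # Remove the padding offset, then undo the scaling on each axis.
    xs = [(x - dx) * im_w / new_w for x in (x0, x1)]
    ys = [(y - dy) * im_h / new_h for y in (y0, y1)]
    # Clamp to the image, since a box may extend into the padding.
    xs = [min(max(x, 0.0), float(im_w)) for x in xs]
    ys = [min(max(y, 0.0), float(im_h)) for y in ys]
    return xs[0], ys[0], xs[1], ys[1]
```

With a 640x480 image in a 416x416 input, the padding is 52 rows at the top and bottom, so a box at (104, 130, 208, 208) in the network input corresponds to (160, 120, 320, 240) in the original image.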

i3oi3o commented 5 years ago

Pull request #49 has already been accepted, so I will close this issue. It could be improved further by moving the letterbox logic to the TensorFlow side, since GPUs are very good at this kind of thing. But let's create another issue for that.