jhc13 / taggui

Tag manager and captioner for image datasets
GNU General Public License v3.0
777 stars 37 forks

Stretched images #168

Open geroldmeisinger opened 6 months ago

geroldmeisinger commented 6 months ago

I just noticed this piece of code in cogvlm2.py:

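    from torchvision import transforms

    image_size = 1344  # input resolution CogVLM2 expects (per this issue)

    transform = transforms.Compose([
        # A (height, width) tuple resizes both axes independently,
        # so non-square images are stretched into a square.
        transforms.Resize(
            (image_size, image_size),
            interpolation=transforms.InterpolationMode.BICUBIC
        ),
        transforms.ToTensor(),
        # Per-channel RGB mean/std (the CLIP normalization constants).
        transforms.Normalize((0.48145466, 0.4578275, 0.40821073),
                             (0.26862954, 0.26130258, 0.27577711))
    ])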
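Because `Resize` receives a `(image_size, image_size)` tuple, width and height are scaled by different factors whenever the input isn't square. A minimal sketch of that arithmetic, assuming the 1344x1344 target mentioned below; the helper names and the 672x1344 example size are hypothetical, not the actual dimensions of the smiley image:

```python
def stretch_factors(width: int, height: int, image_size: int = 1344) -> tuple[float, float]:
    """Per-axis scale factors applied by Resize((image_size, image_size))."""
    return image_size / width, image_size / height


def aspect_distortion(width: int, height: int, image_size: int = 1344) -> float:
    """Ratio of horizontal to vertical scaling: >1 widens shapes, <1 narrows them."""
    sx, sy = stretch_factors(width, height, image_size)
    return sx / sy


# Hypothetical narrow (portrait) input of 672x1344 pixels.
sx, sy = stretch_factors(672, 1344)
print(sx, sy)                        # 2.0 1.0
print(aspect_distortion(672, 1344))  # 2.0 -> a circle becomes a 2:1 horizontal ellipse
```

Any distortion factor other than 1.0 means the model sees a differently shaped object than the one in the original image.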

[image: smiley_narrow]

> What is the shape of this object?
The shape of the object in the image is an ellipse or an oval. It is elongated horizontally and has a rounded top and bottom.

What the model actually sees is this:

[image: smiley_stretched]

...

[image: smiley_full]

> What is the shape of this object?
The image features a yellow circular shape with a simple facial expression. 

The model seems to accept only 1344x1344 input(?), so there is no perfect solution. But I think there should be either a hint (textual or visual) or an option such as: (*) stretch ( ) center crop ( ) add borders
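The three options differ only in how the image is scaled and which box ends up in the square. A rough sketch of that geometry, assuming a square target of 1344; `plan_resize` and the mode names are hypothetical and not part of taggui:

```python
from typing import Literal

Mode = Literal["stretch", "center_crop", "add_borders"]


def plan_resize(width: int, height: int, mode: Mode, target: int = 1344):
    """Return (resized_w, resized_h, box) for fitting an image into a square.

    box is (left, top, right, bottom): for center_crop, the region of the
    resized image that is kept; for add_borders, where the resized image is
    placed on a target x target canvas; for stretch, the whole canvas.
    """
    if mode == "stretch":
        # Independent per-axis scaling: fills the square, distorts aspect ratio.
        return target, target, (0, 0, target, target)
    if mode == "center_crop":
        # Scale the shorter side to target, then cut off the overflow equally.
        scale = target / min(width, height)
        new_w, new_h = round(width * scale), round(height * scale)
        left, top = (new_w - target) // 2, (new_h - target) // 2
        return new_w, new_h, (left, top, left + target, top + target)
    if mode == "add_borders":
        # Scale the longer side to target, then pad the rest (letterboxing).
        scale = target / max(width, height)
        new_w, new_h = round(width * scale), round(height * scale)
        left, top = (target - new_w) // 2, (target - new_h) // 2
        return new_w, new_h, (left, top, left + new_w, top + new_h)
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical 672x1344 portrait image.
print(plan_resize(672, 1344, "stretch"))      # (1344, 1344, (0, 0, 1344, 1344))
print(plan_resize(672, 1344, "center_crop"))  # (1344, 2688, (0, 672, 1344, 2016))
print(plan_resize(672, 1344, "add_borders"))  # (672, 1344, (336, 0, 1008, 1344))
```

Center crop keeps proportions but drops the top and bottom quarters here, while adding borders keeps everything but spends half the canvas on padding. In Pillow, `ImageOps.fit` and `ImageOps.pad` cover the crop and border cases respectively.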

jhc13 commented 6 months ago

That's interesting. The code is from the creators of the model, so that's probably how it was trained.