huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Question for implementation of resize in image-classification examples. #18142

Closed DataLama closed 2 years ago

DataLama commented 2 years ago

System Info

Who can help?

Examples:

maintained examples (not research project or legacy): @sgugger, @patil-suraj

Information

Tasks

Reproduction

```python
from typing import Optional
from dataclasses import dataclass, field

from torchvision.transforms import (
    CenterCrop,
    Compose,
    Resize,
)
from transformers import (
    MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING,
    AutoConfig,
    AutoFeatureExtractor,
)

MODEL_CONFIG_CLASSES = list(MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING.keys())
MODEL_TYPES = tuple(conf.model_type for conf in MODEL_CONFIG_CLASSES)


@dataclass
class ModelArguments:
    """
    Arguments pertaining to which model/config/tokenizer we are going to fine-tune from.
    """

    model_name_or_path: str = field(
        default="google/vit-base-patch16-224-in21k",
        metadata={"help": "Path to pretrained model or model identifier from huggingface.co/models"},
    )
    model_type: Optional[str] = field(
        default=None,
        metadata={"help": "If training from scratch, pass a model type from the list: " + ", ".join(MODEL_TYPES)},
    )
    config_name: Optional[str] = field(
        default=None, metadata={"help": "Pretrained config name or path if not the same as model_name"}
    )
    cache_dir: Optional[str] = field(
        default=None, metadata={"help": "Where do you want to store the pretrained models downloaded from s3"}
    )
    model_revision: str = field(
        default="main",
        metadata={"help": "The specific model version to use (can be a branch name, tag name or commit id)."},
    )
    feature_extractor_name: str = field(default=None, metadata={"help": "Name or path of preprocessor config."})
    use_auth_token: bool = field(
        default=False,
        metadata={
            "help": (
                "Will use the token generated when running transformers-cli login (necessary to use this script "
                "with private models)."
            )
        },
    )
    ignore_mismatched_sizes: bool = field(
        default=False,
        metadata={"help": "Will enable to load a pretrained model whose head dimensions are different."},
    )
```

* use the default `model_args`

```python
model_args = ModelArguments()

feature_extractor = AutoFeatureExtractor.from_pretrained(
    model_args.feature_extractor_name or model_args.model_name_or_path,
    cache_dir=model_args.cache_dir,
    revision=model_args.model_revision,
    use_auth_token=True if model_args.use_auth_token else None,
)
```

* comment out `ToTensor()` and `normalize` to check the PIL image

```python
_val_transforms = Compose(
    [
        Resize(feature_extractor.size),
        CenterCrop(feature_extractor.size),
        # ToTensor(),
        # normalize,
    ]
)
```


* get sample image
```python
from datasets import load_dataset

ds = load_dataset('imagenet-1k', use_auth_token=True, streaming=True)
im = list(ds['train'].take(1))[0]['image']
```

Expected behavior

I'm careful about saying this because I'm a newbie in the field of vision, but the implementation of the resize transformation in the `_val_transforms` function seems to be wrong in the image classification example script (here and here).

This transform may cut off part of the object during the validation step.

```python
...
_val_transforms = Compose(
    [
        Resize(feature_extractor.size),
        CenterCrop(feature_extractor.size),
        ToTensor(),
        normalize,
    ]
)
...
```

In order to keep the whole object visible and only change the size of the image, I think the following code is right for the `_val_transforms` function.

```python
...
_val_transforms = Compose(
    [
        Resize((feature_extractor.size, feature_extractor.size)),
        CenterCrop(feature_extractor.size),
        ToTensor(),
        normalize,
    ]
)
...
```
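The difference between the two `Resize` signatures can be checked with the size arithmetic alone. Below is a rough pure-Python sketch of that arithmetic (`shorter_edge_resize` is a hypothetical helper, not a torchvision function, and torchvision's exact rounding may differ by a pixel):

```python
def shorter_edge_resize(w, h, size):
    # Resize(int) in torchvision scales the *shorter* edge to `size`
    # and keeps the aspect ratio; Resize((size, size)) forces both edges.
    if w <= h:
        return size, round(h * size / w)
    return round(w * size / h), size

# A landscape 500 x 375 image with size=224:
print(shorter_edge_resize(500, 375, 224))  # (299, 224): aspect ratio preserved
# Resize((224, 224)) would instead return (224, 224) directly, distorting the image.
```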

If I've misunderstood, please feel free to tell me about it.

sgugger commented 2 years ago

cc @NielsRogge and @nateraw

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

LysandreJik commented 2 years ago

cc @amyeroberts as well, as you've been working with similar objects lately :)

amyeroberts commented 2 years ago

Hi @DataLama, thanks for raising the issue.

In this script, the validation transformations are defined in this order - resize, then centre crop - so that we end up with an image of size (feature_extractor.size, feature_extractor.size) while what's shown in the image keeps the same aspect ratio as the original, i.e. the image isn't "squashed".

In your suggestion:

```python
...
_val_transforms = Compose(
    [
        Resize((feature_extractor.size, feature_extractor.size)),
        CenterCrop(feature_extractor.size),
        ToTensor(),
        normalize,
    ]
)
...
```

the image would be resized to (feature_extractor.size, feature_extractor.size) first, changing the aspect ratio, and CenterCrop(feature_extractor.size) would then not have an effect.
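To make the "no effect" point concrete: a centre crop whose target size equals the image size returns the full image. Below is a pure-Python sketch of the crop geometry (`center_crop_box` is a hypothetical helper mirroring CenterCrop's arithmetic, not a torchvision function):

```python
def center_crop_box(w, h, size):
    # Top-left and bottom-right corners of a size x size crop centred in a w x h image.
    left = (w - size) // 2
    top = (h - size) // 2
    return left, top, left + size, top + size

# After Resize((224, 224)) the image is already 224 x 224, so the crop is the whole image:
print(center_crop_box(224, 224, 224))  # (0, 0, 224, 224): CenterCrop is a no-op
# After Resize(224) on a 500 x 375 image (giving roughly 299 x 224), the crop trims the sides:
print(center_crop_box(299, 224, 224))  # (37, 0, 261, 224)
```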

DataLama commented 2 years ago

Hi @amyeroberts, thanks for the explanation.

Now I understand what you intended.

I'm closing this issue, as it has been resolved.