pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

[FEEDBACK] Multi-weight support prototype API #5088

Open datumbox opened 2 years ago

datumbox commented 2 years ago

🚀 Feedback Request

This issue is dedicated to collecting community feedback on the Multi-weight support API. Please review the dedicated article where we describe the API in detail and provide an overview of its features.

We would love to get your thoughts, comments and input in order to finalize the API and include it in the new release of TorchVision.

cc @oke-aditya @frgfm @zhiqwang @xiaohu2015

cceyda commented 2 years ago

I was looking at the example in the article, and the one thing that jumped out at me as a potential source of confusion (especially for newcomers) was weights.transforms(). Although weights is just the variable name here, and I understand the need to associate the specific model('s weights) with its transform, I think what I don't like is using the word "weights" for the whole "model" abstraction. One potential way to avoid confusion would be to be more specific: "PM.ResNet50.ImageNet1K_V1" would be the model's interface(?), and 'weights', 'transforms' and 'meta' would be accessed from there, like:

resnet = PM.ResNet50.ImageNet1K_V1
weights = resnet.weights
preprocess = resnet.transforms()
class_ids = resnet.meta["categories"]

And the example would turn into

from torchvision.prototype import models as PM

# Step 1: Initialize model
resnet = PM.ResNet50.ImageNet1K_V1  # Less verbose than ResNet50_Weights; also stops the ResNet50_weights confusion for those not using autocomplete IDEs
model = PM.resnet50(weights=resnet.weights) # or maybe just resnet
model.eval()

# Step 2: Initialize the inference transforms
preprocess = resnet.transforms()

# Step 3: Apply inference preprocessing transforms
batch = preprocess(img).unsqueeze(0)

# Step 4: Use the model and print the predicted category
prediction = model(batch).squeeze(0).softmax(0)
class_id = prediction.argmax().item()
score = prediction[class_id].item()
category_name = resnet.meta["categories"][class_id] # I like meta attributes
print(f"{category_name}: {100 * score}*%*")

Of course all this is a matter of style/preference, and there might be reasons it was implemented that way. I haven't looked at the code.

Overall I really like this feature, it will surely make life easier ❤️ no more copy pasting transforms, hurray! 🥳

datumbox commented 2 years ago

@cceyda Thanks for taking the time to provide detailed feedback. 😄

Originally we considered going with a proposal similar to what you suggested, as it would lead to less verbose names. Unfortunately the name ResNet50 is too close to ResNet, which is used for the main class of the model. This could lead users to believe that it's an nn.Module model rather than an Enum. Other options were also considered, such as introducing new model interfaces that bundle everything together, but unfortunately this approach wouldn't be compatible with the existing model builder methods of TorchVision, which we had to maintain for backwards compatibility.

Concerning linking the transforms to the model vs the weights abstraction, one reason we selected the latter is to allow for extra flexibility. The same model variant can be used in different training recipes and produce weights that require different preprocessing transforms. A good real-world example of that in TorchVision is EfficientNet B1, which requires different preprocessing for v1 and v2.
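For illustration, a minimal sketch of what this looks like (enum and value names assumed to follow the prototype-era convention):

from torchvision.prototype import models as PM

# One architecture, two training recipes, each bundling its own preprocessing
v1 = PM.EfficientNet_B1_Weights.ImageNet1K_V1
v2 = PM.EfficientNet_B1_Weights.ImageNet1K_V2
preprocess_v1 = v1.transforms()  # v1 and v2 differ, e.g. in resize/crop settings
preprocess_v2 = v2.transforms()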

For full transparency, the points you raised are valid and many of them were brought up during development (see discussions at #4937 and #4652). Unfortunately, due to the constraints imposed, there is no single perfect solution that leads to a concise, consistent, pythonic and backwards-compatible API. Multiple good solutions exist, and it depends on what one optimizes for. Let me know your thoughts.

divideconcept commented 2 years ago

If multiple weights are available per model, it'd be nice if we could programmatically scan through all available weights and automatically associate them with their corresponding model function (i.e. the lowercase "resnet50"). It could be done through a weights dictionary similar to the model_urls dictionary, whose keys are exactly the names of the corresponding model functions.
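A sketch of what such a registry could look like (the weights_registry name is hypothetical; the enums are from the prototype area):

from torchvision.prototype.models import ResNet101_Weights, ResNet50_Weights

# Hypothetical dict keyed by the exact model-builder function name,
# mirroring the role of the old model_urls dict
weights_registry = {
    "resnet50": ResNet50_Weights,
    "resnet101": ResNet101_Weights,
}

for builder_name, weights_enum in weights_registry.items():
    for w in weights_enum:  # enum members are iterable, so all weights can be scanned
        print(builder_name, w)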

On top of that, each model function should clearly document to the user what weights are available:

def resnet50(*, weights: Optional[ResNet50_Weights] = None, progress: bool = True, **kwargs: Any) -> ResNet:
    r"""ResNet-50
    Args:
        weights: value can be
            ResNet50_Weights.ImageNet1K_V1
            ResNet50_Weights.ImageNet1K_V2
    """
    weights = ResNet50_Weights.verify(weights)
    ...

datumbox commented 2 years ago

@divideconcept Thanks a lot for your feedback.

Your recommendations are indeed spot on. It's very reassuring for us that you identify the same limitations, as this is something we want to tackle in our 2022Q1 roadmap. More specifically:

  1. Introduce a model registration mechanism similar to the one used in the prototype dataset area. We see this as a separate RFC from this one, so I'll make sure to ping you early in the process to get your input.
  2. Improve the model documentation. @NicolasHug is already looking into ways to programmatically build the documentation directly from the meta-data and include more information such as model sizes etc. We also want to better document the training processes used to produce the weights.

Let's stay in touch concerning the prototype work. Given your work on TorchStudio, it would be great to coordinate and get your input early.

xiaohu2015 commented 2 years ago

I think the prototype API is a wonderful design. Now that you include the eval-mode data transforms in the weights, is it possible to also include the train-mode data transforms? The train-mode transforms are more complicated (e.g. mixup, samplers, label smoothing), so maybe that is troublesome.

datumbox commented 2 years ago

@xiaohu2015 Thanks for the feedback!

You are right, currently we don't include the training transforms but instead provide a link to the training recipe. There was originally a lot of debate on whether we should include them, and this is still an open question. For full transparency, I will list here the reasons why they are currently not included, so that the community is aware and can provide feedback:

For now we opted for providing a reference to the training recipe, but I would love to hear your input on how we can boost reproducibility further.

yoshitomo-matsubara commented 2 years ago

@datumbox For naming conventions, it would be better to stick to the Python naming convention, specifically for class names, which should be CapWords per PEP 8.

I found that some class names in the new API contain _, e.g. model weight config classes such as Wide_ResNet50_2_Weights, which I would name either WideResNet50K2Weights or WideResNet50WF2Weights, as the 2 comes from k, the widening factor from the paper.

I am also wondering why it has to be an Enum. It seems that all the new WeightsEnum classes come with some URL, so I feel a HuggingFace hub style (input: URL; output: module, e.g., transform, model weights, model config, etc.) may be convenient for many PyTorch users.

datumbox commented 2 years ago

@yoshitomo-matsubara Thanks for the feedback.

For naming conventions, it would be better to stick to the Python naming convention, specifically for class names, which should be CapWords per PEP 8.

You are right. We are not happy with the current naming situation. This is why I was eager to discuss this here on GitHub, summarize the challenges and the reasons we adopted the specific naming convention and, if possible, find alternatives.

The issue is that TorchVision already has extremely long model builder names that contain _. Here are a few problematic examples:

After lengthy offline and online discussions (see https://github.com/pytorch/vision/pull/4937#discussion_r749545547, https://github.com/pytorch/vision/issues/4652#issuecomment-946647712), the convention adopted was to take the model builder name, capitalize it properly and append _Weights to it. We favoured this approach because we felt it:

  1. Makes the existing long names more readable. Example: use FasterRCNN_ResNet50_FPN_Weights instead of FasterRCNNResNet50FPNWeights, to avoid cramming together model names that contain acronyms or packed encodings of hyper-parameters.
  2. Makes the link between the method name and its weights more apparent. Example: use FasterRCNN_MobileNet_V3_Large_320_FPN_Weights instead of FRCNN_MNetV3L_320, which might be shorter but is less clear. This also allows us to automatically check the naming conventions in unit-tests.
  3. Clarifies that the Enums are Weights, not model classes. Example: use ResNet50_Weights instead of ResNet50, as the latter can be confused for an nn.Module class.

If you have any suggestions on how to improve the situation, I would really love to hear them!

I am also wondering why it has to be Enum.

First of all, it is possible to pass strings to the model builder. So the following will work, though it's discouraged:

from torchvision.prototype import models as PM
model = PM.resnet50(weights="ResNet50_Weights.ImageNet1K_V2")
# or
model = PM.resnet50(weights="ImageNet1K_V2")

Why discouraged? Because this way you don't have access to the weights' transforms. So instead, if one wants to work with strings, they are advised to do:

from torchvision.prototype import models as PM
w = PM.get_weight("ResNet50_Weights.ImageNet1K_V2")
PM.resnet50(weights=w)
preprocess = w.transforms()
# ...

Enums allowed us to group together information related to the model weights in a single object and ensure it's always available during model construction. This allowed us to enforce schemas on the meta-data and retrieve key information (such as the backend and the non-quantized weights in the case of quantization, the num_classes for classification/detection/segmentation, etc). Moreover, note that TorchVision favours Enums over strings because of their better IDE integration, the ability to do static analysis on the code and the reduced chance of typos.
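As a short sketch of the kind of retrieval this enables (names per the prototype API):

from torchvision.prototype import models as PM

weights = PM.ResNet50_Weights.ImageNet1K_V2

# One object carries the URL, the transforms and the schema-checked meta-data
preprocess = weights.transforms()
categories = weights.meta["categories"]  # e.g. the 1000 ImageNet-1K class names

model = PM.resnet50(weights=weights)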

I feel HuggingFace hub style (input: URL, output: module e.g., transform, model weights, model config, etc) may be convenient

For backwards compatibility reasons, the output of the model builders must be the model, not a tuple of things. Moreover, there are other limitations, such as JIT-scriptability, that we had to factor in. The above API choice does not stop us from offering a registration mechanism that gives similar functionality in the future. It was actually within our plans to offer such a new API, but we decided to decouple it from this proposal to give more time for feedback. You can see an example of such a registration mechanism in the prototype datasets area.

Let me know your thoughts.

yoshitomo-matsubara commented 2 years ago

Thank you @datumbox for the clarification. Now the adoption of enums sounds more convincing to me.

In terms of grouping information by enums,

  1. I feel many of the long-name issues come from modules using classifiers as backbones, like detection and segmentation. What about using a hierarchical structure so that we could avoid redundancy and the use of _?
  2. it seems that the attribute names don't have to be a mixture of capital letters and _, but should be lowercase, e.g., FasterRCNN_ResNet50_FPN_Weights.Coco_V1 -> FasterRCNN_ResNet50_FPN_Weights.coco_v1

e.g., Grouping weights of FasterRCNN series like this

class WeightsCollection:
    # some new class (whose design is similar to WeightsEnum) that stores multiple weights
    # skip the detailed implementation here
    ...

class FasterRCNNWeights(WeightsEnum):
    resnet50_fpn_backbone = WeightsCollection(
        coco_v1=Weights(
            url="https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth",
            transforms=CocoEval,
            meta={
                **_COMMON_META,
                "num_params": 41755286,
                "recipe": "https://github.com/pytorch/vision/tree/main/references/detection#faster-r-cnn-resnet-50-fpn",
                "map": 37.0,
            },
        ),
        default='coco_v1',
    )

    mobilenet_v3_large_fpn_backbone = WeightsCollection(
        coco_v1=Weights(
            url="https://download.pytorch.org/models/fasterrcnn_mobilenet_v3_large_fpn-fb6a3cc7.pth",
            transforms=CocoEval,
            meta={
                **_COMMON_META,
                "num_params": 19386354,
                "recipe": "https://github.com/pytorch/vision/tree/main/references/detection#faster-r-cnn-mobilenetv3-large-fpn",
                "map": 32.8,
            },
        ),
        default='coco_v1',
    )

    mobilenet_v3_large_320_fpn_backbone = WeightsCollection(
        coco_v1=Weights(
            url="https://download.pytorch.org/models/fasterrcnn_mobilenet_v3_large_320_fpn-907ea3f9.pth",
            transforms=CocoEval,
            meta={
                **_COMMON_META,
                "num_params": 19386354,
                "recipe": "https://github.com/pytorch/vision/tree/main/references/detection#faster-r-cnn-mobilenetv3-large-320-fpn",
                "map": 22.8,
            },
        ),
        default='coco_v1',
    )

instead of

class FasterRCNN_ResNet50_FPN_Weights(WeightsEnum):
    Coco_V1 = Weights(
        url="https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth",
        transforms=CocoEval,
        meta={
            **_COMMON_META,
            "num_params": 41755286,
            "recipe": "https://github.com/pytorch/vision/tree/main/references/detection#faster-r-cnn-resnet-50-fpn",
            "map": 37.0,
        },
    )
    default = Coco_V1

class FasterRCNN_MobileNet_V3_Large_FPN_Weights(WeightsEnum):
    Coco_V1 = Weights(
        url="https://download.pytorch.org/models/fasterrcnn_mobilenet_v3_large_fpn-fb6a3cc7.pth",
        transforms=CocoEval,
        meta={
            **_COMMON_META,
            "num_params": 19386354,
            "recipe": "https://github.com/pytorch/vision/tree/main/references/detection#faster-r-cnn-mobilenetv3-large-fpn",
            "map": 32.8,
        },
    )
    default = Coco_V1

class FasterRCNN_MobileNet_V3_Large_320_FPN_Weights(WeightsEnum):
    Coco_V1 = Weights(
        url="https://download.pytorch.org/models/fasterrcnn_mobilenet_v3_large_320_fpn-907ea3f9.pth",
        transforms=CocoEval,
        meta={
            **_COMMON_META,
            "num_params": 19386354,
            "recipe": "https://github.com/pytorch/vision/tree/main/references/detection#faster-r-cnn-mobilenetv3-large-320-fpn",
            "map": 22.8,
        },
    )
    default = Coco_V1

What do you think?

datumbox commented 2 years ago

@yoshitomo-matsubara Thanks for the proposal and for including an example to clarify what you mean.

I feel many of the long name issues are from modules using classifiers as backbones like detection and segmentation. What about using a hierarchical structure so that we could avoid redundancy and use of _ ?

Yes, that is correct; it comes from this existing convention of TorchVision. Using hierarchical structures is a viable solution, with the benefit that it groups together weights that belong to the same family. I'm not sure it addresses the length aspect, as FasterRCNNWeights.mobilenet_v3_large_fpn_backbone.coco_v1 is lengthier than the existing option. Also note that building such a solution would require much more code to handle the hierarchy. I think the biggest drawback is that it eliminates the 1-1 mapping between a model builder and its enum. This relationship is used for weight validation (static analysis + run-time verification), so effectively we would be disabling features of the existing API.

it seems that the attribute names don't have to be a mixture of capital letters and _, but should be lowercase

Correct, the values of the enums don't have to be capitalized like this. Here are a few current examples of such values: CocoWithVocLabels_V1, Coco_V1, ImageNet1K_V1, ImageNet1K_FBGEMM_V1, Kinetics400_V1. Originally I capitalized them like that to make values such as CocoWithVocLabels_V1 more readable, though admittedly the same can be achieved if we rename this to something like coco_voclabels_v1. If I were to change the capitalization, I would probably go with all capital letters to make it consistent with the rest of TorchVision (example 1, 2, 3).

@NicolasHug @pmeier Any thoughts on changing the capitalization of the values of Weight Enums to all capitals?

pmeier commented 2 years ago

Given that PEP8 states:

Constants are [...] written in all capital letters with underscores separating words.

and enum fields are very much constants, I would prefer to also use this naming scheme for them.
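A minimal illustration of the scheme (class and values below are examples only, not actual torchvision names):

from enum import Enum

class ExampleWeights(Enum):
    # PEP 8 constant style: all caps, with underscores separating words
    COCO_V1 = "coco_v1"
    IMAGENET1K_V1 = "imagenet1k_v1"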

yoshitomo-matsubara commented 2 years ago

@pmeier

and enum fields are very much constants, I would prefer to also use this naming scheme for them.

I agree that enum fields like Coco_V1 are constants and should be all capital letters with underscores, like COCO_V1

@datumbox

I'm not sure it addresses the length aspect, as FasterRCNNWeights.mobilenet_v3_large_fpn_backbone.coco_v1 is lengthier than the existing option.

I think the main problem with the current naming is the _ in class names; a hierarchical structure enables us to keep the class names short, e.g., FasterRCNNWeights, and the lengthy parts then become field (attribute) names, where we can use _, like FasterRCNNWeights.MOBILENET_V3_LARGE_FPN_BACKBONE.COCO_V1, following @pmeier 's suggestion :)

I think the biggest drawback is that it eliminates the 1-1 mapping between a model builder and its enum. This relationship is used for weight validation (static analysis + run-time verification), so effectively we would be disabling features of the existing API.

I might be missing something there, but would the change look as simple as below?

@handle_legacy_interface(weights=("pretrained", ResNetWeights.RESNET18.IMAGENET1K_V1))
def resnet18(*, weights: Optional[ResNetWeights.RESNET18] = None, progress: bool = True, **kwargs: Any) -> ResNet:
    weights = ResNetWeights.RESNET18.verify(weights)

    return _resnet(BasicBlock, [2, 2, 2, 2], weights, progress, **kwargs)

@handle_legacy_interface(weights=("pretrained", ResNetWeights.RESNET34.IMAGENET1K_V1))
def resnet34(*, weights: Optional[ResNetWeights.RESNET34] = None, progress: bool = True, **kwargs: Any) -> ResNet:
    weights = ResNetWeights.RESNET34.verify(weights)

    return _resnet(BasicBlock, [3, 4, 6, 3], weights, progress, **kwargs)

instead of

@handle_legacy_interface(weights=("pretrained", ResNet18_Weights.ImageNet1K_V1))
def resnet18(*, weights: Optional[ResNet18_Weights] = None, progress: bool = True, **kwargs: Any) -> ResNet:
    weights = ResNet18_Weights.verify(weights)

    return _resnet(BasicBlock, [2, 2, 2, 2], weights, progress, **kwargs)

@handle_legacy_interface(weights=("pretrained", ResNet34_Weights.ImageNet1K_V1))
def resnet34(*, weights: Optional[ResNet34_Weights] = None, progress: bool = True, **kwargs: Any) -> ResNet:
    weights = ResNet34_Weights.verify(weights)

    return _resnet(BasicBlock, [3, 4, 6, 3], weights, progress, **kwargs)
datumbox commented 2 years ago

@yoshitomo-matsubara

I might miss something there, but will the change look simple like below?

The current implementation uses the class information to verify that the provided weights value is of the expected type. See the code here.

Frankly, I'm still not convinced that FasterRCNNWeights.MOBILENET_V3_LARGE_FPN_BACKBONE.COCO_V1 is much of an improvement over the existing one. It's longer, and even if you remove the extra _BACKBONE, you still have a string of the same size. Additionally this increases complexity and requires extra code to handle the hierarchy, which is unnecessary because one can't use the weights of one model builder for another from the same family.

I agree that enum fields like Coco_V1 are constants and should be all capital letters with underscores, like COCO_V1

Sounds good. Happy to review a PR that makes the capitalization change of enum values across the prototype API.

yoshitomo-matsubara commented 2 years ago

@datumbox

The current implementation uses the class information to verify that the provided weights value is of the expected type. See the code here.

Additionally this increases complexity and requires extra code to handle the hierarchy, which is unnecessary because one can't use the weights of one model builder for another from the same family.

Does it mean these two functions (verify and from_str) will not work with the hierarchical structure? If so, because ResNetWeights.RESNET18 in the above example is still a WeightsEnum, the following should work:

weights = ResNetWeights.RESNET18.verify(weights)

https://github.com/pytorch/vision/blob/main/torchvision/prototype/models/_api.py#L50-L66


If the hierarchical structure is not something torchvision wants to offer, I came up with a new idea. Instead of a hierarchical structure, what about introducing a weights module like torchvision.models.weights and then having one model_name.py file per model, such as faster_rcnn.py?

Then, torchvision/models/weights/faster_rcnn.py can have the current WeightsEnum style but without _ in the class name, like

class MobileNetV3LargeFpnWeights(WeightsEnum):
    COCO_V1 = Weights(
        url="https://download.pytorch.org/models/fasterrcnn_mobilenet_v3_large_fpn-fb6a3cc7.pth",
        transforms=CocoEval,
        meta={
            **_COMMON_META,
            "num_params": 19386354,
            "recipe": "https://github.com/pytorch/vision/tree/main/references/detection#faster-r-cnn-mobilenetv3-large-fpn",
            "map": 32.8,
        },
    )
    default = COCO_V1

While the class name itself has no link to the FasterRCNN class, since it is placed under torchvision.models.weights.faster_rcnn, i.e., torchvision.models.weights.faster_rcnn.MobileNetV3LargeFpnWeights, it would be obvious that it is for the FasterRCNN class.
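Usage under this layout would then look like the following (import path hypothetical):

# Hypothetical: the module path, rather than the class name, links the weights to FasterRCNN
from torchvision.models.weights.faster_rcnn import MobileNetV3LargeFpnWeights

weights = MobileNetV3LargeFpnWeights.COCO_V1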

It may also shorten the python file for each model, compared to having the weight enum classes and the model classes + functions in one file.


Sounds good. Happy to review a PR that makes the capitalization change of enum values across the prototype API.

Regardless of whether or not my ideas above are adopted, I'm happy to make the changes and send a PR :D


P.S.

As __members__ in enum is a mappingproxy instance, I think the following part can be made more efficient by replacing

    @classmethod
    def from_str(cls, value: str) -> "WeightsEnum":
        for k, v in cls.__members__.items():
            if k == value:
                return v
        raise ValueError(f"Invalid value {value} for enum {cls.__name__}.")

with

    @classmethod
    def from_str(cls, value: str) -> "WeightsEnum":
        if value in cls.__members__:
            return cls.__members__[value]
        raise ValueError(f"Invalid value {value} for enum {cls.__name__}.")

I can send a separate PR for this if it looks good to you.

datumbox commented 2 years ago

@yoshitomo-matsubara Thanks for the ideas. I think for now we can move forward with the capitalization of the enum values. Good spot on the optimization of from_str; a PR for that fix is also welcome. The rest of the ideas require more careful thought.

Feel free to add me as reviewer on the PR so that we can discuss any potential name changes.

datumbox commented 2 years ago

We've merged the prototype API into main TorchVision. The target was to do this as early as possible before the next release to leave enough time for tests. Please keep sharing your feedback to help us iron out the rough edges. Issues/PRs are very welcome.

hhaAndroid commented 2 years ago

Hi, this feature is very nice. But how can one find out the training information (such as epochs, augmentations, etc.) through the prototype? I think this information is directly tied to the mAP.

class RetinaNet_ResNet50_FPN_Weights(WeightsEnum):
    COCO_V1 = Weights(
        url="https://download.pytorch.org/models/retinanet_resnet50_fpn_coco-eeacb38b.pth",
        transforms=ObjectDetection,
        meta={
            "task": "image_object_detection",
            "architecture": "RetinaNet",
            "publication_year": 2017,
            "num_params": 34014999,
            "categories": _COCO_CATEGORIES,
            "interpolation": InterpolationMode.BILINEAR,
            "recipe": "https://github.com/pytorch/vision/tree/main/references/detection#retinanet",
            "map": 36.4,
        },
    )
    DEFAULT = COCO_V1

Is it possible to attach a link to indicate these key pieces of information? Currently, through https://pytorch.org/vision/stable/models.html#runtime-characteristics, I still cannot know the training strategy. Was it trained with this training strategy? I am very confused.

divideconcept commented 2 years ago

It's right there in your code snippet: "recipe": "https://github.com/pytorch/vision/tree/main/references/detection#retinanet". The link has all the information you need.

xiaohu2015 commented 2 years ago

You can find the training settings (training command) in the recipe URL.

datumbox commented 2 years ago

@hhaAndroid This is very fair feedback; we must fix our documentation.

The exact training details should already be there (else it's a bug and we need to fix it ASAP), but it's a bit of a mess right now because we literally just released the Multi-weight support API. I confirm that what @divideconcept and @xiaohu2015 told you is correct. The recipe URL that exists in every weight's meta-data should link to the place where we have all the details. Sometimes that's a dedicated doc page and sometimes it's an issue/PR.
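For example, the link can be read straight off the enum (a sketch, assuming the released import path):

from torchvision.models.detection import RetinaNet_ResNet50_FPN_Weights

weights = RetinaNet_ResNet50_FPN_Weights.COCO_V1
print(weights.meta["recipe"])  # the page documenting the exact training command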

In Q2, as part of our documentation revamp, we plan to create cleaner model documentation. @NicolasHug is already looking into this and has created some prototypes at #5577. There is a dedicated feedback issue for our documentation #5511.

hhaAndroid commented 2 years ago

@datumbox OK. Thank you for your patient reply. I'm confused because the mAP is 36.4 when retinanet is trained for 26 epochs. According to my understanding, 12 epochs can achieve this accuracy. So what am I not understanding?

hhaAndroid commented 2 years ago

@hhaAndroid This is very fair feedback; we must fix our documentation.

Yes. We really need a document like Model Zoo; otherwise the user has to open each model page to understand.

xiaohu2015 commented 2 years ago

@datumbox OK. Thank you for your patient reply. I'm confused because the mAP is 36.4 when retinanet is trained for 26 epochs. According to my understanding, 12 epochs can achieve this accuracy. So what am I not understanding?

You are right, it should get the same mAP with 1x (12 epoch) training.

hhaAndroid commented 2 years ago

You are right, it should get the same mAP with 1x (12 epoch) training.

So why is the 26 epoch accuracy not improved?

datumbox commented 2 years ago

Yes. We really need a document like Model Zoo; otherwise the user has to open each model page to understand.

Agreed. Any thoughts on whether this should be on the documentation page of the specific model builder (for example resnet50), or whether you indeed recommend a single Model Zoo doc? I believe @NicolasHug is investigating putting it on the dedicated model builder page, but I could be wrong.

So why is the 26 epoch accuracy not improved?

The retinanet model you refer to is very old and wasn't trained by any current member of our team. Unfortunately we don't have the detailed training log, but we know it was trained for 26 epochs. You are right to say that this was possibly overkill, but that's how it was produced and that's why we record it as such. Having said that, our team has now put processes in place to ensure our work is fully reproducible and that thorough checks are done prior to merging weights.

Finally, if you have more feedback about our documentation, we would love to hear it at #5511. If you have feedback concerning these new model builders, use this thread.

netw0rkf10w commented 2 years ago

Hi @datumbox,

I am updating my personal vision library to take the new API into account, and I find it a bit inconvenient to work with, mostly because it requires users to manually specify the weights for different models.

Suppose that for some application we always use pre-trained weights; then in the old API the user only needs to provide a single argument, the model name:

for m in ['resnet50', 'resnet101', 'resnet152']: # etc.
    model = torchvision.models.__dict__[m](pretrained=True)

With the new API, there is a second argument that depends on the first:

model = models.__dict__['resnet50'](weights=ResNet50_Weights.DEFAULT)
model = models.__dict__['resnet101'](weights=ResNet101_Weights.DEFAULT)

This is rather cumbersome (and a bit weird), because if my model is resnet50 then obviously the weights should come from ResNet50_Weights, so why do I still have to specify it manually?

I would suggest the following:

model = models.__dict__['resnet50'](weights='default') # Use ResNet50_Weights.DEFAULT
model = models.__dict__['resnet50'](weights='v1') # Use ResNet50_Weights.IMAGENET1K_V1
model = models.__dict__['resnet50'](weights='v2') # Use ResNet50_Weights.IMAGENET1K_V2
model = models.__dict__['resnet50'](weights=ResNet50_Weights.DEFAULT) # Use ResNet50_Weights.DEFAULT

This way we would have the best of both worlds (the old and new APIs).

datumbox commented 2 years ago

@netw0rkf10w Thanks a lot for the feedback.

If I understand you correctly, you would like to be able to initialize the weights using strings instead of providing the whole Enum. Is that right? If that's what you mean, this is already supported.

Examples:

# These all work:
model = models.__dict__['resnet50'](weights='DEFAULT') # Use ResNet50_Weights.DEFAULT
model = models.__dict__['resnet50'](weights='IMAGENET1K_V1') # Use ResNet50_Weights.IMAGENET1K_V1
model = models.__dict__['resnet50'](weights='IMAGENET1K_V2') # Use ResNet50_Weights.IMAGENET1K_V2

# Or the shorter version
model = resnet50(weights="DEFAULT")
model = resnet50(weights="IMAGENET1K_V1")
model = resnet50(weights="IMAGENET1K_V2")

Currently the values must be all capitals (matching their enum values). Unfortunately we can't make them shorter (aka "v1", "v2") because our intention is to offer many more pre-trained weights on various datasets. In fact, we've already added weakly-supervised weights trained on Instagram data (see #5708) and we plan to increase the number of datasets in the future. Thus we've opted for including the dataset name in the weight name.

So since this is supported, why is it not well-documented? Two main reasons:

  1. Our documentation is still lacking; thankfully we are improving it quickly with the help of the community. See #5833 if you want to give us a hand
  2. We recommend using enums because they give you access to additional information such as the preprocessing transforms and the meta-data. See this example for more info.

In the near future we also plan to build better registration mechanisms so that you don't have to do:

# no more hacks like this:
model = models.__dict__['resnet50'](weights='DEFAULT') 

# but instead something like this:
model = models.get('resnet50')(weights='DEFAULT') 

We didn't roll this out yet because we want to offer a mechanism similar to the one in the new Datasets API, which is in development.

If you have more comments/concerns or if I completely misunderstood what you meant, please let me know. Your feedback is very welcome. :)

netw0rkf10w commented 2 years ago

@datumbox Thanks a lot for the detailed reply!

Examples:

# These all work:
model = models.__dict__['resnet50'](weights='DEFAULT') # Use ResNet50_Weights.DEFAULT
model = models.__dict__['resnet50'](weights='IMAGENET1K_V1') # Use ResNet50_Weights.IMAGENET1K_V1
model = models.__dict__['resnet50'](weights='IMAGENET1K_V2') # Use ResNet50_Weights.IMAGENET1K_V2

# Or the shorter version
model = resnet50(weights="DEFAULT")
model = resnet50(weights="IMAGENET1K_V1")
model = resnet50(weights="IMAGENET1K_V2")

But this is exactly what I was looking for!! I didn't know this was already supported; there's no documentation other than your blog post, which didn't mention this feature (perhaps it was not yet implemented at that time).

So since this is supported, why is it not well-documented? Two main reasons:

  1. Our documentation is still lacking; thankfully we are improving it quickly with the help of the community. See #5833 if you want to give us a hand

Thanks. I'll join the discussion in #5833 later (though I feel that general/common documentation for the new weights API is currently more needed than the per-model documentation proposed in #5833; what I mean by general documentation would be similar to a succinct summary of your blog post, but updated with all the new features).

In the near future we also plan to build better registration mechanisms so that you don't have to do:

# no more hacks like this:
model = models.__dict__['resnet50'](weights='DEFAULT') 

# but instead something like this:
model = models.get('resnet50')(weights='DEFAULT') 

We didn't roll this out yet because we want to offer a mechanism similar to the one in the new Datasets API, which is in development.

This is indeed a nice-to-have feature. Looking forward to its release!

Cheers!

netw0rkf10w commented 2 years ago

@datumbox Is there any plan to add the weights pre-trained (with self-supervision) on ImageNet-21K for vision_transformer? (cc @sallysyw @YosuaMichael) Thanks.

datumbox commented 2 years ago

No immediate plans, but it's something we can consider for future models. Up until recently that was not an option, purely because we couldn't deploy multiple weights. Now that it's possible, we can assess it. I think that merits its own issue and would require some discussion about which models we should support.

netw0rkf10w commented 2 years ago

@datumbox Thanks. Would you be interested in a PR? ;)

datumbox commented 2 years ago

Potentially, but this requires discussion to understand the exact proposal and plan. We have a new Model Contribution Guide, so adding weights to TorchVision is definitely a possibility. The devil is in the details though, so I recommend starting a new issue and tagging us there to discuss the details. :)

frgfm commented 2 years ago

Sorry everyone, I have had some time off since feedback was requested on this topic. But I'm quite happy with the proposed design. As a user & a contributor, I'd argue:

Now to me, that leads to a few things:

To the best of my limited knowledge, those last two challenges haven't been ignored, so it looks quite exciting to me 👍

vahvero commented 2 years ago

@netw0rkf10w Thanks a lot for the feedback.

If I understand you correctly, you would like to be able to initialize the weights using strings instead of providing the whole Enum. Is that right? If that's what you mean, this is already supported.

Currently the values must be all capitals (matching their enum values).

This might be a minor gripe, but I would personally see

model = resnet50(weights=resnet50.weights.DEFAULT)
# Of course also
model = resnet50(weights=resnet50.weights["DEFAULT"])

as the most pleasant implementation for these enums. The user would not require two different imports (the model builder and the weights) to create a model.

Currently I am using

backbone = "resnet50"
anchor_sizes=(
    (64,),
    (128,),
    (256,),
    (512,),
    (812,),
),

aspect_ratios = ((0.5, 0.75, 1.0, 1.33, 2.0),) * len(anchor_sizes)
anchor_generator = AnchorGenerator(anchor_sizes, aspect_ratios)

# Here new weight system will feels cumbersome and causes a lot of imports
if backbone="resnet50":
    weights = ResNet50_Weights.DEFAULT
elif backbone="resnetAny":
    # ....
    pass

backbone = resnet_fpn_backbone(
    backbone_name=backbone,
    weights=weights,
)

model = FasterRCNN(
    backbone,
    num_classes=num_classes,
    rpn_anchor_generator=anchor_generator,
)

Of course, after reading the discussion I realized that weights="DEFAULT" may be valid for these functions. Trying it later today. Still, using class properties could be considered.

Edit: The weights="DEFAULT" does work for the fpn. Most likely unsurprisingly for most.
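For reference, a sketch of the simplification this enables (same resnet_fpn_backbone call as above):

from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

# No per-architecture enum import is needed when the backbone is selected by name
backbone = resnet_fpn_backbone(backbone_name="resnet50", weights="DEFAULT")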

datumbox commented 2 years ago

@vahvero Thanks for the feedback.

I agree that the idiom resnet50.weights["DEFAULT"] could have been a viable option. The reason we decided to go with strings was that the community preferred them.

Currently strings are a convenient shortcut which allows you to create a model with a single import. What I find a bit worrying is that it took you a while to find out they are supported. We mention it in the docs, but I wonder if it's not highlighted correctly. Thoughts?

divideconcept commented 2 years ago

I second that; I wasn't aware that I could use shortcut strings at all. The reason is that I didn't read the intro about pre-trained weights and went straight to the models I was interested in (i.e. ResNet50). The weights parameter there is described purely as of type ResNet50_Weights, so it didn't occur to me that it could be of type str at all.

I think the intro about how to use weights is fine and should stay where it is, but the docs could give users a hint by declaring weights as weights: Optional[Union[ResNet50_Weights, str]] = None (correct me if I'm wrong on the python syntax, but I think Union is how you describe 2 possible types).

Unless you think this complicates the reading of the parameter too much!
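The suggested annotation would read roughly as follows (a sketch of the idea, not the actual torchvision signature):

from __future__ import annotations  # so the torchvision names below need not be imported here

from typing import Any, Optional, Union

# ResNet50_Weights and ResNet stand for the existing torchvision names
def resnet50(*, weights: Optional[Union[ResNet50_Weights, str]] = None, progress: bool = True, **kwargs: Any) -> ResNet:
    ...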

NicolasHug commented 2 years ago

@datumbox we can easily add a small sentence here which would say: "Weights can be specified as strings as well, e.g. 'DEFAULT' or '<some relevant weight>'".

https://github.com/pytorch/vision/blob/91176e80cf611ce7d60e52bc94e81adf51aa2931/docs/source/conf.py#L327-L330

I'll send a PR tomorrow.

vahvero commented 2 years ago

What I find a bit worrying is that it took you a while to find out they are supported. We mention it in the docs, but I wonder if it's not highlighted correctly. Thoughts?

I routinely just read the source code, because most of the keyword arguments are left undocumented in the torch docs. Our use cases always require some changes, and I have found it easier to find the relevant information this way; in the case above, how the FPN changes with different anchor sizes. I would not suggest basing the goodness of the documentation on my experience.

That said, these keyword arguments have been documented and typed in the source code as Optional[ResNet50_Weights], which is incorrect. As @divideconcept pointed out, they accept str as well.

I would like to add to @divideconcept's answer that from 3.10 onward, Union can be written as |.

EDIT: If being really strict, they accept Literal["DEFAULT", "IMAGENET1K_V1", ...etc.], not a generic str.

datumbox commented 2 years ago

@vahvero @divideconcept Thanks for the input.

The honest truth is that originally I was not too eager to offer this shortcut, but I was compelled by the community feedback to make it available. Originally, one key limitation of using strings was that you couldn't get the meta-data and preprocessing info associated with them. We later fixed this by providing the get_weight() method.

I think adding the string to the types makes sense. I spoke with @NicolasHug and we plan to provide a registration mechanism as well this half. We can change the types while we add the registration. Thanks a lot for helping us make a more streamlined API.

Lodour commented 1 year ago

Hi @datumbox, I want to kindly bring your attention to an inconvenient use case in the adversarial ML community.

A common use case there (as far as I know) is to compute the gradient wrt the input with two expectations:

  1. The input has the model's desired input shape, e.g., 224×224.
  2. The input is normalized to [0, 1].

For example, with the old API, I can do the following:

# Get data sample of the desired shape
dataset = ImageNet(..., transform=resize_and_center_crop_and_to_tensor)
x, y = dataset[0]

# Get model including the normalization layer
model = nn.Sequential(
    partial(F.normalize, mean=mean, std=std),  # just a demo, should be a nn.Module working like this
    resnet50(pretrained=True),
)

# Backward
x_t = torch.autograd.Variable(x.detach().clone(), requires_grad=True)
loss(model(x_t), y).backward()

# Do something to x
x_adv = x + x_t.grad

In the example above, I need two separate transforms: resize and normalize.
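(For completeness, a minimal sketch of the normalization module that the placeholder above stands for:)

import torch
from torch import nn

class Normalize(nn.Module):
    # Stand-in for the partial(F.normalize, ...) placeholder: broadcasts mean/std over CHW
    def __init__(self, mean, std):
        super().__init__()
        self.register_buffer("mean", torch.tensor(mean).view(-1, 1, 1))
        self.register_buffer("std", torch.tensor(std).view(-1, 1, 1))

    def forward(self, x):
        return (x - self.mean) / self.std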

However, the new API does not expose these details; it expects the user to resize & normalize the inputs before feeding them to the model. Hence, I have to do some hacks with the private API:

# Get full transforms from weights Enum
preprocess = weights.transforms()

# Extract the resize part -- normalize with (0, 1) so it does not have any effects
resize = ImageClassification(
    crop_size=preprocess.crop_size[0],
    resize_size=preprocess.resize_size[0],
    mean=(0, 0, 0),
    std=(1, 1, 1),
    interpolation=preprocess.interpolation,
)

# Get model including the normalization layer
model = nn.Sequential(
    preprocess,  # we will send in the same size, so it will only normalize the input
    resnet50(weights=weights),
)

# Prepare the data
x = resize(x)

# Do something similar to the previous example
x_t = torch.autograd.Variable(x.detach().clone(), requires_grad=True)
loss(model(x_t), y).backward()
x_adv = x + x_t.grad

I hope I have been clear about my point. There might be simpler examples, but this is the best I could think of. I believe there are other cases where people prefer to work with pre-resized images before normalization.

Back to the example above, I guess what I am expecting is that ImageClassification could somehow separate the resize and normalize transforms, so that I don't have to hack the private API to create another ImageClassification. See the example below.

# Get full transforms from weights Enum
preprocess = weights.transforms()

# Extract the transforms up to before the normalization
prepare = preprocess.input_preparation

# Get model including the normalization layer
model = nn.Sequential(
    preprocess.input_normalization,
    resnet50(weights=weights),
)

# Prepare the data
x = prepare(x)

# Work on the model that expects inputs of the same shape ranged within [0, 1]
model(x)

It would be great to hear your thoughts; I am not sure if this use case can be made easy from torchvision's side or my side.

Thanks!

datumbox commented 1 year ago

@Lodour thanks a lot for the feedback; I appreciate the time you took to report it. This is an interesting use-case. I would like to explore with you whether the new API is the right solution for you or whether we need a different one.

The new API aims to address the following:

The above approach is probably too restrictive for power users who:

Now obviously your use-case is very valid and not properly supported by the API. What's still unclear to me is whether this API is the right option for you. I suspect that, given you want to construct a significantly different protocol than what was used for Classification, you are better off checking the configuration of the bundled transforms and constructing your own. This will give you the freedom to adapt it to your problem without having to touch private code.

That's obviously one approach. There are others, including deciding to make those classes public and fully configurable. I just don't know if it is possible to produce a generic enough class that addresses all use-cases while maintaining a clean and simple interface. This is a direction we considered initially, but we decided that users who want full control should define their own preprocessing.
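As a rough sketch of what I mean (reading the bundled transform's public attributes, as you already do above, and assembling your own pipeline):

import torchvision.transforms as T

preprocess = weights.transforms()

# Rebuild only the geometric part; skip normalization so inputs stay in [0, 1]
resize = T.Compose([
    T.Resize(preprocess.resize_size[0], interpolation=preprocess.interpolation),
    T.CenterCrop(preprocess.crop_size[0]),
])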

I would love to hear your thoughts on this. Looking forward to your reply.

Lodour commented 1 year ago

@datumbox Thanks for your response!

I agree with you that hacking into this private API is not the right option, although I really like the new API because of the benefits you have listed; previously I had to figure out those numbers from the docs and hardcode them in every project, especially the mean and std for normalization.

Then I guess this leads to where I am itching -- I want to implement my own pipeline but also get some of the benefits of the new API. For example, I would prefer not to hardcode parameters of the pre-trained model (e.g., mean and std) if they are already included in the new API (right now in the private API).

Based on your restriction of versioned transforms & the private class, it occurs to me that the hyper-parameters for preprocessing might be suitable for the meta field in Weights, which is naturally versioned:

class ResNet50_Weights(WeightsEnum):
    IMAGENET1K_V1 = Weights(
        ...,
        meta={
            ...,
            "transforms": {
                "resize_size": 256,
                "crop_size": 224,
                "mean": ...,
                "std": ...,
            },
        },
    )
    IMAGENET1K_V2 = Weights(
        ...,
        meta={
            ...,
            "transforms": {
                # different parameters for V2
            },
        },
    )

I think having these parameters in the metadata could benefit other users with special use cases. Here it seems OK to rely on weights.meta["transforms"], because the format is also versioned along with the weights. It might also benefit the documentation, where these inference-time details would be frozen in text, so people won't need to look for those numbers in the private API.
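Both halves of the pipeline could then be built from the metadata alone (keys as in the hypothetical sketch above):

import torchvision.transforms as T

params = weights.meta["transforms"]  # hypothetical key proposed above

prepare = T.Compose([
    T.Resize(params["resize_size"]),
    T.CenterCrop(params["crop_size"]),
    T.ToTensor(),
])
normalize = T.Normalize(mean=params["mean"], std=params["std"])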

But this might duplicate the hardcoded parameters between the metadata and the current invocation of ImageClassification. I haven't come up with a good way to avoid this duplication, ~~but once you can, maybe it is even possible to populate the entire weights.transforms from the metadata? Yet I am not sure if that would be too aggressive at the current state.~~

Looking forward to your thoughts on this option!