pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

Add an option for Detection Models to return cost in eval mode as well (validation, test loss) #1574

Open Delaunay opened 4 years ago

Delaunay commented 4 years ago

We need to get the model cost in eval mode when we compute the validation loss/cost for hyperparameter search.

At the moment eval mode only returns the detections without the loss/cost, which means it effectively behaves as an inference mode; that is different from what people might want.

We cannot use training mode to compute the validation loss because of batch norm and dropout.

The solution with the least code modification would be to add a new argument inference_mode to forward that would be true if we only need the detections and false if we want the loss/cost, and to change if self.training to if not inference_mode.

A few classes throughout torchvision.models.detection would need to be updated along these lines.
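
A rough, hypothetical sketch of the gating change (the class, argument names, and losses below are placeholders, not the actual torchvision code):

    # Hypothetical sketch: gate the loss branch on an explicit flag instead of self.training.
    from typing import Dict, List, Optional

    import torch
    from torch import Tensor, nn

    class ToyDetectionModel(nn.Module):
        def forward(
            self,
            images: List[Tensor],
            targets: Optional[List[Dict[str, Tensor]]] = None,
            inference_mode: bool = True,
        ):
            # Stand-in for the real backbone/heads.
            detections = [{"boxes": torch.zeros((0, 4)), "scores": torch.zeros(0)} for _ in images]
            if not inference_mode:  # was: if self.training
                assert targets is not None, "targets are required to compute the loss"
                losses = {"loss_classifier": torch.zeros(()), "loss_box_reg": torch.zeros(())}
                return losses
            return detections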

elangovana commented 4 years ago

I would be interested in this feature too.

fmassa commented 4 years ago

Hi,

I think adding a new argument to the constructor might be a good way of handling this. But given that this would be a breaking change, we will be doing it after the next release of torchvision, which will happen in January.

ryzalk commented 4 years ago

This would be a great feature. Looking forward to it :)

cyyever commented 3 years ago

I need this feature

NicolasHug commented 3 years ago

From offline discussions with @fmassa, we need the following 3 features:


I think the following design might work:

It's 3 new parameters, but they're all backward compatible, so users only need to set them if they need/want to. One advantage is that it's very explicit about what it does.

We could potentially avoid introducing the compute_losses parameter and basically return the losses iff targets is not None, but I fear this might be too implicit, and since we need to introduce compute_detections anyway, it's probably cleaner to do the same for losses.
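
For illustration, a minimal sketch of how flags like compute_losses / compute_detections could be consumed (hypothetical stand-in code; only the two parameter names come from the paragraph above, and whether they belong on the constructor or on forward is left open by the thread, the sketch just picks the constructor for concreteness):

    # Hypothetical sketch of the flag-based design; not the actual torchvision API.
    from typing import Dict, List, Optional, Tuple

    import torch
    from torch import Tensor, nn

    class ToyDetector(nn.Module):
        def __init__(self, compute_losses: bool = True, compute_detections: bool = True):
            super().__init__()
            self.compute_losses = compute_losses
            self.compute_detections = compute_detections

        def forward(
            self, images: List[Tensor], targets: Optional[List[Dict[str, Tensor]]] = None
        ) -> Tuple[Dict[str, Tensor], List[Dict[str, Tensor]]]:
            losses: Dict[str, Tensor] = {}
            detections: List[Dict[str, Tensor]] = []
            if self.compute_losses and targets is not None:
                losses = {"loss_classifier": torch.zeros(())}  # placeholder loss
            if self.compute_detections:
                detections = [{"boxes": torch.zeros((0, 4))} for _ in images]
            return losses, detections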

Thoughts @fmassa ?

NicolasHug commented 3 years ago

From a discussion with @fmassa : there is hope that torchscript will eventually support returning union types. This would allow the scripted mode to return either the losses or the detections, instead of always returning a tuple, and would make eager and scripted versions compatible with a minimum amount of backward incompatibility. It may be preferable to wait until then, as the rest of the issues will be easier to address as well.
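
For reference, a small self-contained example of the Union return type being discussed (requires a PyTorch version whose TorchScript supports Union; the function is a contrived stand-in, not torchvision code):

    from typing import Dict, List, Union

    import torch
    from torch import Tensor

    @torch.jit.script
    def losses_or_detections(return_losses: bool) -> Union[Dict[str, Tensor], List[Dict[str, Tensor]]]:
        # Return either a loss dict or a list of per-image detection dicts,
        # instead of always returning a tuple of both.
        if return_losses:
            return {"loss_classifier": torch.zeros(())}
        return [{"boxes": torch.zeros((0, 4)), "scores": torch.zeros(0)}]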

bfialkoff commented 2 years ago

I think Union types are now supported, see here.

Also, wouldn't it be simpler to just adjust the relevant forward methods to return losses when targets is provided? Then we could avoid discussions of inference_mode, compute the losses whenever targets are provided, and always return the detections.

khalidcawl commented 2 years ago

Any updates on this? I looked around the code (the forward function here) and couldn't find a way to get both losses and detections during training and evaluation. Even getting only the losses (during both training and evaluation) would be enough.

Any help would be appreciated

claushofmann commented 2 years ago

+1 for this issue. IMHO the most user-friendly solution (although I don't know exactly what the implications would be) would be to always return detections and losses if the user provides (images, targets), and to return only detections if the user provides (images), regardless of whether the model is in training or eval mode.
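
A minimal sketch of that calling convention (hypothetical stand-in, not a torchvision implementation):

    # Hypothetical: return (detections, losses) when targets are given, otherwise
    # just the detections, regardless of train/eval mode.
    from typing import Dict, List, Optional

    import torch
    from torch import Tensor, nn

    class ToyModel(nn.Module):
        def forward(self, images: List[Tensor], targets: Optional[List[Dict[str, Tensor]]] = None):
            detections = [{"boxes": torch.zeros((0, 4))} for _ in images]
            if targets is None:
                return detections
            losses = {"loss_box_reg": torch.zeros(())}  # placeholder loss
            return detections, losses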

deepwilson commented 1 year ago

What's the update on this issue?

bencevans commented 1 year ago

I've just started trying to implement @NicolasHug's design specced out in https://github.com/pytorch/vision/issues/1574#issuecomment-779360258.

https://github.com/pytorch/vision/compare/main...bencevans:vision:detection-output+loss

As @frgfm mentions in https://github.com/pytorch/vision/issues/1775 (a duplicate issue), RoiHeads and RegionProposalNetwork will require changes too, to force the loss to be returned. Would force_loss: bool = False as the default be appropriate?

I'm pretty unfamiliar with the codebase (I only know it as a user), so any pointers or collaboration would be awesome! I'd love to get this feature in, as I've been hit by it a few times now and end up hacking in the changes mentioned in a related StackOverflow thread.
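
If it helps, a tiny hypothetical sketch of what threading such a force_loss flag into a head module could look like (the class and attribute names are illustrative only, not the torchvision code):

    from typing import Dict, List, Optional

    import torch
    from torch import Tensor, nn

    class ToyHead(nn.Module):
        def __init__(self, force_loss: bool = False):
            super().__init__()
            self.force_loss = force_loss

        def forward(self, features: List[Tensor], targets: Optional[List[Dict[str, Tensor]]] = None) -> Dict[str, Tensor]:
            losses: Dict[str, Tensor] = {}
            # was: if self.training
            if (self.training or self.force_loss) and targets is not None:
                losses["loss_box_reg"] = torch.zeros(())  # placeholder loss
            return losses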

Perry45 commented 1 year ago

@bencevans I also just came from that StackOverflow thread, and I noticed that with the suggested change from:

    if self.training:
        proposals, matched_idxs, labels, regression_targets = self.select_training_samples(proposals, targets)
    else:
        labels = None
        regression_targets = None
        matched_idxs = None

to

    if targets is not None:
        proposals, matched_idxs, labels, regression_targets = self.select_training_samples(proposals, targets)
    else:
        labels = None
        regression_targets = None
        matched_idxs = None

my mAP metric is much higher if I call model(imgs, annotations) instead of model(imgs) with model.eval() on my validation set.

I think it's because the proposals are used for the actual feature computation, not only for the loss:

    box_features = self.box_roi_pool(features, proposals, image_shapes)
    box_features = self.box_head(box_features)
    class_logits, box_regression = self.box_predictor(box_features)

So the method self.select_training_samples(proposals, targets) somehow adjusts the proposals. Can someone explain what this method does and why it is OK to call it in training but not in eval?

Perry45 commented 1 year ago

Ok I came up with this idea for roi_heads.py:

    if targets is not None:
        if self.training:
            proposals, matched_idxs, labels, regression_targets = self.select_training_samples(proposals, targets)
        else:
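            # eval mode with targets: run the box head a second time on the sampled
            # proposals (proposalsV), only to compute the validation loss further down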
            proposalsV, matched_idxs, labels, regression_targets = self.select_training_samples(proposals, targets)
            box_featuresV = self.box_roi_pool(features, proposalsV, image_shapes)
            box_featuresV = self.box_head(box_featuresV)
            class_logitsV, box_regressionV = self.box_predictor(box_featuresV)
    else:
        labels = None
        regression_targets = None
        matched_idxs = None

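    # detections always come from the original (unsampled) proposals, as before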
    box_features = self.box_roi_pool(features, proposals, image_shapes)
    box_features = self.box_head(box_features)
    class_logits, box_regression = self.box_predictor(box_features)

    result: List[Dict[str, torch.Tensor]] = []
    losses = {}

    if labels is not None and regression_targets is not None:
        if self.training:
            loss_classifier, loss_box_reg = fastrcnn_loss(class_logits, box_regression, labels, regression_targets)
        else:
            loss_classifier, loss_box_reg = fastrcnn_loss(class_logitsV, box_regressionV, labels, regression_targets)
        losses = {"loss_classifier": loss_classifier, "loss_box_reg": loss_box_reg}

    if self.training:
        if labels is None:
            raise ValueError("labels cannot be None")
        if regression_targets is None:
            raise ValueError("regression_targets cannot be None")
    else:
        boxes, scores, labels = self.postprocess_detections(class_logits, box_regression, proposals, image_shapes)
        num_images = len(boxes)
        for i in range(num_images):
            result.append(
                {
                    "boxes": boxes[i],
                    "labels": labels[i],
                    "scores": scores[i],
                }
            )

So in training mode nothing changes, and in eval mode without targets nothing changes. But in eval mode with targets, I run the pooling and box head twice: once on the sampled data from select_training_samples to compute the loss, and once on the original proposals to get the correct mAP, consistent with the no-target call. Is this a valid approach?

NicolasHug commented 1 year ago

@oliverdain reported in https://github.com/pytorch/vision/issues/7063 that there is a similar need for KeyPoint models

WillieMaddox commented 1 year ago

I've been working on this problem too and am wondering whether it's even necessary to call

    proposals, matched_idxs, labels, regression_targets = self.select_training_samples(proposals, targets)

when in self.eval() mode. If so, which parts of select_training_samples are required for self.eval() mode?

For example, would adding gt_boxes to the proposals bias the val losses in some unexpected way?

    proposals = self.add_gt_proposals(proposals, gt_boxes)

What about sampling a fixed proportion of positive and negative proposals?

    sampled_inds = self.subsample(labels)

This cuts out a significant number of proposals, which we may not want to do in eval mode. It reminds me of dropout. Can anyone explain (or provide links to literature on) why these two methods are necessary?