Open Delaunay opened 4 years ago
I would be interested in this feature too.
Hi,
I think adding a new argument to the constructor might be a good way of handling this. But given that this would be a breaking change, we will be doing it after the next release of torchvision, which will happen in January.
This would be a great feature. Looking forward to it :)
I need this feature
From offline discussions with @fmassa, we need the 3 following features:
I think the following design might work:

- `return_tuples`: {True, False (default)}. We could also have a `None` option that would be the default and that would warn users and tell them to explicitly set this parameter. But if we need to always support the legacy version anyway (by legacy I mean the current behaviour), this might be unnecessary.
- `compute_losses`: {'always', 'training_only' (default)}; `return_tuples` must be True.
- `compute_detections`: {'always', 'eval_only' (default)}; `return_tuples` must be True.

It's 3 new parameters, but they're all backward-compatible so users only need to set them if they need/want to. And one advantage is that it's very explicit in what it does.
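If it helps the discussion, the three parameters above could be wired up roughly like this. This is a toy sketch with stand-in internals (the class, its losses, and detections are placeholders, not the actual torchvision classes):

```python
# Toy sketch of the proposed return_tuples / compute_losses /
# compute_detections parameters; all internals are stand-ins.
class ToyDetector:
    def __init__(self, return_tuples=False, compute_losses="training_only",
                 compute_detections="eval_only"):
        if (compute_losses == "always" or compute_detections == "always") and not return_tuples:
            raise ValueError("'always' requires return_tuples=True")
        self.training = True
        self.return_tuples = return_tuples
        self.compute_losses = compute_losses
        self.compute_detections = compute_detections

    def forward(self, images, targets=None):
        # Compute losses in training, or always if requested.
        losses = {"loss": 0.0} if (self.training or self.compute_losses == "always") else None
        # Compute detections in eval, or always if requested.
        detections = [{"boxes": []}] if (not self.training or self.compute_detections == "always") else None
        if self.return_tuples:
            return losses, detections
        # Legacy behaviour: losses in training, detections in eval.
        return losses if self.training else detections
```

With the defaults, behaviour is unchanged; only users who opt into `return_tuples=True` see the new tuple return type.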
We could potentially avoid introducing the `compute_losses` parameter and basically return the losses iff `targets is not None`, but I fear this might be too implicit, and since we need to introduce `compute_detections` anyway, it's probably cleaner to do the same for losses.
Thoughts @fmassa ?
From a discussion with @fmassa : there is hope that torchscript will eventually support returning union types. This would allow the scripted mode to return either the losses or the detections, instead of always returning a tuple, and would make eager and scripted versions compatible with a minimum amount of backward incompatibility. It may be preferable to wait until then, as the rest of the issues will be easier to address as well.
I think Union types are now supported (see here).
Also, wouldn't it be simpler to just adjust the relevant `forward` methods to return losses if `targets` is provided? Then we could avoid discussions of `inference_mode` and just compute the losses if the targets are provided, and always return the detections.
Any updates on this? I looked around the code (the forward function here) and couldn't find what to do if I want to get both losses and detections during training and evaluation. Even getting only the losses (during both training and evaluation) would be enough.
Any help would be appreciated.
+1 for this issue. IMHO the most user-friendly solution (although I don't exactly know what the implications would be) would be to always return detections and losses if the user provides `(images, targets)`, and to only return detections if the user provides `(images)`, disregarding whether the current mode is `training` or `eval`.
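A toy sketch of that suggested behaviour (a hypothetical helper, not the actual torchvision API; the loss and detection values are stand-ins): the return value depends only on whether targets were passed, not on train/eval mode.

```python
# Hypothetical forward: (images) -> detections, (images, targets) -> (losses, detections).
def detector_forward(images, targets=None):
    # Stand-in detections, one dict per image.
    detections = [{"boxes": [], "labels": [], "scores": []} for _ in images]
    if targets is None:
        return detections                      # inference: detections only
    losses = {"loss_classifier": 0.0, "loss_box_reg": 0.0}  # stand-in values
    return losses, detections                  # targets given: both, in any mode
```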
What's the update on this issue?
I've just started trying to implement @NicolasHug's design specced out in https://github.com/pytorch/vision/issues/1574#issuecomment-779360258.
https://github.com/pytorch/vision/compare/main...bencevans:vision:detection-output+loss
As @frgfm mentions in https://github.com/pytorch/vision/issues/1775 (duplicate issue), `RoiHeads` and `RegionProposalNetwork` will require changes, too, to force the loss to be returned. Is `force_loss: bool = False` (default) appropriate?
I'm pretty unfamiliar with the codebase (only as a user), so any pointers or collaboration would be awesome! I'd love to get this feature in, as I've been hit by it a few times now and end up hacking in the changes mentioned in a related StackOverflow thread.
@bencevans I just also came from this StackOverflow issue and I recognized that with the suggested change from:

    if self.training:
        proposals, matched_idxs, labels, regression_targets = self.select_training_samples(proposals, targets)
    else:
        labels = None
        regression_targets = None
        matched_idxs = None

to

    if targets is not None:
        proposals, matched_idxs, labels, regression_targets = self.select_training_samples(proposals, targets)
    else:
        labels = None
        regression_targets = None
        matched_idxs = None

my mAP metric is much higher if I call model(imgs, annotations) instead of model(imgs) with model.eval() on my validation set.
I think it's because the proposals are used for the actual feature computation and not only for the loss:

    box_features = self.box_roi_pool(features, proposals, image_shapes)
    box_features = self.box_head(box_features)
    class_logits, box_regression = self.box_predictor(box_features)

So the method self.select_training_samples(proposals, targets) somehow adjusts the proposals. Can someone explain to me what this method does and why it is OK to call it in training but not in eval?
Ok I came up with this idea for roi_heads.py:
if targets is not None:
if self.training:
proposals, matched_idxs, labels, regression_targets = self.select_training_samples(proposals, targets)
else:
proposalsV, matched_idxs, labels, regression_targets = self.select_training_samples(proposals, targets)
box_featuresV = self.box_roi_pool(features, proposalsV, image_shapes)
box_featuresV = self.box_head(box_featuresV)
class_logitsV, box_regressionV = self.box_predictor(box_featuresV)
else:
labels = None
regression_targets = None
matched_idxs = None
box_features = self.box_roi_pool(features, proposals, image_shapes)
box_features = self.box_head(box_features)
class_logits, box_regression = self.box_predictor(box_features)
result: List[Dict[str, torch.Tensor]] = []
losses = {}
if labels is not None and regression_targets is not None:
if self.training:
loss_classifier, loss_box_reg = fastrcnn_loss(class_logits, box_regression, labels, regression_targets)
else:
loss_classifier, loss_box_reg = fastrcnn_loss(class_logitsV, box_regressionV, labels, regression_targets)
losses = {"loss_classifier": loss_classifier, "loss_box_reg": loss_box_reg}
if self.training:
if labels is None:
raise ValueError("labels cannot be None")
if regression_targets is None:
raise ValueError("regression_targets cannot be None")
else:
boxes, scores, labels = self.postprocess_detections(class_logits, box_regression, proposals, image_shapes)
num_images = len(boxes)
for i in range(num_images):
result.append(
{
"boxes": boxes[i],
"labels": labels[i],
"scores": scores[i],
}
)
So if you are in training mode, nothing changes. If you are in eval mode and no target is provided, nothing changes. But if you are in eval mode and a target is passed, then I do the pooling and subsequent steps twice: once for the loss, with the data from select_training_samples, and once with the original proposals, to get the correct mAP so that it is consistent with the non-target call. Is this a valid approach?
@oliverdain reported in https://github.com/pytorch/vision/issues/7063 that there is a similar need for KeyPoint models
I've been working on this problem too and wondering whether it's even necessary to call

    proposals, matched_idxs, labels, regression_targets = self.select_training_samples(proposals, targets)

when in `self.eval()` mode. If so, which parts of `select_training_samples` are required in eval mode?
For example, would adding gt_boxes to the proposals bias the val losses in some unexpected way?
proposals = self.add_gt_proposals(proposals, gt_boxes)
What about sampling a fixed proportion of positive-negative proposals?

    sampled_inds = self.subsample(labels)

This cuts out a significant number of proposals, which we may not want to do in eval mode. It reminds me of Dropout. Can anyone explain (or provide links to literature on) why these 2 methods are necessary?
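For reference, a rough sketch of that subsampling step, modelled on torchvision's BalancedPositiveNegativeSampler (simplified; the defaults here are assumptions, and the real sampler works on per-image tensors):

```python
import random

# Pick a fixed-size, class-balanced subset of proposal indices.
def subsample(labels, batch_size_per_image=512, positive_fraction=0.25):
    pos = [i for i, lbl in enumerate(labels) if lbl >= 1]
    neg = [i for i, lbl in enumerate(labels) if lbl == 0]
    num_pos = min(len(pos), int(batch_size_per_image * positive_fraction))
    num_neg = min(len(neg), batch_size_per_image - num_pos)
    # Fixing the positive:negative ratio keeps the classification loss from
    # being dominated by the overwhelming number of background proposals.
    return random.sample(pos, num_pos) + random.sample(neg, num_neg)
```

In training this is a loss-balancing trick, so it seems plausible that in eval one would want the loss computed on this subsample but the detections computed on the full proposal set, as the earlier patch in this thread does.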
We need to get the model's loss/cost in eval mode when we compute a validation loss/cost for hyperparameter search.
At the moment, eval mode only returns the detections without the loss/cost, which means it acts as an inference mode, which is different from what people might want.
We cannot use training mode to compute the validation loss because of batchnorm and dropout.
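The dropout half of that point can be illustrated with a toy inverted-dropout function (a stand-in, not torch's implementation): in training mode it randomly perturbs activations, so a loss computed in train() mode is a noisy estimate that does not reflect the model actually used at inference.

```python
import random

# Toy inverted dropout: in training mode, randomly zero activations and
# rescale the survivors by 1/(1-p); in eval mode, act as the identity.
def dropout(x, p=0.5, training=True):
    if not training:
        return x
    return [0.0 if random.random() < p else xi / (1 - p) for xi in x]
```

The same argument applies to batchnorm, which uses batch statistics in training mode but running statistics in eval mode, so even the "deterministic" part of the network differs between the two modes.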
The solution with the least code modification would be to add a new argument `inference_mode` to `forward` that would be true if we only need the detections and false if we want the loss/cost, and to change `if self.training` to `if not inference_mode`. A few classes throughout `torchvision.models.vision` would need to be updated like so:
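A hedged sketch of what that change could look like (the class below and `inference_mode` are hypothetical illustrations, not the actual torchvision code): the `self.training` checks that gate loss computation would test the new flag instead.

```python
# Hypothetical head whose loss computation is gated on inference_mode
# rather than self.training.
class RoiHeadsSketch:
    def forward(self, features, proposals, targets=None, inference_mode=True):
        losses = {}
        if not inference_mode:  # was: if self.training
            if targets is None:
                raise ValueError("targets are required to compute the loss")
            losses = {"loss_classifier": 0.0, "loss_box_reg": 0.0}  # stand-ins
        detections = [{"boxes": [], "labels": [], "scores": []}]  # stand-ins
        return detections, losses
```

Callers that only want detections keep the default `inference_mode=True` and see no behaviour change; a validation loop would pass `inference_mode=False` together with targets to get the losses while staying in eval mode.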