Open Erotemic opened 4 years ago
Many thanks for your feedback and suggestions! I totally agree with you on this point. Using positional arguments and returning tuples has been a historical issue as we extend mmdet to more methods and tasks. Actually, we had an internal discussion a few months ago and planned to do a large refactoring by the end of this year (not started yet due to the heavy workload of the current maintainers). Here are some things to do:
An `InstanceAnnotation` or even `SampleAnnotation` class to represent the annotations (bboxes, labels, masks, keypoints, etc.) of an instance, so that we can make the API as simple as `forward(x, img_metas, annotations)`.
The last two items are what block us from starting the refactoring, since we would like to make it general enough. Glad that you are willing to work on it, which is definitely a large improvement. We can have further discussions and make the new API real in the next one or two releases.
I'm glad the core team is on board with this idea. I'm excited to see what the new API will look like.
As a reference / inspiration for the general `InstanceAnnotation` or `SampleAnnotation`, I have a class I currently make heavy use of called `kwimage.Detections`, which lives here.
The idea is that it is a lightweight wrapper around a data dictionary (that stores boxes, scores, class indexes, etc.) for a set of detections, as well as a meta dictionary (that stores the class-index-to-name mapping). It supports axis-aligned bounding boxes, keypoints, and polygon- and mask-based segmentations, and can be coerced to/from the COCO JSON format. This likely wouldn't be exactly what you use, but perhaps it can serve as an inspiration and offer some lessons learned. One thing that I would recommend is to ensure that whatever representation you come up with efficiently stores sets of detections for each image, i.e. the underlying boxes / scores are stored as contiguous `torch.Tensor` objects.
On the last point, I think it would be nice if users had the option of passing in simple dictionaries (they are easier and more intuitive to construct than custom objects), and then those were coerced to `InstanceAnnotation` objects. Imagine something like this:
```python
class Annotations(object):
    """
    A class to store multiple annotations / detections in an image
    """
    def __init__(self, bboxes=None, scores=None, labels=None,
                 segmentations=None, keypoints=None, weights=None):
        # The original signature ends with "...": further fields are elided.
        self._data = {
            'bboxes': bboxes,
            'scores': scores,
            'labels': labels,
            'weights': weights,
            'segmentations': segmentations,
            'keypoints': keypoints,
        }

    @classmethod
    def coerce(cls, data):
        # Pass existing objects through; build new ones from plain dicts.
        if isinstance(data, cls):
            return data
        elif isinstance(data, dict):
            return cls(**data)
        else:
            raise TypeError(type(data))

    # Custom methods and accessors
    ...


class CustomHead:
    def forward(self, x, img_metas, annotations):
        # If the user passes in a dict, transform it to an annotations object
        # (This also validates that the correct keywords are used)
        annotations = Annotations.coerce(annotations)
```
Then as a user I could simply pass in
```python
import torch

annotations = {
    'bboxes': torch.rand(100, 4),
}
```
rather than being forced to build a custom annotation object.
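A minimal runnable sketch of this coercion idea, using plain Python lists as stand-ins for tensors so that it has no dependencies. The `KEYS` tuple, the explicit unknown-key check, and the `__getitem__` accessor are assumptions added for illustration; they are not part of any existing mmdet API.

```python
class Annotations:
    """Hypothetical container; plain lists stand in for tensors."""

    # Assumed set of standard keys, taken from the constructor discussed above.
    KEYS = ('bboxes', 'scores', 'labels', 'weights', 'segmentations', 'keypoints')

    def __init__(self, **data):
        # Reject misspelled or unsupported keys up front.
        unknown = sorted(set(data) - set(self.KEYS))
        if unknown:
            raise TypeError(f'unexpected annotation keys: {unknown}')
        # Missing keys default to None so every instance has the same fields.
        self._data = {key: data.get(key) for key in self.KEYS}

    @classmethod
    def coerce(cls, data):
        if isinstance(data, cls):
            return data
        if isinstance(data, dict):
            return cls(**data)
        raise TypeError(type(data))

    def __getitem__(self, key):
        return self._data[key]


def forward(x, img_metas, annotations):
    # Accept either a dict or an Annotations object.
    annotations = Annotations.coerce(annotations)
    return annotations


ann = forward(None, [], {'bboxes': [[0.0, 0.0, 10.0, 10.0]]})
```

With this sketch, a misspelled key such as `{'bbox': [...]}` raises a `TypeError` at the call boundary instead of failing somewhere deeper in the head.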
Also involve @ZwwWayne in this thread.
This is less of a feature request, and more along the lines of: observations of issues I've had when working with the mmdet API and suggestions for how some of these might be refactored to improve the overall package.
One of the biggest challenges of working with mmdet so far has been its widespread use of positional arguments. This comes in two flavors: function signatures and return values.
The current structure
As an example, consider the `forward_train` function in `base_dense_head.py` and its use of the `get_bboxes` function, together with the signature of `get_bboxes` and the head's `forward` function.
Imagine if you want to extend `self.forward` to return anything other than a tuple `Tuple[cls_scores, bbox_preds]`. You have to create a custom `get_bboxes` function that has arguments in an order that agrees with the disparate forward function. Perhaps for some people thinking in terms of ordered items is easy, but for me, this is incredibly hard to manage. I would like to suggest an alternative.
The alternative proposal
Imagine if, instead of returning a tuple, the `forward` function returned a dictionary where the keys were standardized instead of the positions of the values.
Now, the `get_bboxes` function doesn't need to care about what particular head was used. It can simply accept the `output` dictionary and assert that it contains the particular keys that that variant of `get_bboxes` needs. (Note that this might allow the head to produce other auxiliary information used in the loss, but not in the construction of boxes.)
We can extend this pattern further, so that in addition to the `forward` function producing a dictionary, the `forward_train` function will produce a dictionary as well.
This has fewer conditionals and a consistent return type.
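To make the pattern concrete, here is a minimal sketch of a head whose `forward`, `get_bboxes`, and `forward_train` exchange dictionaries. `ToyHead`, the `aux` key, the `losses` / `proposals` keys, and the placeholder loss values are assumptions for illustration (strings stand in for tensors); this is not mmdet's actual code.

```python
class ToyHead:
    """Toy stand-in for a dense head; strings stand in for tensors."""

    def forward(self, feats):
        # Keys are standardized; extra entries can be added without
        # breaking callers that only read the keys they need.
        return {
            'cls_scores': [f'cls({f})' for f in feats],
            'bbox_preds': [f'box({f})' for f in feats],
            'aux': 'anything-else',
        }

    def loss(self, output, gt_bboxes, gt_labels=None):
        # Placeholder values; a real loss would read output[...] here.
        return {'loss_cls': 0.0, 'loss_bbox': 0.0}

    def get_bboxes(self, output, img_metas, cfg=None):
        # Check only for the keys this variant needs and ignore the rest.
        missing = {'cls_scores', 'bbox_preds'} - set(output)
        assert not missing, f'missing keys: {sorted(missing)}'
        return list(zip(output['cls_scores'], output['bbox_preds']))

    def forward_train(self, feats, img_metas, gt_bboxes, gt_labels=None,
                      proposal_cfg=None):
        output = self.forward(feats)
        # The return value is always a dict; proposals are an optional key.
        result = {'losses': self.loss(output, gt_bboxes, gt_labels)}
        if proposal_cfg is not None:
            result['proposals'] = self.get_bboxes(output, img_metas,
                                                  cfg=proposal_cfg)
        return result


head = ToyHead()
only_loss = head.forward_train(['p3', 'p4'], [{}], gt_bboxes=[[0, 0, 1, 1]])
with_props = head.forward_train(['p3', 'p4'], [{}], gt_bboxes=[[0, 0, 1, 1]],
                                proposal_cfg={})
```

A caller can then branch on `'proposals' in result` rather than on the arguments it passed in.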
This means that functions that use `forward_train` can seamlessly switch between setting `proposal_cfg` and getting the boxes or just getting the loss, because the return values have consistent types and access patterns in both modes. If you do need a conditional, it can be based on the return value instead of having to remember the inputs.
We could go even further and abstract the labels into an argument called `truth` that should contain the key `gt_bboxes`, and optionally `gt_labels` and `gt_bboxes_ignore`.
Discussion
IMO this pattern produces much more readable and extensible code. We can:
Return arbitrary outputs from our forward function
Add arbitrary target information to the truth dictionary and conditionally handle it in our custom loss.
Use simpler calling patterns that explicitly extract information from returned containers based on standard (easy for humans to remember) string keywords rather than standard (hard for humans to remember) integer positions.
Use semantically meaningful labels to allow for easier introspection of our code at runtime.
I think having a standard set of keywords is much more extensible than sticking to positional based arguments.
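The `truth` container and the points listed above can be sketched as follows. The function names, the `kpt_preds` / `gt_keypoints` keys, and the constant loss values are hypothetical and only show which terms become active; strings and lists stand in for tensors.

```python
def forward(x):
    # A head may return more than class scores and boxes.
    return {
        'cls_scores': f'cls({x})',
        'bbox_preds': f'box({x})',
        'kpt_preds': f'kpt({x})',
    }


def toy_loss(output, truth):
    """Placeholder loss: the values only indicate which terms are active."""
    if 'gt_bboxes' not in truth:
        raise KeyError('truth must contain gt_bboxes')
    losses = {'loss_bbox': 1.0}
    # Optional standard key.
    if truth.get('gt_labels') is not None:
        losses['loss_cls'] = 1.0
    # Extra target information is used only when both sides provide it.
    if 'gt_keypoints' in truth and 'kpt_preds' in output:
        losses['loss_kpt'] = 1.0
    return losses


def forward_train(x, img_metas, truth):
    output = forward(x)
    return {'output': output, 'losses': toy_loss(output, truth)}


boxes_only = forward_train('p3', [{}], {'gt_bboxes': [[0, 0, 1, 1]]})
everything = forward_train('p3', [{}], {
    'gt_bboxes': [[0, 0, 1, 1]],
    'gt_labels': [3],
    'gt_keypoints': [[0.5, 0.5]],
})
print(sorted(boxes_only['losses']))   # ['loss_bbox']
print(sorted(everything['losses']))   # ['loss_bbox', 'loss_cls', 'loss_kpt']
```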
There is a small issue of speed. Unpacking dictionaries is slower than unpacking tuples, but I don't think it will be a noticeable difference, given that most Python attribute lookups are dictionary lookups anyway.
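One rough way to check the size of this overhead is a `timeit` micro-benchmark. The sketch below compares positional unpacking with keyed lookups for a two-item container; the numbers depend on the machine and Python version, so no expected output is shown, and the function names are made up for this example.

```python
import timeit

tup = (1, 2)
dct = {'cls_scores': 1, 'bbox_preds': 2}


def unpack_tuple():
    # Positional access: order carries the meaning.
    cls_scores, bbox_preds = tup
    return cls_scores, bbox_preds


def unpack_dict():
    # Keyed access: names carry the meaning.
    cls_scores = dct['cls_scores']
    bbox_preds = dct['bbox_preds']
    return cls_scores, bbox_preds


n = 200_000
t_tuple = timeit.timeit(unpack_tuple, number=n)
t_dict = timeit.timeit(unpack_dict, number=n)
print(f'tuple: {t_tuple / n * 1e9:.0f} ns per call')
print(f'dict:  {t_dict / n * 1e9:.0f} ns per call')
```

Whatever the per-call difference turns out to be, it should be weighed against the cost of the forward pass that produced the values.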
This is a rather large API change, but I think the reliance on position-based arguments is stifling further development of new and exciting networks. I think there might be a way to gradually introduce these changes such that a decent level of backwards compatibility is maintained as well, but I'm not 100% sure on this.
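One possible shape for such a gradual transition, offered only as a sketch: keep the new functions dictionary-based and wrap them in a small shim for call sites that still expect tuples. The `legacy_tuple` decorator is hypothetical and not something mmdet provides.

```python
import functools


def legacy_tuple(keys):
    """Expose a dict-returning function as a tuple-returning one.

    `keys` fixes the positional order that old call sites expect.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            output = func(*args, **kwargs)
            return tuple(output[key] for key in keys)
        return wrapper
    return decorator


def forward(x):
    # New-style forward: keyword-based output.
    return {'cls_scores': f'cls({x})', 'bbox_preds': f'box({x})'}


# Old call sites keep the positional contract through the shim.
forward_legacy = legacy_tuple(('cls_scores', 'bbox_preds'))(forward)
cls_scores, bbox_preds = forward_legacy('p3')
```

The shim keeps the positional order in one place, so legacy callers can be migrated one at a time and the wrapper removed afterwards.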
I've played around with variants of this and it works relatively well. The main issue I had was the widespread use of `multi_apply`, which could likely be changed to assume the containers returned by forward functions are dictionaries instead of tuples.
Summary
In summary I want to propose replacing positional based APIs with keyword based APIs. The main purpose of making this issue is for me to gauge the interest of the core devs. If there is interest I may work on developing this idea further and look into implementations that are amenable to a smooth transition such that backwards compatibility is not broken.
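As an illustration of the `multi_apply` point raised above, a dictionary-aware variant might collate per-level dictionaries into a dictionary of lists. In the sketch below, `multi_apply` is a simplified stand-in for the tuple-based helper (assumed behaviour: map a function over the levels, then transpose the results), and `multi_apply_dict` is a hypothetical name.

```python
from functools import partial


def multi_apply(func, *args, **kwargs):
    """Tuple-based stand-in: per-level tuples become a tuple of lists."""
    pfunc = partial(func, **kwargs) if kwargs else func
    return tuple(map(list, zip(*map(pfunc, *args))))


def multi_apply_dict(func, *args, **kwargs):
    """Hypothetical variant: per-level dicts become a dict of lists."""
    pfunc = partial(func, **kwargs) if kwargs else func
    results = list(map(pfunc, *args))
    if not results:
        return {}
    keys = results[0].keys()
    for res in results:
        # Every level must agree on the set of keys.
        if res.keys() != keys:
            raise KeyError('inconsistent keys across levels')
    return {key: [res[key] for res in results] for key in keys}


def single_level(feat, scale=1):
    return {'cls_scores': f'cls({feat})', 'bbox_preds': f'box({feat})x{scale}'}


by_position = multi_apply(lambda f: (f'cls({f})', f'box({f})'), ['p3', 'p4'])
by_key = multi_apply_dict(single_level, ['p3', 'p4'], scale=2)
```

Here `by_position[0]` and `by_key['cls_scores']` hold the same per-level list; only the access pattern differs.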