tordks opened 2 years ago
It seems the following tasks need to be performed to support one-step object detection:
Implement a test dataset, e.g. COCO or wheat.
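For exercising the pipeline before a real dataset is wired up, a toy dataset could look like the sketch below. All names (`ToyDetectionDataset`, the sample layout) are assumptions for illustration, not the project's actual API; the point is that each sample pairs an image with a target dict holding per-object classes and boxes.

```python
# Hypothetical minimal detection dataset; names are illustrative only.
# Each sample is (image, target) where target holds per-object classes
# and xyxy boxes, so images can carry different numbers of objects.
class ToyDetectionDataset:
    def __init__(self):
        # two fake "images" with differing numbers of objects
        self._samples = [
            ("img0", {"cls": [1], "bbox": [[0, 0, 10, 10]]}),
            ("img1", {"cls": [2, 3], "bbox": [[1, 1, 5, 5], [2, 2, 8, 8]]}),
        ]

    def __len__(self):
        return len(self._samples)

    def __getitem__(self, idx):
        return self._samples[idx]
```

A real implementation would subclass `torch.utils.data.Dataset` and load COCO/wheat annotations, but the returned structure could stay the same.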
Allow evaluating and calculating the loss on labels other than the "label" key. Both "bbox" and "cls" need to be passed to the loss; see e.g. https://github.com/rwightman/efficientdet-pytorch/blob/master/effdet/loss.py#L145 for an example loss function. One option is to encode bbox and cls within the "label" key as dicts or tuples, but that restricts the interface the loss function can have.
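The dict-encoding option above can be sketched as follows. This is a toy loss, not the EfficientDet loss linked above: the class term just counts mismatches and the box term is a plain L1 sum, purely to show how a loss would unpack "cls" and "bbox" from a single label argument.

```python
def detection_loss(prediction, label):
    """Toy loss over dict-encoded targets (illustrative only).

    prediction / label: {"cls": [c, ...], "bbox": [[x1, y1, x2, y2], ...]}
    """
    # classification term: count of mismatched classes (stand-in for CE/focal)
    cls_loss = sum(p != t for p, t in zip(prediction["cls"], label["cls"]))
    # box term: L1 distance over matched box coordinates (stand-in for huber/GIoU)
    box_loss = sum(
        abs(pc - tc)
        for pb, tb in zip(prediction["bbox"], label["bbox"])
        for pc, tc in zip(pb, tb)
    )
    return cls_loss + box_loss
```

The restriction mentioned above is visible here: any loss plugged into the framework would have to know this dict layout.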
Visualization
Allow keys other than "prediction" and "labels" to go into the metrics. The object detection models return cls and bbox rather than a single label, and the number of labels differs per image. As above, the simplest option might be to encode cls and bbox within the label key, e.g.
label = {"cls": [<cls-1>, <cls-2>, ...], "bbox": [<bbox-1>, <bbox-2>, ...]}
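One way to let metrics consume such dict-encoded labels is to have each metric declare which keys it needs and dispatch only those. Everything below (`required_keys`, `ClsAccuracy`, `update_metrics`) is a hypothetical sketch of that idea, not an existing interface.

```python
class ClsAccuracy:
    """Hypothetical metric that only consumes the "cls" sub-target."""

    required_keys = ("cls",)  # assumed declaration, not an existing API

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, batch):
        # batch maps each required key to a (prediction, ground_truth) pair
        pred_cls, true_cls = batch["cls"]
        self.correct += sum(p == t for p, t in zip(pred_cls, true_cls))
        self.total += len(true_cls)


def update_metrics(metrics, prediction, label):
    # route only the keys each metric asks for, instead of forcing a
    # fixed "prediction"/"labels" pair through every metric
    for metric in metrics:
        metric.update({k: (prediction[k], label[k]) for k in metric.required_keys})
```

A bbox metric (e.g. mAP) would declare `required_keys = ("cls", "bbox")` and receive both.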
At this point it might be beneficial to create a general wrapper to assign targets to their correct places that can be used for loss functions, transforms, visualization etc. etc.
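Such a wrapper might be as small as the helper below: apply a function to selected sub-targets and pass the rest through, so losses, transforms, and visualization share one routing mechanism. The helper name and signature are assumptions for illustration.

```python
def apply_to_targets(fn, label, keys):
    """Hypothetical routing helper: apply fn to the sub-targets named in
    `keys`, leave every other key untouched, and return a new label dict."""
    return {k: (fn(v) if k in keys else v) for k, v in label.items()}
```

For example, a geometric transform would be routed to "bbox" only, leaving "cls" intact.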
Decide how to separate image classification and object detection within the repo.
Handle a varying number of outputs per ground truth: a different number of objects will be detected in each image.
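One common way to handle this is a custom collate function that keeps per-image targets as lists instead of stacking them into a fixed-shape tensor. The sketch below is framework-agnostic and illustrative; in PyTorch it would be passed as `collate_fn` to a `DataLoader`.

```python
def detection_collate(batch):
    """Sketch of a collate for variable-size targets: images are batched,
    but per-image cls/bbox lists are kept ragged rather than stacked."""
    images = [img for img, _ in batch]
    labels = {
        "cls": [target["cls"] for _, target in batch],
        "bbox": [target["bbox"] for _, target in batch],
    }
    return images, labels
```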
Consider whether object detection within lightning-flash can be used within this framework
Consider whether we can reuse models from icevision
misc material:
- papers
- timeline of object detection models
- datasets