tordks opened 2 years ago
It seems the following tasks need to be performed to support one-step object detection:
Implement a test dataset, e.g. COCO or wheat.
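For exercising the pipeline before a real dataset is wired up, a toy dataset could look like the sketch below. All names (`ToyDetectionDataset`, the sample layout) are assumptions for illustration, not the project's actual API; the point is that each sample pairs an image with a target dict holding per-object classes and boxes.

```python
# Hypothetical minimal detection dataset; names are illustrative only.
# Each sample is (image, target) where target holds per-object classes
# and xyxy boxes, so images can carry different numbers of objects.
class ToyDetectionDataset:
    def __init__(self):
        # two fake "images" with differing numbers of objects
        self._samples = [
            ("img0", {"cls": [1], "bbox": [[0, 0, 10, 10]]}),
            ("img1", {"cls": [2, 3], "bbox": [[1, 1, 5, 5], [2, 2, 8, 8]]}),
        ]

    def __len__(self):
        return len(self._samples)

    def __getitem__(self, idx):
        return self._samples[idx]
```

A real implementation would subclass `torch.utils.data.Dataset` and load COCO/wheat annotations, but the returned structure could stay the same.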
Allow evaluating and calculating the loss on labels other than the "label" key. Both "bbox" and "cls" need to be passed to the loss; see e.g. https://github.com/rwightman/efficientdet-pytorch/blob/master/effdet/loss.py#L145 for an example loss function. One option is to encode bbox and cls within the "label" key as dicts or tuples, but that restricts the interface the loss function can have.
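The dict-encoding option above can be sketched as follows. This is a toy loss, not the EfficientDet loss linked above: the class term just counts mismatches and the box term is a plain L1 sum, purely to show how a loss would unpack "cls" and "bbox" from a single label argument.

```python
def detection_loss(prediction, label):
    """Toy loss over dict-encoded targets (illustrative only).

    prediction / label: {"cls": [c, ...], "bbox": [[x1, y1, x2, y2], ...]}
    """
    # classification term: count of mismatched classes (stand-in for CE/focal)
    cls_loss = sum(p != t for p, t in zip(prediction["cls"], label["cls"]))
    # box term: L1 distance over matched box coordinates (stand-in for huber/GIoU)
    box_loss = sum(
        abs(pc - tc)
        for pb, tb in zip(prediction["bbox"], label["bbox"])
        for pc, tc in zip(pb, tb)
    )
    return cls_loss + box_loss
```

The restriction mentioned above is visible here: any loss plugged into the framework would have to know this dict layout.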
Visualization
Allow keys other than "prediction" and "labels" to go into the metrics. The object detection models return cls and bbox rather than a single label, and the number of labels differs per image. As above, the simplest option might be to encode cls and bbox within the label key, e.g.
label = {"cls": [<cls-1>, <cls-2>, ...], "bbox": [<bbox-1>, <bbox-2>, ...]}
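One way to let metrics consume such dict-encoded labels is to have each metric declare which keys it needs and dispatch only those. Everything below (`required_keys`, `ClsAccuracy`, `update_metrics`) is a hypothetical sketch of that idea, not an existing interface.

```python
class ClsAccuracy:
    """Hypothetical metric that only consumes the "cls" sub-target."""

    required_keys = ("cls",)  # assumed declaration, not an existing API

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, batch):
        # batch maps each required key to a (prediction, ground_truth) pair
        pred_cls, true_cls = batch["cls"]
        self.correct += sum(p == t for p, t in zip(pred_cls, true_cls))
        self.total += len(true_cls)


def update_metrics(metrics, prediction, label):
    # route only the keys each metric asks for, instead of forcing a
    # fixed "prediction"/"labels" pair through every metric
    for metric in metrics:
        metric.update({k: (prediction[k], label[k]) for k in metric.required_keys})
```

A bbox metric (e.g. mAP) would declare `required_keys = ("cls", "bbox")` and receive both.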
At this point it might be beneficial to create a general wrapper to assign targets to their correct places that can be used for loss functions, transforms, visualization etc. etc.
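Such a wrapper might be as small as the helper below: apply a function to selected sub-targets and pass the rest through, so losses, transforms, and visualization share one routing mechanism. The helper name and signature are assumptions for illustration.

```python
def apply_to_targets(fn, label, keys):
    """Hypothetical routing helper: apply fn to the sub-targets named in
    `keys`, leave every other key untouched, and return a new label dict."""
    return {k: (fn(v) if k in keys else v) for k, v in label.items()}
```

For example, a geometric transform would be routed to "bbox" only, leaving "cls" intact.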
Decide how to separate image classification and object detection within the repo.
Handle a varying number of outputs per ground truth: a different number of objects will be detected in each image.
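One common way to handle this is a custom collate function that keeps per-image targets as lists instead of stacking them into a fixed-shape tensor. The sketch below is framework-agnostic and illustrative; in PyTorch it would be passed as `collate_fn` to a `DataLoader`.

```python
def detection_collate(batch):
    """Sketch of a collate for variable-size targets: images are batched,
    but per-image cls/bbox lists are kept ragged rather than stacked."""
    images = [img for img, _ in batch]
    labels = {
        "cls": [target["cls"] for _, target in batch],
        "bbox": [target["bbox"] for _, target in batch],
    }
    return images, labels
```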
Consider whether object detection within lightning-flash can be used within this framework
Consider whether we can reuse models from icevision
misc material:
- papers
- timeline of object detection models
- datasets