Integration with Mantisshrimp

lgvaz commented 4 years ago

Hello Ross!

First of all, thank you for all the amazing work put into this repo. Your effort to make sure your implementation could replicate the original results from the paper makes your code stand out.

The team I'm part of is developing an object detection library called mantisshrimp, and we're looking forward to adding EfficientDet to our arsenal.

Before I go to questions, let me give you a very brief background:

Unlike other object detection libraries, our main goal is not to implement everything ourselves, but to provide a framework that makes it really easy to integrate implementations made by the community.

As an example, the library does not contain any implementation of a training loop! Instead, we provide adapters to libraries like fastai and Lightning that handle the training loop. If you're curious to learn more, take a look at our introduction guide.

The same goes for models: we currently only support torchvision's R-CNNs, and we chose this implementation of EfficientDet to add next 🥳

Now that you know a little bit about the background, let me get to the questions (sorry for the long list):

What is the recommended way of installing the library?
I'm currently doing pip install git+https://github.com/rwightman/efficientdet-pytorch.git, but the README doesn't mention it. Is this the recommended way?
What class is used as background, 0 or -1? I found this comment saying background should be -1, but I wanted to confirm.
What is the order of the bounding box coordinates? Again, I'm pretty sure you're using (xmin, ymin, xmax, ymax), but better safe than sorry 😅
How did you fix the pycocotools CocoEvaluator problem with transforms? We're facing it as well; pycocotools is the most annoying thing ever, and we're even thinking of reimplementing the entire metric so we can stop depending on it. Just to be sure we're facing the same problem: CocoEvaluator requires you to pass all targets when you first instantiate it, so any transforms applied after that are disregarded and the computed metric is incorrect. I think my next question is related to how you solved this.
What is img_scale? How do I use it correctly? If it is related to pycocotools, is there a way of disabling it? We'll do evaluation outside this library.
General advice: Any general advice? Any important details I should pay extra attention to? I'm currently following the Kaggle notebook mentioned in the README as a guide; although some of it is outdated, it's still very helpful.

lgvaz commented 4 years ago

What happens if I feed samples with empty annotations while training? Should I make sure that doesn't happen, or does the model already handle it?

rwightman commented 4 years ago

@lgvaz I don't have time for an in-depth response right now, but here are a few quick things:

pip install that way is probably the best right now; I think my PyPI release is a little behind.
-1 is used for background in the anchor code; you can pass typical targets with background = 0 to the model from your dataset.
Since this code was built to be weight compatible with, and able to replicate, the original TensorFlow impl, some aspects will be a little different from most of the PyTorch obj detection models and closer to the TF RetinaNet style wrt anchor gen/matching, coordinates, etc.
The internal coordinate format of the model is YXYX, but in postproc and in my dataset I transition from/to the COCO XYWH.
img_scale is used to move coordinates between what I think of as the 'model canvas' (the img_size * img_size input image size of the model) and the original image. Images are scaled down, maintaining aspect ratio, to fit in that square, located at the origin (upper left corner); the rest is padded if the original image aspect is not square. img_scale stores the ratio needed to move the output coordinates of the model back to the original image coordinate space for the COCO evaluator. If you want to handle the image sizing, scaling, and evaluation yourself, you can just set img_scale to 1 to not use it, and do the same for the image size values used to crop bboxes to (img_size, img_size).
'Empty annotations' would be cls and box targets that are all zero; they are fixed-size tensors when passed to the model.
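To make the coordinate conventions above concrete, here is a minimal NumPy sketch (not effdet's own code) of converting between (xmin, ymin, xmax, ymax), the model's internal YXYX order, and COCO XYWH, plus padding one image's targets into fixed-size, zero-filled arrays as described for empty annotations. The helper names and the `max_dets` parameter are hypothetical; check effdet's data pipeline for the exact target layout it expects.

```python
import numpy as np


def swap_xy(boxes):
    """Convert (xmin, ymin, xmax, ymax) <-> (ymin, xmin, ymax, xmax).

    Swapping the x and y columns is its own inverse, so the same function
    converts XYXY to YXYX and back.
    """
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    return boxes[:, [1, 0, 3, 2]]


def xyxy_to_coco_xywh(boxes):
    """Convert (xmin, ymin, xmax, ymax) to COCO (xmin, ymin, width, height)."""
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    out = boxes.copy()
    out[:, 2] = boxes[:, 2] - boxes[:, 0]  # width
    out[:, 3] = boxes[:, 3] - boxes[:, 1]  # height
    return out


def pad_targets(boxes, labels, max_dets):
    """Pad one image's targets to fixed-size arrays (hypothetical helper).

    Unused rows stay all zeros (class 0 = dataset-side background), so an
    image with no annotations becomes all-zero box and class targets.
    """
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    labels = np.asarray(labels, dtype=np.int64).reshape(-1)
    n = min(len(labels), max_dets)
    box_out = np.zeros((max_dets, 4), dtype=np.float32)
    cls_out = np.zeros((max_dets,), dtype=np.int64)
    box_out[:n] = boxes[:n]
    cls_out[:n] = labels[:n]
    return box_out, cls_out


# Example: one XYXY box converted to YXYX and COCO XYWH, then padded.
xyxy = [[10, 20, 110, 220]]
yxyx = swap_xy(xyxy)                      # [[20, 10, 220, 110]]
coco = xyxy_to_coco_xywh(xyxy)            # [[10, 20, 100, 200]]
box_t, cls_t = pad_targets(yxyx, [3], max_dets=4)
empty_box_t, empty_cls_t = pad_targets([], [], max_dets=4)  # all zeros
```

Padding to a fixed size keeps every image's targets the same shape, which is what lets a batch be stacked into a single tensor.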

lgvaz commented 4 years ago

Thanks for the quick answer, Ross!! I'll close the issue since it's not really related to this repo.

I believe this will help with fine-tuning and with questions like #49 as well. I hope people like it 😁

If I have additional questions, can I continue asking in this thread even with the issue closed?

Thanks again!

lgvaz commented 4 years ago

Hello Ross! I hope you are doing great!

So I'm almost done with the integration. I successfully trained a model that gave great results, but before I announce it, I want to ask some clarifying questions.

This is how I'm creating a model:

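    from effdet import EfficientDet, get_efficientdet_config
    from effdet.efficientdet import HeadNet

    # model_name, num_classes and img_size are provided by the caller
    config = get_efficientdet_config(model_name=model_name)
    net = EfficientDet(config, pretrained_backbone=False)
    # load pretrained weights
    ...
    # configure the class head for the new number of classes
    config.num_classes = num_classes
    config.image_size = img_size
    net.class_net = HeadNet(config, num_outputs=num_classes, norm_kwargs=dict(eps=0.001, momentum=0.01))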

I'm customizing config only to create the head, is that correct?

Any recommendations for norm_kwargs values? Maybe I should just leave them at the defaults.

I want to pad my images for training to maintain the aspect ratio like so:

I understand that for validation/inference we need to pass an img_size list with the image sizes before padding. Do I have to pass anything similar while training?
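The padding scheme Ross described for img_scale (scale the image to fit a square canvas while keeping the aspect ratio, place it at the upper-left corner, pad the rest) can be sketched in NumPy as follows. This is an illustrative sketch, not effdet's transform code: the helper names are hypothetical, the actual pixel resize is left to whatever image library you already use, and boxes are assumed to be (xmin, ymin, xmax, ymax) in pixels. Because the image sits at the origin, moving boxes between the canvas and the original image needs only a multiplication, no offset.

```python
import numpy as np


def letterbox_params(orig_h, orig_w, canvas_size):
    """Size that fits (orig_h, orig_w) in a canvas_size square, keeping aspect.

    Returns (new_h, new_w, img_scale), where img_scale = 1 / resize factor
    maps canvas coordinates back to original image coordinates.
    """
    scale = min(canvas_size / orig_h, canvas_size / orig_w)
    new_h = min(canvas_size, round(orig_h * scale))
    new_w = min(canvas_size, round(orig_w * scale))
    return new_h, new_w, 1.0 / scale


def pad_to_canvas(resized, canvas_size, fill=0):
    """Place an already-resized HxW[xC] image at the upper-left of a square canvas."""
    h, w = resized.shape[:2]
    canvas = np.full((canvas_size, canvas_size) + resized.shape[2:], fill,
                     dtype=resized.dtype)
    canvas[:h, :w] = resized
    return canvas


def boxes_to_canvas(boxes, img_scale):
    """Original-image boxes -> canvas boxes (e.g. for training targets)."""
    return np.asarray(boxes, dtype=np.float64) / img_scale


def boxes_to_original(boxes, img_scale):
    """Canvas boxes (model output) -> original-image boxes (for evaluation)."""
    return np.asarray(boxes, dtype=np.float64) * img_scale


# Example: a 480x640 (HxW) image on a 512x512 canvas.
new_h, new_w, img_scale = letterbox_params(480, 640, 512)      # 384, 512, 1.25
resized = np.zeros((new_h, new_w, 3), dtype=np.uint8)          # stand-in for a real resize
canvas = pad_to_canvas(resized, 512)                           # shape (512, 512, 3)
canvas_box = boxes_to_canvas([[100, 50, 300, 250]], img_scale)  # [[80, 40, 240, 200]]
orig_box = boxes_to_original(canvas_box, img_scale)            # [[100, 50, 300, 250]]
```

With img_scale = 1 (no resizing), both box helpers reduce to the identity, which lines up with Ross's suggestion of setting img_scale to 1 when you handle scaling yourself.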

For prediction I'm doing the following:

    from effdet import DetBenchPredict, unwrap_bench

    model2 = unwrap_bench(model)  # model is DetBenchTrain
    model2 = DetBenchPredict(model2, model.config)

Which seems to be working correctly. Results with tf_efficientdet_lite0 on the fridge dataset:

rwightman / efficientdet-pytorch

Integration with Mantisshrimp #50