Could you provide some more information? What are your anchor_size and aspect_ratios?
Closing due to inactivity.
Hi @johschmidt42,
First of all: great tutorial! It really helped me overcome the struggles of learning a new framework such as PyTorch, and it pointed me to some nice extras such as Lightning and loggers (MLflow). So thank you very much for the detailed tutorial and the effort you have put in.
I have a problem with the FPN backbone, too. I am not quite sure which parts I have to alter. So far, I've changed the params such as FPN to True, resulting in an error telling me that the anchor size must be Tuple[Tuple[int]]. Changing the anchor size from
'ANCHOR_SIZE': ((32, 64, 128, 256, 512),),
to ((32,), (64,), (128,), (256,), (512,),),
as seen in other tutorials for RCNN with FPN did not help.
Can you please guide me in the right direction (or, if not too much effort, give a short example of which params I have to alter and in which form)?
Greetings from Germany :) Martin.
Hi, thank you! Simply remove one of the anchor_size values:
From ((32,), (64,), (128,), (256,), (512,),)
to ((32,), (64,), (128,), (256,),)
Reason: This is actually well explained by the error message "Anchors should be Tuple[Tuple[int]]": each feature map could potentially have different sizes and aspect ratios. There needs to be a match between the number of feature maps passed and the number of sizes / aspect ratios specified.
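For illustration, here is a minimal sketch using torchvision's AnchorGenerator directly (not the tutorial's config dict), showing that the sizes and aspect ratios are tuples of tuples, one inner tuple per feature map:

```python
from torchvision.models.detection.rpn import AnchorGenerator

# Sketch: sizes and aspect_ratios are tuples of tuples -- one inner tuple per
# feature map returned by the backbone. Both lengths must equal the number of
# feature maps, otherwise anchor generation fails at runtime.
anchor_generator = AnchorGenerator(
    sizes=((32,), (64,), (128,), (256,)),     # here: 4 feature maps -> 4 size tuples
    aspect_ratios=((0.5, 1.0, 2.0),) * 4,     # one aspect-ratio tuple per feature map
)
```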
Better explanation: The get_resnet_backbone function I wrote returns a ResNet backbone pretrained on ImageNet, but it also removes the average-pooling layer and the linear layer at the end. This is a bit different from other tutorials, which (if I remember correctly) don't remove the average-pooling layer. By removing this layer, you get 4 feature map layers that you can use to create reference boxes with the anchor generator (instead of 5). If I also remember correctly, the last layer was dropped anyway, so there was no reason to specify 5 different anchor sizes ((32,), (64,), (128,), (256,), (512,)) because 512 wouldn't be used anyway. I discovered this when I looked at the implementation of the anchor generator, which I recommend doing!
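To see where the four feature maps come from, here is a small sketch with a plain torchvision ResNet (not the tutorial's exact get_resnet_backbone helper):

```python
import torchvision

# Sketch, assuming a standard torchvision ResNet: dropping the average-pooling
# and fully-connected layers at the end leaves the stem plus the four residual
# stages layer1..layer4.
resnet = torchvision.models.resnet18(pretrained=True)
kept = [name for name, _ in list(resnet.named_children())[:-2]]  # drop avgpool, fc
print(kept)
# ['conv1', 'bn1', 'relu', 'maxpool', 'layer1', 'layer2', 'layer3', 'layer4']

# An FPN built on layer1..layer4 therefore produces 4 feature maps, so the
# anchor generator needs exactly 4 size tuples and 4 aspect-ratio tuples.
```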
Thank you for the very fast response. It works now, and many thanks also for the detailed explanation. I will also take a deeper look into the anchor generator and the backbone_resnet files :) Background: I need to train a detector for handwriting/signatures in documents, where image-pretrained backbones might not be perfect, though the first results are not bad at all.
Edit: I just saw that the text detector engines in MMDetection/OCR use pre-trained image backbones, too, though they focus on text detection in real-world images (signs, etc.). Best, Martin
That's a very interesting topic; I'd like to see the results you get! Any chance to follow your progress?
Unfortunately, our repo is hosted on a private GitLab, but I'll keep you updated and share some results as soon as I have something interesting. Thanks to MLflow (and again thanks to you for highlighting loggers in your tutorial), I can easily keep track of the setups :) Unfortunately, the training takes a long time, so a grid search (or random search) for the best setup will take a while. Additionally, I'll try to run a setup with MMDetection in the future so that I can try a wider variety of implementations.
Until then, stay safe! Martin
I get the assert len(grid_sizes) == len(strides) == len(cell_anchors) error when I use the FPN backbone with the model.