razvancaramalau / Visual-Transformer-for-Task-aware-Active-Learning


Object Detection #3

Open ghost opened 3 years ago

ghost commented 3 years ago

Hello, thank you for sharing your work!

I am trying to reproduce the experiment on VOC with ssd.pytorch. The paper says: "The visual transformer bottleneck from our joint-learning selection is positioned only on the confidence head of the SSD network."

Could you explain this in more detail? The SSD has confidence heads on 6 feature maps.

razvancaramalau commented 3 years ago

Hi there, the input to the transformer bottleneck comes only from the confidences for all categories (C), shaped 8732 x (C x B), where B is the batch size. The 8732 priors are the concatenated outputs of all 6 confidence heads. Thanks.
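
For reference, a minimal sketch of that reshaping, assuming a ssd.pytorch-style concatenated confidence tensor; the variable names and the batch size of 32 are illustrative, not taken from the repo.

# Minimal sketch (illustrative): turn the concatenated confidences of all 6
# SSD heads into the 8732 x (C*B) transformer input described above.
import torch
from einops import rearrange

B, C, num_priors = 32, 21, 8732                       # batch size (assumed), VOC classes incl. background, SSD priors
conf = torch.randn(B, num_priors * C)                 # [B, 8732*C], i.e. torch.cat over the 6 confidence heads
tokens = rearrange(conf, 'b (n c) -> n (b c)', c=C)   # [8732, B*C]: one token per prior box
print(tokens.shape)                                   # torch.Size([8732, 672]) for B=32, C=21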

ghost commented 3 years ago

How do I pass the output of the transformer to the discriminator (sampler)? After the transformer, is the shape of the input tensor to the discriminator [B, (8732 x C)]?

razvancaramalau commented 3 years ago

That's correct, [B, (8732 x C)].
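
A minimal sketch of that hand-off, assuming the transformer returns a tensor of the same [8732, B*C] shape; the discriminator call is hypothetical and only illustrates the flattening.

# Minimal sketch (illustrative): flatten the transformer output back to one
# vector per image before passing it to the discriminator (sampler).
import torch
from einops import rearrange

B, C, num_priors = 32, 21, 8732
trans_out = torch.randn(num_priors, B * C)                 # transformer output, [8732, B*C]
disc_in = rearrange(trans_out, 'n (b c) -> b (n c)', b=B)  # [B, 8732*C] = [B, 183372]
# scores = discriminator(disc_in)                          # hypothetical sampler call with z_dim = 8732*C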

ghost commented 3 years ago

I tried to add the visual transformer bottleneck here:

# B and C are the batch size and number of classes; requires `from einops import rearrange`
B, C = x.size(0), self.num_classes
conf = torch.cat([o.view(o.size(0), -1) for o in conf], 1)         # shape: [B, 8732 * C]
conf = rearrange(conf, 'b (n c) -> n (b c)', c=C)                  # [8732, B * C]: one token per prior
conf = transformer(conf)
conf = rearrange(conf, 'n (b c) -> b n c', b=B, c=C).contiguous()  # back to [B, 8732, C]

if self.phase == "test":
    output = self.detect(
        loc.view(loc.size(0), -1, 4),       # loc preds
        self.softmax(conf),                 # conf preds
        self.priors.type(type(x.data))      # default boxes
    )
else:
    output = (
        loc.view(loc.size(0), -1, 4),
        conf,
        self.priors
    )
return output

But conf became nan after a few steps. Did I do anything wrong? I used discriminator(z_dim=183372) and transformer(dim=672, depth=1, heads=1, mlp_dim=128, dropout=0.1).
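
As a quick sanity check on those sizes (my arithmetic, assuming VOC's 21 classes including background and a batch size of 32): z_dim = 8732 x 21 = 183372 and dim = 32 x 21 = 672, which matches the settings quoted above.

# Sanity check on the hyperparameters quoted above; C=21 and B=32 are my
# assumptions (VOC classes incl. background, typical batch size), not stated in the thread.
num_priors, C, B = 8732, 21, 32
assert num_priors * C == 183372   # discriminator z_dim: one flattened [8732*C] vector per image
assert B * C == 672               # transformer dim: each prior token has length B*C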

razvancaramalau commented 3 years ago

Yes, that looks correct. I experienced the same at the beginning. Adjusting the learning rate and doing a few runs stabilized the gradients after the first 10 iterations. I was actually considering switching to another optimizer.
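
One way to make that kind of adjustment, sketched under the assumption of a ssd.pytorch-style SGD setup; ssd_net, transformer and discriminator are hypothetical module names, and the learning-rate values are only examples.

# Illustrative only: give the new modules a smaller learning rate to keep the
# early gradients stable (module names and values are examples, not the repo's code).
import torch
optimizer = torch.optim.SGD([
    {'params': ssd_net.parameters(),       'lr': 1e-3},   # base SSD detector
    {'params': transformer.parameters(),   'lr': 1e-4},   # transformer bottleneck
    {'params': discriminator.parameters(), 'lr': 1e-4},   # discriminator (sampler)
], momentum=0.9, weight_decay=5e-4)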

ghost commented 3 years ago

So far, I've used the hyperparameters from the learning loss paper.

Following your advice, I lowered the learning rate from 1e-3 to 1e-4, and conf no longer becomes nan. But mAP is only 6.77 at the first cycle (# annotated samples: 1,000). How should I adjust the learning rate?