Closed. VGrondin closed this issue 2 years ago.
Hi, @VGrondin
I suspect that the shape of the position embedding does not match the shape of x.
Please check that you have passed the correct shape parameters to this function: https://github.com/yangsenius/TransPose/blob/dab9007b6f61c9c8dce04d61669a04922bbcd148/lib/models/transpose_r.py#L296
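For reference, the constraint is roughly the following (a minimal sketch, not a quote of transpose_r.py; the stride-8 factor and the learnable embedding are simplifications of what that function actually builds): the position embedding has one entry per spatial location of the flattened feature map, so its first dimension must equal h*w of the features fed to the encoder.

```python
import torch
import torch.nn as nn

d_model = 256
img_h, img_w = 256, 256                      # cfg.MODEL.IMAGE_SIZE
feat_h, feat_w = img_h // 8, img_w // 8      # assumed backbone stride of 8 -> 32 x 32 = 1024 tokens

# position embedding over the feature grid: shape (h*w, 1, d_model), broadcast over the batch dim
pos_embedding = nn.Parameter(torch.randn(feat_h * feat_w, 1, d_model))

# the encoder input must be flattened to the same sequence length:
# x: (batch, d_model, h, w) -> (h*w, batch, d_model)
x = torch.randn(4, d_model, feat_h, feat_w)
x = x.flatten(2).permute(2, 0, 1)

# if these differ (e.g. 196 vs 1024), you get exactly the reported size-mismatch error
assert x.shape[0] == pos_embedding.shape[0]
```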
Yes, you are right! I had MODEL.IMAGE_SIZE = [256, 256], but in my case the input is features from the backbone, so the size is [14, 14]. I'm curious to see how well it will perform with such a small feature map.
Thanks for the help
Hi, first off thank you for this great work!
I'm trying to add the Transformer part of your work to a Mask R-CNN model. Using a Swin backbone, the RPN gives me n bounding box proposals, and for each proposal I have bbox features of shape [n, 256, 14, 14] extracted by the backbone. Now, for each bbox, I would like to get the keypoints, roughly as in the sketch below:
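(This is a simplified sketch of the head, not the exact code: the module names are placeholders, and I use torch's built-in encoder here, whereas the real model uses the TransPose/DETR-style encoder that is called as global_encoder(x, pos=self.pos_embedding).)

```python
import torch
import torch.nn as nn

class KeypointTransformerHead(nn.Module):
    """Placeholder head: TransPose-style encoder on top of RoI features."""
    def __init__(self, d_model=256, feat_h=14, feat_w=14, num_keypoints=17,
                 nhead=8, num_layers=4):
        super().__init__()
        # position embedding over the 14x14 RoI grid: (h*w, 1, d_model)
        self.pos_embedding = nn.Parameter(torch.randn(feat_h * feat_w, 1, d_model))
        encoder_layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=1024)
        # stand-in for TransPose's encoder; torch's encoder has no pos kwarg,
        # so the embedding is added to x instead of being passed separately
        self.global_encoder = nn.TransformerEncoder(encoder_layer, num_layers)
        self.final_layer = nn.Conv2d(d_model, num_keypoints, kernel_size=1)

    def forward(self, x):
        # x: RoI features from the backbone, (n, 256, 14, 14)
        n, c, h, w = x.shape
        x = x.flatten(2).permute(2, 0, 1)                 # (h*w, n, c)
        x = self.global_encoder(x + self.pos_embedding)   # pos broadcasts over n
        x = x.permute(1, 2, 0).reshape(n, c, h, w)
        return self.final_layer(x)                        # (n, num_keypoints, 14, 14)
```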
I'm getting an error at the line self.global_encoder(x, pos=self.pos_embedding):
The size of tensor a (196) must match the size of tensor b (1024) at non-singleton dimension 0
The x input to self.global_encoder(x, pos=self.pos_embedding) has shape [256, n, 196], which seems wrong. I tried shape [n, 256, 196] as well, but that doesn't work either. What am I missing?