Open semihcanturk opened 4 years ago
It seems to me that a direct forward pass via `model(x)` and the extractor's forward pass through `forward_pass_on_convolutions(x)` give outputs of different sizes.

`forward_pass_on_convolutions(x)` outputs a tensor of size (1, 477360), which is the flattened form of (1, 3, 36, 52, 85) -> (1, 5616, 85) -> (1, 477360). However, using `model_output = self.model(x)` gives multiple outputs: `model_output[0]` has shape (1, 7371, 85), as opposed to the (1, 5616, 85) we obtained previously. I turned to `model_output[1]`, which is a list of size 3, to understand what's going on:

- `model_output[1][0].shape -> (1, 3, 9, 13, 85) -> (1, 351, 85)`
- `model_output[1][1].shape -> (1, 3, 18, 26, 85) -> (1, 1404, 85)`
- `model_output[1][2].shape -> (1, 3, 36, 52, 85) -> (1, 5616, 85)`: this is what `forward_pass_on_convolutions(x)` returns.
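The reshapes above can be reproduced with plain arrays — a minimal sketch using only the shapes reported in this issue, not the actual model:

```python
import numpy as np

# Per-scale YOLO head outputs: (batch, anchors, grid_h, grid_w, classes+boxes)
scales = [(1, 3, 9, 13, 85), (1, 3, 18, 26, 85), (1, 3, 36, 52, 85)]
outs = [np.zeros(s) for s in scales]

# Each head flattens its anchor/grid dims into a single detection axis
flat = [o.reshape(o.shape[0], -1, o.shape[-1]) for o in outs]
for f in flat:
    print(f.shape)  # (1, 351, 85), (1, 1404, 85), (1, 5616, 85)

# Concatenating along axis 1 reproduces the shape of model_output[0]
merged = np.concatenate(flat, axis=1)
print(merged.shape)  # (1, 7371, 85)
```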
Now, concatenating these along axis 1 gives us (1, 351 + 1404 + 5616, 85) -> (1, 7371, 85): this is the shape of `model_output[0]`.

The YOLOv2/YOLO9000 paper mentions the following:
> **Fine-Grained Features.** This modified YOLO predicts detections on a 13 × 13 feature map. While this is sufficient for large objects, it may benefit from finer grained features for localizing smaller objects. Faster R-CNN and SSD both run their proposal networks at various feature maps in the network to get a range of resolutions. We take a different approach, simply adding a passthrough layer that brings features from an earlier layer at 26 × 26 resolution.
I infer from this that a similar mechanism is at work here: results from 3 different resolutions are brought together and concatenated to produce an output of size (1, 7371, 85). However, `forward_pass_on_convolutions(x)` only provides the outputs of the 3rd resolution, hence the equality with `model_output[1][2].shape -> (1, 5616, 85)`.

In light of this, I have two questions:
- Why does `forward_pass_on_convolutions(x)` not include the outputs of the other resolutions? It seems that in the current setting we are backpropagating with incomplete target outputs (the target outputs we generate in `generate_cam` also have shape (1, 5616, 85)).
- As a solution, I tried to generate 3 target tensors with sizes corresponding to the 3 resolutions, but only the one of size (1, 5616, 85) can be backpropagated; the others, as expected, fail on `model_output.backward()` due to size incompatibility. How can I get around this so that the other sizes can be backpropagated as well?

Many thanks for the help in advance.

Hi, how do you fix this problem? `x = x + layer_outputs[mdef["from"]]` raises `TypeError: list indices must be integers or slices, not list`.
Looking forward to your help.
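For what it's worth, one way around the size mismatch in the second question may be to concatenate the per-scale targets along axis 1 so the combined target matches `model_output[0]`, then call `backward()` once on the concatenated output. Below is a minimal sketch with stand-in tensors — the shapes come from this issue, but the objectness-channel target is a hypothetical placeholder for whatever `generate_cam` would actually produce:

```python
import torch

# Stand-ins for the three per-scale head outputs (shapes from this issue)
scales = [(1, 351, 85), (1, 1404, 85), (1, 5616, 85)]
heads = [torch.randn(s, requires_grad=True) for s in scales]

# The model concatenates the scales along axis 1 into model_output[0]
model_output0 = torch.cat(heads, dim=1)  # (1, 7371, 85)

# Build one target per scale, then concatenate them the same way, so the
# combined target matches model_output[0] instead of a single scale
targets = [torch.zeros(s) for s in scales]
for t in targets:
    t[..., 4] = 1.0  # hypothetical: select the objectness channel
combined = torch.cat(targets, dim=1)  # (1, 7371, 85)

# backward() on a non-scalar tensor accepts a gradient of the same shape;
# passing the combined target propagates through all three scales at once
model_output0.backward(gradient=combined)

for h in heads:
    print(h.grad.shape)  # every scale now receives gradients
```

The same idea should apply to the real model: rather than calling `backward()` separately per resolution (which fails whenever the target shape matches only one scale), backpropagate once from the tensor that already merges all three.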