sbugallo closed this issue 6 years ago
The NaN might be due to having a very large loss value that causes an overflow. But what I'm more concerned about are the 0 values in your bounding box and mask losses. Losses shouldn't be zero, so I think this is not related to the choice of backbone, but rather you have a bug in your code somewhere.
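A minimal sketch for spotting this pattern automatically, assuming the training output has the `name: value - name: value` form of the log line quoted in the original post. The helper names `parse_keras_losses` and `suspicious_losses` are hypothetical and not part of Mask R-CNN or Keras; they only separate non-finite losses from losses that are exactly zero.

```python
import math
import re


def parse_keras_losses(log_line):
    """Parse 'name: value' pairs from one line of training output."""
    pattern = r"(\w+): (nan|inf|[-+]?\d+(?:\.\d+)?(?:e[-+]?\d+)?)"
    return {name: float(value) for name, value in re.findall(pattern, log_line)}


def suspicious_losses(losses):
    """Return (names with NaN/inf values, names with a value of exactly zero)."""
    non_finite = [n for n, v in losses.items() if not math.isfinite(v)]
    zero = [n for n, v in losses.items() if v == 0.0]
    return non_finite, zero


# Training part of the log line from the original post.
line = (
    "loss: nan - rpn_class_loss: 0.6948 - rpn_bbox_loss: 0.3827 - "
    "mrcnn_class_loss: nan - mrcnn_bbox_loss: 0.0000e+00 - "
    "mrcnn_mask_loss: 0.0000e+00"
)
non_finite, zero = suspicious_losses(parse_keras_losses(line))
print("non-finite:", non_finite)
print("exactly zero:", zero)
```

On that line the RPN losses look normal, while every head loss is either NaN or zero, which points at the second stage rather than the RPN.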
@waleedka Does that mean NaN is generally caused by incorrectly prepared data?
@waleedka I tried to use Inception ResNet V2 as the backbone, but it throws an error when merging because the shape of C4 does not match. I'm wondering whether Inception ResNet V2 can be used without modifying its architecture. If not, @BugaDM, how do you deal with that?
@paulcx Which layers are you using as endpoints? You are probably picking the wrong layer. Check that your C1, C2, C3, C4 and C5 are 1/2, 1/4, 1/8, 1/16 and 1/32 of the input size.
@BugaDM I used the same endpoints you provided above (https://gist.github.com/BugaDM/09b1b76d04a570102c966a31f7d37198). The C4 has a 17 x 17 x 1088 shape?
@paulcx I'm using a 1024x1024 input, and my endpoints are:
C1 = Tensor("block1_pool/MaxPool:0", shape=(?, 512, 512, 64), dtype=float32)
C2 = Tensor("block2_pool/MaxPool:0", shape=(?, 256, 256, 128), dtype=float32)
C3 = Tensor("block3_pool/MaxPool:0", shape=(?, 128, 128, 256), dtype=float32)
C4 = Tensor("block4_pool/MaxPool:0", shape=(?, 64, 64, 512), dtype=float32)
C5 = Tensor("block5_pool/MaxPool:0", shape=(?, 32, 32, 512), dtype=float32)
As you can see, they are /2, /4, /8, /16, /32 of the input size.
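The same check can be done programmatically. This is a rough sketch, not code from the Mask R-CNN repository: `stage_strides` and `check_pyramid` are hypothetical helpers, the expected strides (2, 4, 8, 16, 32) are taken from the comments above, and the input side of 1024 is assumed from the 512 x 512 shape of C1.

```python
def stage_strides(input_hw, stage_shapes):
    """Return the downsampling factor of each stage relative to the input.

    input_hw: (height, width) of the input image.
    stage_shapes: mapping of stage name -> (height, width) of its feature map.
    A factor is None when the input size is not an exact multiple of the
    feature-map size, or when height and width are scaled differently.
    """
    in_h, in_w = input_hw
    strides = {}
    for name, (h, w) in stage_shapes.items():
        exact = in_h % h == 0 and in_w % w == 0 and in_h // h == in_w // w
        strides[name] = in_h // h if exact else None
    return strides


def check_pyramid(input_hw, stage_shapes, expected=(2, 4, 8, 16, 32)):
    """List (stage, actual stride, expected stride) for every mismatch."""
    strides = stage_strides(input_hw, stage_shapes)
    return [
        (name, stride, want)
        for (name, stride), want in zip(strides.items(), expected)
        if stride != want
    ]


# Spatial sizes of the endpoints quoted above.
endpoints = {
    "C1": (512, 512),
    "C2": (256, 256),
    "C3": (128, 128),
    "C4": (64, 64),
    "C5": (32, 32),
}
print(stage_strides((1024, 1024), endpoints))
print(check_pyramid((1024, 1024), endpoints))
```

An endpoint whose size does not divide the input evenly, such as a 17 x 17 feature map, comes back as `None` and is reported as a mismatch, so a wrongly chosen layer shows up before training starts.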
@BugaDM Are these endpoints for "Inception ResNet V2"? The endpoints in https://gist.github.com/BugaDM/09b1b76d04a570102c966a31f7d37198 do not match these tensor shapes.
@waleedka Hi waleedka, do you have any idea what the correct endpoints are for Inception ResNet V2, or any alternative solutions?
@BugaDM You are wrong. It must be C1 = C2 = /2, C3 = /4, C4 = /8 and C5 = /16. You can print the shapes in resnet50 as an example.
Hi,
I've been trying to replace the ResNet 101 used as backbone with other architectures (e.g. VGG16, Inception V3, ResNeXt 101 or Inception ResNet V2) to check whether the results improve.
The problem is that, whenever I substitute the ResNet with any other architecture, the training losses of the mask branch are NaN or zero:
loss: nan - rpn_class_loss: 0.6948 - rpn_bbox_loss: 0.3827 - mrcnn_class_loss: nan - mrcnn_bbox_loss: 0.0000e+00 - mrcnn_mask_loss: 0.0000e+00 - val_loss: nan - val_rpn_class_loss: 0.6931 - val_rpn_bbox_loss: 0.2744 - val_mrcnn_class_loss: nan - val_mrcnn_bbox_loss: 0.0000e+00 - val_mrcnn_mask_loss: 0.0000e+00
These are the implementations I have been using:
* Inception ResNet V2: https://gist.github.com/BugaDM/09b1b76d04a570102c966a31f7d37198
* Inception V3: https://gist.github.com/BugaDM/aabd048bcb1d7ab26ece4c3499f826e0
* ResNeXt 101: https://gist.github.com/BugaDM/f6e3174953f93346b6002d9adc7eb3e5
* VGG16: https://gist.github.com/BugaDM/cb70226bed33c0de49b289a8fbd4b667
I do not get any errors during execution. Any suggestions?
I need the VGG16 implementation. Would you mind sharing it again? The link shows me a "page not found" message. Thanks in advance.