I tried replacing fc7 with GRUCell(step=2), but I only get 62.7% mAP. I've tried everything I can think of and can't get a better mAP. Does anyone know how to train a GRU? Or can someone explain why fc7 is better than a GRU?
It depends on the use case: a GRU is probably better at classifying time series than a fully connected layer. A GRU does introduce more trainable variables, though, so it's harder to train.
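
To put a rough number on "more trainable variables", here is a quick back-of-the-envelope count. This is only a sketch under assumptions: I'm guessing fc7 maps 4096 inputs to 4096 outputs (as in VGG-style networks; the actual sizes aren't stated here), and I'm using the textbook GRU formulation with one bias vector per gate. Some implementations keep a second set of biases, so real counts can be slightly higher. The function names `dense_params` and `gru_params` are just made up for illustration.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a single fully connected layer."""
    return n_in * n_out + n_out


def gru_params(n_in, n_hidden):
    """Textbook GRU: update gate, reset gate, and candidate state.

    Each of the three blocks has input weights, recurrent weights,
    and one bias vector.
    """
    per_block = n_in * n_hidden + n_hidden * n_hidden + n_hidden
    return 3 * per_block


# Assumed sizes (not given in this thread): fc7 as 4096 -> 4096.
n_in = n_hidden = 4096

fc = dense_params(n_in, n_hidden)
gru = gru_params(n_in, n_hidden)

print(f"fc7 parameters: {fc:,}")
print(f"GRU parameters: {gru:,}")
print(f"ratio GRU / fc7: {gru / fc:.1f}x")
```

With these assumed sizes the GRU has about six times as many parameters as the fully connected layer, so it likely needs more data or more careful tuning to match it.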