Hi dragonbook,
I experimented with your case, and it did not improve the performance much while costing considerably more computation. When I designed my model, I considered the receptive field size and conventional U-Net structure.
I think you have to set the sigma value used to generate the GT heatmap larger than the original one. Otherwise, it would generate too small a blob on the GT heatmap (because of the enlarged output heatmap size), so the model would have difficulty learning to localize hand keypoints.
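For example, something like the following rough sketch of 3D Gaussian GT heatmap generation (NumPy; the function name, sizes and sigma are just illustrative, not the repo's actual values):

```python
import numpy as np

def make_gt_heatmap(center_vox, out_size=88, sigma=3.4):
    # center_vox: (x, y, z) keypoint position in output-voxel coordinates.
    # sigma is in voxels; if you double the heatmap resolution from 44^3 to 88^3,
    # doubling sigma as well keeps the blob the same size relative to the grid.
    zz, yy, xx = np.meshgrid(np.arange(out_size),
                             np.arange(out_size),
                             np.arange(out_size), indexing='ij')
    d2 = (xx - center_vox[0]) ** 2 + (yy - center_vox[1]) ** 2 + (zz - center_vox[2]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2)).astype(np.float32)
```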
Hi, I want to increase the network's output resolution to 88x88x88 (i.e., double the current heatmap size) to simply improve the estimation precision. Since your current network already works well, I chose to adjust it by inserting one more decoder block (with an upsampling layer) after the original encoder-decoder block to double its output (which is 44x44x44); a rough sketch of what I mean is below. (I also tried adding a longer skip/residual connection at the original scale, 88x88x88.) But in my experiments they don't seem to work as well as your original design (well, actually some of them do work).
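The extra decoder block I mean is roughly like this (a simplified PyTorch sketch, not my exact code; channel counts are placeholders):

```python
import torch.nn as nn

class ExtraUpsampleBlock(nn.Module):
    """Appended after the original 44x44x44 output path to reach 88x88x88."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose3d(in_ch, out_ch, kernel_size=2, stride=2)
        self.bn = nn.BatchNorm3d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        self.conv = nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        x = self.relu(self.bn(self.up(x)))  # 44^3 -> 88^3
        return self.conv(x)
```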
I wonder how you designed your current network, beyond the common practices like residual blocks and U-Net style skip connections. For example, you used a U-Net style encoder-decoder sub-block in the middle of the architecture, after one basic conv layer, one pooling layer and some residual blocks (see the sketch below for the layout I mean). What were your considerations?
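A crude sketch of the layout I'm referring to (PyTorch, placeholder channels and a single skip connection; not your actual code):

```python
import torch.nn as nn

class LayoutSketch(nn.Module):
    def __init__(self, ch=32, num_joints=21):
        super().__init__()
        self.stem = nn.Sequential(                       # basic conv layer
            nn.Conv3d(1, ch, 7, padding=3), nn.BatchNorm3d(ch), nn.ReLU(inplace=True))
        self.pool = nn.MaxPool3d(2)                      # 88^3 input -> 44^3
        self.res = nn.Sequential(                        # "some residual blocks" (placeholder)
            nn.Conv3d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))
        self.enc = nn.Sequential(                        # encoder: 44^3 -> 22^3
            nn.Conv3d(ch, 2 * ch, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.dec = nn.Sequential(                        # decoder: 22^3 -> 44^3
            nn.ConvTranspose3d(2 * ch, ch, 2, stride=2), nn.ReLU(inplace=True))
        self.out = nn.Conv3d(ch, num_joints, 1)          # per-joint heatmaps at 44^3

    def forward(self, x):
        x = self.res(self.pool(self.stem(x)))
        skip = x
        x = self.dec(self.enc(x))
        return self.out(x + skip)                        # U-Net style skip (additive here)
```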
Besides, did you consider the receptive field of the feature map cells (in order to capture larger 3D context) when you designed the network? Did you try any experiments or network designs with an 88x88x88 output resolution? Could you share some experience or give me some suggestions?
Thanks!