Query regarding the structure of hourglass model

yuanyuanli85 / Stacked_Hourglass_Network_Keras

Keras Implementation for Stacked Hourglass Network


Query regarding the structure of hourglass model #36

Open n33lkanth opened 3 years ago

n33lkanth commented 3 years ago

Dear VictorLi,

First of all, thank you for providing the Keras translation of the original model. I have a few questions:

1) I do not see any implementation of the 1st right feature layer.
2) The dimension of the 4th left feature is the same as that of the next three bottleneck layers (just before the 1st right feature layer). According to the hourglass diagram, shouldn't the dimension of these three layers be smaller than that of the 4th left feature layer?

I have modified your implementation a bit to fit my requirements. I have attached graph.pdf, legend.pdf, and hourglassModule.pdf.

Kindly help me understand these points. Thanks in advance.

yuanyuanli85 commented 3 years ago

bottom_layer is the function that adds the bottom layers to the network. As for the dimensions you are asking about, I do not remember them clearly. Just make sure the feature maps sent to Add() have the same dimensions.

from keras.layers import Add  # element-wise sum used for the skip connection


def bottom_layer(lf8, bottleneck, hgid, num_channels):
    # blocks in lowest resolution
    # 3 bottleneck blocks + Add

    lf8_connect = bottleneck(lf8, num_channels, str(hgid) + "_lf8")

    _x = bottleneck(lf8, num_channels, str(hgid) + "_lf8_x1")
    _x = bottleneck(_x, num_channels, str(hgid) + "_lf8_x2")
    _x = bottleneck(_x, num_channels, str(hgid) + "_lf8_x3")

    rf8 = Add()([_x, lf8_connect])

    return rf8
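To illustrate the point about Add(), here is a minimal shape-only sketch in plain Python. It does not use Keras: tuples of (height, width, channels) stand in for tensors, and the helper names bottleneck_shape, add_shapes, and bottom_layer_shape are hypothetical. It assumes that a bottleneck block keeps the spatial size and outputs num_channels feature maps, which would explain why the three bottom blocks have the same dimensions as lf8. The (8, 8, 256) input is only an illustrative size.

```python
# Shape-only sketch: tuples stand in for Keras tensors (height, width, channels).

def bottleneck_shape(shape, num_out_channels):
    """Assumed behaviour: a bottleneck block keeps height and width
    and outputs num_out_channels feature maps."""
    height, width, _ = shape
    return (height, width, num_out_channels)


def add_shapes(shape_a, shape_b):
    """Mimic the constraint of element-wise addition: shapes must match."""
    if shape_a != shape_b:
        raise ValueError(f"cannot add {shape_a} and {shape_b}")
    return shape_a


def bottom_layer_shape(lf8_shape, num_channels):
    """Follow the same steps as bottom_layer, on shapes only."""
    # Skip branch: one bottleneck applied to lf8.
    lf8_connect = bottleneck_shape(lf8_shape, num_channels)
    # Main branch: three bottlenecks in sequence, no pooling in between.
    x = lf8_shape
    for _ in range(3):
        x = bottleneck_shape(x, num_channels)
    return add_shapes(x, lf8_connect)


rf8_shape = bottom_layer_shape((8, 8, 256), 256)
print(rf8_shape)  # (8, 8, 256)
```

Under these assumptions both branches end with the same shape, so the addition is valid. If one branch were pooled or given a different channel count, add_shapes would raise an error.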

n33lkanth commented 3 years ago

Hi, thank you for your reply. I have the following questions and would appreciate your help in understanding them.

1) Inside the function create_right_half_blocks(), why don't we use the function create_right_half_blocks() for rf8, as is done for rf4, rf2, and rf1?
2) What is the 'num_channels' parameter in the hourglass module? I can see that it is set to 128 for the tiny model and 256 for the main model. I want to use this model for a sound event detection problem, with a mel-spectrogram as the input feature, so I am not sure what value 'num_channels' should be set to, or why. Please help.

Note: Feature dimension is (512 x 128) ~ (T x F)
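As a rough way to reason about the (T x F) input, the sketch below tracks how a 512 x 128 feature map might shrink across the left half of one hourglass. It rests on assumptions that are not confirmed in this thread: that each level halves both axes (as the names lf1, lf2, lf4, and lf8 suggest), that any downsampling before the hourglass is ignored, and that num_channels is the number of feature maps carried at every level. The function name left_half_shapes is hypothetical.

```python
def left_half_shapes(time_steps, freq_bins, num_channels, levels=4):
    """Return the assumed (T, F, channels) shape at each left-half level.

    Assumption: every level halves T and F (for example by 2x2 pooling),
    while the channel count stays at num_channels throughout.
    """
    shapes = {}
    t, f = time_steps, freq_bins
    for level in range(levels):
        scale = 2 ** level  # 1, 2, 4, 8
        shapes[f"lf{scale}"] = (t, f, num_channels)
        t, f = t // 2, f // 2
    return shapes


for name, shape in left_half_shapes(512, 128, 256).items():
    print(name, shape)
# lf1 (512, 128, 256)
# lf2 (256, 64, 256)
# lf4 (128, 32, 256)
# lf8 (64, 16, 256)
```

In this reading, num_channels sets the width of the network and is independent of T and F. The input size only needs to stay divisible by 2 at each level so that the left and right feature maps can be added after upsampling.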