zouchuhang / LayoutNet

Torch implementation of our CVPR 18 paper: "LayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image"
http://openaccess.thecvf.com/content_cvpr_2018/papers/Zou_LayoutNet_Reconstructing_the_CVPR_2018_paper.pdf
MIT License

Issues reproducing network - resulting with different size #22

Open GalAvineri opened 5 years ago

GalAvineri commented 5 years ago

Hello! :) I'm trying to implement your network with Keras, and it seems that the network I built has many more parameters than the number you stated in your paper. You mentioned that you were able to train the entire network with a batch size of 20 using 12GB. (I've even seen in #5 that you mentioned you use 10.969GB.) My GPU has 10.57GiB available, but when I try a batch size of 15, which by my calculation should fit, the model does not fit into its memory. I've even removed the 3D regression part and it still fails.

So I wanted to ask if you could help me see whether I've made an implementation error :) Could you, for example, provide the total number of parameters of your model? Or, even better, the number of parameters per layer? :)

Here is the description of my implementation :) I've defined the network as follows:

def layoutnet():
    # Encoder
    input = layers.Input(shape=(6, 512, 1024))  # chw format
    e1 = conv2d_relu_pool(input, 32, name='e1')  # [?, 32, 256, 512]
    e2 = conv2d_relu_pool(e1, 64, name='e2')  # [?, 64, 128, 256]
    e3 = conv2d_relu_pool(e2, 128, name='e3')  # [?, 128, 64, 128]
    e4 = conv2d_relu_pool(e3, 256, name='e4')  # [?, 256, 32, 64]
    e5 = conv2d_relu_pool(e4, 512, name='e5')  # [?, 512, 16, 32]
    e6 = conv2d_relu_pool(e5, 1024, name='e6')  # [?, 1024, 8, 16]
    e7 = conv2d_relu_pool(e6, 2048, name='e7')  # [?, 2048, 4, 8]
    encoder = Model(input, e7)

    # Top decoder branch
    td1 = up_conv2d_relu(e7, 1024, 'td1')  # [?, 1024, 8, 16]
    td1 = layers.Concatenate(axis=1, name='td1_concat')([td1, e6])  # [?, 1024 * 2, 8, 16]

    td2 = up_conv2d_relu(td1, 512, name='td2')  # [?, 512, 16, 32]
    td2 = layers.Concatenate(axis=1, name='td2_concat')([td2, e5])  # [?, 512 * 2, 16, 32]

    td3 = up_conv2d_relu(td2, 256, name='td3')  # [?, 256, 32, 64]
    td3 = layers.Concatenate(axis=1, name='td3_concat')([td3, e4])  # [?, 256 * 2, 32, 64]

    td4 = up_conv2d_relu(td3, 128, name='td4')  # [?, 128, 64, 128]
    td4 = layers.Concatenate(axis=1, name='td4_concat')([td4, e3])  # [?, 128 * 2, 64, 128]

    td5 = up_conv2d_relu(td4, 64, name='td5')  # [?, 64, 128, 256]
    td5 = layers.Concatenate(axis=1, name='td5_concat')([td5, e2])  # [?, 64 * 2, 128, 256]

    td6 = up_conv2d_relu(td5, 32, name='td6')  # [?, 32, 256, 512]
    td6 = layers.Concatenate(axis=1, name='td6_concat')([td6, e1])  # [?, 32 * 2, 256, 512]

    td7 = up_conv2d_relu(td6, 3, name='td7')  # [?, 3, 512, 1024]
    td = layers.Activation('sigmoid')(td7)
    top_decoder = Model(input, td)

    # Bottom decoder branch
    bd1 = layers.Convolution2D(1024, (3, 3), (1, 1), padding='same', activation='relu', name='bd1_conv'+'_conv')(top_decoder.get_layer('td1_upsample').output)  # [?, 1024, 8, 16]
    bd1 = layers.Concatenate(axis=1, name='bd1_concat')([bd1, td1])  # [?, 1024 * 3, 8, 16]

    bd2 = up_conv2d_relu(bd1, 512, name='bd2')  # [?, 512, 16, 32]
    bd2 = layers.Concatenate(axis=1, name='bd2_concat')([bd2, td2])  # [?, 512 * 3, 16, 32]

    bd3 = up_conv2d_relu(bd2, 256, name='bd3')  # [?, 256, 32, 64]
    bd3 = layers.Concatenate(axis=1, name='bd3_concat')([bd3, td3])  # [?, 256 * 3, 32, 64]

    bd4 = up_conv2d_relu(bd3, 128, name='bd4')  # [?, 128, 64, 128]
    bd4 = layers.Concatenate(axis=1, name='bd4_concat')([bd4, td4])  # [?, 128 * 3, 64, 128]

    bd5 = up_conv2d_relu(bd4, 64, name='bd5')  # [?, 64, 128, 256]
    bd5 = layers.Concatenate(axis=1, name='bd5_concat')([bd5, td5])  # [?, 64 * 3, 128, 256]

    bd6 = up_conv2d_relu(bd5, 32, name='bd6')  # [?, 32, 256, 512]
    bd6 = layers.Concatenate(axis=1, name='bd6_concat')([bd6, td6])  # [?, 32 * 3, 256, 512]

    bd7 = up_conv2d_relu(bd6, 1, name='bd7')  # [?, 1, 512, 1024]
    bd = layers.Activation('sigmoid')(bd7)
    bot_decoder = Model(input, bd)

    # 3D box
    # reg = layers.Concatenate(axis=1, name='reg_input')([td, bd])  # [?, 4, 512, 1024]
    # reg = conv2d_relu_pool(reg, 8, name='reg_downsample1')  # [?, 8, 256, 512]
    # reg = conv2d_relu_pool(reg, 16, name='reg_downsample2')  # [?, 16, 128, 256]
    # reg = conv2d_relu_pool(reg, 32, name='reg_downsample3')  # [?, 32, 64, 128]
    # reg = conv2d_relu_pool(reg, 64, name='reg_downsample4')  # [?, 64, 32, 64]
    # reg = conv2d_relu_pool(reg, 128, name='reg_downsample5')  # [?, 128, 16, 32]
    # reg = conv2d_relu_pool(reg, 256, name='reg_downsample6')  # [?, 256, 8, 16]
    # reg = conv2d_relu_pool(reg, 512, name='reg_downsample7')  # [?, 512, 4, 8]
    # reg = layers.Flatten(name='reg_flatten')(reg)
    # reg = layers.Dense(1024, activation='relu', name='reg_dense1')(reg)
    # reg = layers.Dense(256, activation='relu', name='reg_dense2')(reg)
    # reg = layers.Dense(64, activation='relu', name='reg_dense3')(reg)
    # reg = layers.Dense(6, name='reg_dense4')(reg)

    # model = Model(input, [top_decoder, bot_decoder, reg])
    model = Model(input, [td, bd])
    return model

And the number of parameters per layer is shown here:

Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 6, 512, 1024) 0                                            
__________________________________________________________________________________________________
e1_conv (Conv2D)                (None, 32, 512, 1024 1760        input_1[0][0]                    
__________________________________________________________________________________________________
e1_pool (MaxPooling2D)          (None, 32, 256, 512) 0           e1_conv[0][0]                    
__________________________________________________________________________________________________
e2_conv (Conv2D)                (None, 64, 256, 512) 18496       e1_pool[0][0]                    
__________________________________________________________________________________________________
e2_pool (MaxPooling2D)          (None, 64, 128, 256) 0           e2_conv[0][0]                    
__________________________________________________________________________________________________
e3_conv (Conv2D)                (None, 128, 128, 256 73856       e2_pool[0][0]                    
__________________________________________________________________________________________________
e3_pool (MaxPooling2D)          (None, 128, 64, 128) 0           e3_conv[0][0]                    
__________________________________________________________________________________________________
e4_conv (Conv2D)                (None, 256, 64, 128) 295168      e3_pool[0][0]                    
__________________________________________________________________________________________________
e4_pool (MaxPooling2D)          (None, 256, 32, 64)  0           e4_conv[0][0]                    
__________________________________________________________________________________________________
e5_conv (Conv2D)                (None, 512, 32, 64)  1180160     e4_pool[0][0]                    
__________________________________________________________________________________________________
e5_pool (MaxPooling2D)          (None, 512, 16, 32)  0           e5_conv[0][0]                    
__________________________________________________________________________________________________
e6_conv (Conv2D)                (None, 1024, 16, 32) 4719616     e5_pool[0][0]                    
__________________________________________________________________________________________________
e6_pool (MaxPooling2D)          (None, 1024, 8, 16)  0           e6_conv[0][0]                    
__________________________________________________________________________________________________
e7_conv (Conv2D)                (None, 2048, 8, 16)  18876416    e6_pool[0][0]                    
__________________________________________________________________________________________________
e7_pool (MaxPooling2D)          (None, 2048, 4, 8)   0           e7_conv[0][0]                    
__________________________________________________________________________________________________
td1_upsample (UpSampling2D)     (None, 2048, 8, 16)  0           e7_pool[0][0]                    
__________________________________________________________________________________________________
td1_conv (Conv2D)               (None, 1024, 8, 16)  18875392    td1_upsample[0][0]               
__________________________________________________________________________________________________
td1_concat (Concatenate)        (None, 2048, 8, 16)  0           td1_conv[0][0]                   
                                                                 e6_pool[0][0]                    
__________________________________________________________________________________________________
bd1_conv_conv (Conv2D)          (None, 1024, 8, 16)  18875392    td1_upsample[0][0]               
__________________________________________________________________________________________________
td2_upsample (UpSampling2D)     (None, 2048, 16, 32) 0           td1_concat[0][0]                 
__________________________________________________________________________________________________
bd1_concat (Concatenate)        (None, 3072, 8, 16)  0           bd1_conv_conv[0][0]              
                                                                 td1_concat[0][0]                 
__________________________________________________________________________________________________
td2_conv (Conv2D)               (None, 512, 16, 32)  9437696     td2_upsample[0][0]               
__________________________________________________________________________________________________
bd2_upsample (UpSampling2D)     (None, 3072, 16, 32) 0           bd1_concat[0][0]                 
__________________________________________________________________________________________________
td2_concat (Concatenate)        (None, 1024, 16, 32) 0           td2_conv[0][0]                   
                                                                 e5_pool[0][0]                    
__________________________________________________________________________________________________
bd2_conv (Conv2D)               (None, 512, 16, 32)  14156288    bd2_upsample[0][0]               
__________________________________________________________________________________________________
td3_upsample (UpSampling2D)     (None, 1024, 32, 64) 0           td2_concat[0][0]                 
__________________________________________________________________________________________________
bd2_concat (Concatenate)        (None, 1536, 16, 32) 0           bd2_conv[0][0]                   
                                                                 td2_concat[0][0]                 
__________________________________________________________________________________________________
td3_conv (Conv2D)               (None, 256, 32, 64)  2359552     td3_upsample[0][0]               
__________________________________________________________________________________________________
bd3_upsample (UpSampling2D)     (None, 1536, 32, 64) 0           bd2_concat[0][0]                 
__________________________________________________________________________________________________
td3_concat (Concatenate)        (None, 512, 32, 64)  0           td3_conv[0][0]                   
                                                                 e4_pool[0][0]                    
__________________________________________________________________________________________________
bd3_conv (Conv2D)               (None, 256, 32, 64)  3539200     bd3_upsample[0][0]               
__________________________________________________________________________________________________
td4_upsample (UpSampling2D)     (None, 512, 64, 128) 0           td3_concat[0][0]                 
__________________________________________________________________________________________________
bd3_concat (Concatenate)        (None, 768, 32, 64)  0           bd3_conv[0][0]                   
                                                                 td3_concat[0][0]                 
__________________________________________________________________________________________________
td4_conv (Conv2D)               (None, 128, 64, 128) 589952      td4_upsample[0][0]               
__________________________________________________________________________________________________
bd4_upsample (UpSampling2D)     (None, 768, 64, 128) 0           bd3_concat[0][0]                 
__________________________________________________________________________________________________
td4_concat (Concatenate)        (None, 256, 64, 128) 0           td4_conv[0][0]                   
                                                                 e3_pool[0][0]                    
__________________________________________________________________________________________________
bd4_conv (Conv2D)               (None, 128, 64, 128) 884864      bd4_upsample[0][0]               
__________________________________________________________________________________________________
td5_upsample (UpSampling2D)     (None, 256, 128, 256 0           td4_concat[0][0]                 
__________________________________________________________________________________________________
bd4_concat (Concatenate)        (None, 384, 64, 128) 0           bd4_conv[0][0]                   
                                                                 td4_concat[0][0]                 
__________________________________________________________________________________________________
td5_conv (Conv2D)               (None, 64, 128, 256) 147520      td5_upsample[0][0]               
__________________________________________________________________________________________________
bd5_upsample (UpSampling2D)     (None, 384, 128, 256 0           bd4_concat[0][0]                 
__________________________________________________________________________________________________
td5_concat (Concatenate)        (None, 128, 128, 256 0           td5_conv[0][0]                   
                                                                 e2_pool[0][0]                    
__________________________________________________________________________________________________
bd5_conv (Conv2D)               (None, 64, 128, 256) 221248      bd5_upsample[0][0]               
__________________________________________________________________________________________________
td6_upsample (UpSampling2D)     (None, 128, 256, 512 0           td5_concat[0][0]                 
__________________________________________________________________________________________________
bd5_concat (Concatenate)        (None, 192, 128, 256 0           bd5_conv[0][0]                   
                                                                 td5_concat[0][0]                 
__________________________________________________________________________________________________
td6_conv (Conv2D)               (None, 32, 256, 512) 36896       td6_upsample[0][0]               
__________________________________________________________________________________________________
bd6_upsample (UpSampling2D)     (None, 192, 256, 512 0           bd5_concat[0][0]                 
__________________________________________________________________________________________________
td6_concat (Concatenate)        (None, 64, 256, 512) 0           td6_conv[0][0]                   
                                                                 e1_pool[0][0]                    
__________________________________________________________________________________________________
bd6_conv (Conv2D)               (None, 32, 256, 512) 55328       bd6_upsample[0][0]               
__________________________________________________________________________________________________
bd6_concat (Concatenate)        (None, 96, 256, 512) 0           bd6_conv[0][0]                   
                                                                 td6_concat[0][0]                 
__________________________________________________________________________________________________
td7_upsample (UpSampling2D)     (None, 64, 512, 1024 0           td6_concat[0][0]                 
__________________________________________________________________________________________________
bd7_upsample (UpSampling2D)     (None, 96, 512, 1024 0           bd6_concat[0][0]                 
__________________________________________________________________________________________________
td7_conv (Conv2D)               (None, 3, 512, 1024) 1731        td7_upsample[0][0]               
__________________________________________________________________________________________________
bd7_conv (Conv2D)               (None, 1, 512, 1024) 865         bd7_upsample[0][0]               
__________________________________________________________________________________________________
activation (Activation)         (None, 3, 512, 1024) 0           td7_conv[0][0]                   
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 1, 512, 1024) 0           bd7_conv[0][0]                   
==================================================================================================
Total params: 94,347,396
Trainable params: 94,347,396
Non-trainable params: 0
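
The per-layer counts in the summary can be cross-checked by hand. The following is a minimal sketch, assuming every convolution uses a 3x3 kernel with a bias term (an assumption that happens to reproduce the numbers in the table). The helper name `conv_params` and the `LAYERS` list are illustrative and not taken from the repository; the channel counts are read off the summary.

```python
def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus biases of one 2D convolution (assumed 3x3 with bias)."""
    return kernel * kernel * in_channels * out_channels + out_channels


# (layer name, input channels, output channels), read off the summary table.
LAYERS = [
    # Encoder
    ("e1_conv", 6, 32),
    ("e2_conv", 32, 64),
    ("e3_conv", 64, 128),
    ("e4_conv", 128, 256),
    ("e5_conv", 256, 512),
    ("e6_conv", 512, 1024),
    ("e7_conv", 1024, 2048),
    # Top decoder branch (input channels include the skip concatenation)
    ("td1_conv", 2048, 1024),
    ("td2_conv", 2048, 512),
    ("td3_conv", 1024, 256),
    ("td4_conv", 512, 128),
    ("td5_conv", 256, 64),
    ("td6_conv", 128, 32),
    ("td7_conv", 64, 3),
    # Bottom decoder branch (input channels include the top-branch features)
    ("bd1_conv_conv", 2048, 1024),
    ("bd2_conv", 3072, 512),
    ("bd3_conv", 1536, 256),
    ("bd4_conv", 768, 128),
    ("bd5_conv", 384, 64),
    ("bd6_conv", 192, 32),
    ("bd7_conv", 96, 1),
]

per_layer = {name: conv_params(cin, cout) for name, cin, cout in LAYERS}
total = sum(per_layer.values())

for name, count in per_layer.items():
    print(f"{name:<14} {count:>12,}")
print(f"{'total':<14} {total:>12,}")
```

Pooling, upsampling, concatenation and activation layers carry no parameters, so only the convolutions appear in the list.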

zouchuhang commented 5 years ago

@GalAvineri I checked the total number of parameters of my model, and it is the same as yours. The memory cost differs between deep learning frameworks. You can try reducing the batch size further (e.g. just try 1), or reducing the input image size as in https://github.com/zouchuhang/LayoutNet/issues/5#issuecomment-393949323
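
To put rough numbers on the two suggestions above, here is a back-of-the-envelope sketch. It assumes float32 tensors and looks only at the forward activations of three full-resolution layers from the summary. Training also keeps gradients, optimizer state and workspace buffers, and frameworks differ in which intermediates they store, so treat this as a lower bound and not as a measurement. The function and variable names are illustrative.

```python
BYTES_PER_FLOAT32 = 4
MIB = 2 ** 20


def activation_mib(channels, height, width, batch_size=1):
    """Size in MiB of one float32 activation tensor in NCHW layout."""
    return channels * height * width * batch_size * BYTES_PER_FLOAT32 / MIB


# Weights: total parameter count taken from the summary in this thread.
weights_mib = 94_347_396 * BYTES_PER_FLOAT32 / MIB

# Three of the largest tensors, with (C, H, W) read off the summary.
LARGEST = {
    "e1_conv": (32, 512, 1024),
    "td7_upsample": (64, 512, 1024),
    "bd7_upsample": (96, 512, 1024),
}


def largest_total_mib(batch_size, scale=1):
    """Combined size of the tensors above; `scale` divides height and width."""
    return sum(
        activation_mib(c, h // scale, w // scale, batch_size)
        for c, h, w in LARGEST.values()
    )


print(f"weights:                  {weights_mib:.0f} MiB")
print(f"batch 1,  512x1024 input: {largest_total_mib(1):.0f} MiB")
print(f"batch 15, 512x1024 input: {largest_total_mib(15):.0f} MiB")
print(f"batch 15, 256x512 input:  {largest_total_mib(15, scale=2):.0f} MiB")
```

Under these assumptions the weights take about 360 MiB, while these three activation tensors alone take 384 MiB per sample. That is why matching parameter counts do not imply matching memory use, and why both a smaller batch and a smaller input help: activation memory grows linearly with batch size, and halving the height and width cuts it to a quarter.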