robmsmt / KerasDeepSpeech

A Keras CTC implementation of Baidu's DeepSpeech for model experimentation

Why does the for loop add only one Conv1D layer in ds2_gru_model? #17

Open menon92 opened 4 years ago

menon92 commented 4 years ago

Hello @robmsmt,

I'm working with your repo. In your model.py file, the code below should add three Conv1D layers, but it adds only one:

conv = ZeroPadding1D(padding=(0, 2048))(x)
for l in range(conv_layers):
    x = Conv1D(filters=fc_size, name='conv_{}'.format(l + 1), kernel_size=11,
               padding='valid', activation='relu', strides=2)(conv)

This is the model summary I get:

Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
the_input (InputLayer)          (None, None, 161)    0                                            
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, None, 161)    644         the_input[0][0]                  
__________________________________________________________________________________________________
zero_padding1d_1 (ZeroPadding1D (None, None, 161)    0           batch_normalization_1[0][0]      
__________________________________________________________________________________________________
conv_3 (Conv1D)                 (None, None, 512)    907264      zero_padding1d_1[0][0]           
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, None, 512)    2048        conv_3[0][0]                     
__________________________________________________________________________________________________
bidirectional_1 (Bidirectional) (None, None, 1024)   9443328     batch_normalization_2[0][0]      
__________________________________________________________________________________________________
bidirectional_2 (Bidirectional) (None, None, 1024)   12589056    bidirectional_1[0][0]            
__________________________________________________________________________________________________
bidirectional_3 (Bidirectional) (None, None, 1024)   12589056    bidirectional_2[0][0]            
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, None, 1024)   4096        bidirectional_3[0][0]            
__________________________________________________________________________________________________
time_distributed_1 (TimeDistrib (None, None, 512)    524800      batch_normalization_3[0][0]      
__________________________________________________________________________________________________
time_distributed_2 (TimeDistrib (None, None, 1102)   565326      time_distributed_1[0][0]         
__________________________________________________________________________________________________
the_labels (InputLayer)         (None, None)         0                                            
__________________________________________________________________________________________________
input_length (InputLayer)       (None, 1)            0                                            
__________________________________________________________________________________________________
label_length (InputLayer)       (None, 1)            0                                            
__________________________________________________________________________________________________
ctc (Lambda)                    (None, 1)            0           time_distributed_2[0][0]         
                                                                 the_labels[0][0]                 
                                                                 input_length[0][0]               
                                                                 label_length[0][0]               
==================================================================================================
Total params: 36,625,618
Trainable params: 36,622,224
Non-trainable params: 3,394
__________________________________________________________________________________________________
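
To rule out the rest of the model, I also tried a minimal standalone version of just this conv block (the fc_size and conv_layers values here are my guesses, not necessarily what model.py uses), and it behaves the same way:

from keras.layers import Input, Conv1D, ZeroPadding1D
from keras.models import Model

fc_size = 512       # assumed value for this repro
conv_layers = 3     # assumed value for this repro

inputs = Input(shape=(None, 161), name='the_input')
conv = ZeroPadding1D(padding=(0, 2048))(inputs)
for l in range(conv_layers):
    # same loop body as in model.py: every Conv1D is called on `conv`
    x = Conv1D(filters=fc_size, name='conv_{}'.format(l + 1), kernel_size=11,
               padding='valid', activation='relu', strides=2)(conv)

model = Model(inputs=inputs, outputs=x)
model.summary()  # this also prints only conv_3 after the padding layer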

What could be the possible reason? Thanks in advance.
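
For reference, what I expected the loop to build is a stack where each Conv1D feeds the next, roughly like this (my own sketch, not the code in the repo):

x = ZeroPadding1D(padding=(0, 2048))(x)
for l in range(conv_layers):
    # pass each layer's output on to the next layer
    x = Conv1D(filters=fc_size, name='conv_{}'.format(l + 1), kernel_size=11,
               padding='valid', activation='relu', strides=2)(x)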