Invalid argument: Matrix size-incompatible: In[0]: [0,1], In[1]: [70,70] [[{{node loss/crf_1_loss/MatMul_1}}]] [[metrics/crf_viterbi_accuracy/strided_slice_10/_229]]

imayachita commented 4 years ago

Hi all, I tried to add a CRF layer on top of the Bi-LSTM-CNN NER model from this implementation: https://github.com/kamalkraj/Named-Entity-Recognition-with-Bidirectional-LSTM-CNNs

wordEmbeddings = wordEmbeddings.reshape((wordEmbeddings.shape[0],300))
words_input = Input(shape=(None,),dtype='int32',name='words_input')
words = Embedding(input_dim=wordEmbeddings.shape[0], output_dim=wordEmbeddings.shape[1],  weights=[wordEmbeddings], trainable=False)(words_input)
casing_input = Input(shape=(None,), dtype='int32', name='casing_input')
casing = Embedding(output_dim=caseEmbeddings.shape[1], input_dim=caseEmbeddings.shape[0], weights=[caseEmbeddings], trainable=False)(casing_input)
character_input=Input(shape=(None,52,),name='char_input')
embed_char_out=TimeDistributed(Embedding(len(char2Idx),30,embeddings_initializer=RandomUniform(minval=-0.5, maxval=0.5)), name='char_embedding')(character_input)
dropout= Dropout(0.5)(embed_char_out)
conv1d_out= TimeDistributed(Conv1D(kernel_size=3, filters=30, padding='same',activation='tanh', strides=1))(dropout)
maxpool_out=TimeDistributed(MaxPooling1D(52))(conv1d_out)
char = TimeDistributed(Flatten())(maxpool_out)
char = Dropout(0.5)(char)
output = concatenate([words, casing,char])
output = Bidirectional(LSTM(200, return_sequences=True, dropout=0.50, recurrent_dropout=0.25))(output)
output = TimeDistributed(Dense(100, activation='softmax'))(output)
print(len(label2Idx))
crf = CRF(len(label2Idx))
output = crf(output)

model = Model(inputs=[words_input, casing_input,character_input], outputs=[output])
model.compile(loss=crf.loss_function, optimizer='adam', metrics=[crf.accuracy])

for epoch in range(epochs):
    print("Epoch %d/%d"%(epoch,epochs))
    a = Progbar(len(train_batch_len))
    for i,batch in enumerate(iterate_minibatches(train_batch,train_batch_len)):
        labels, tokens, casing,char = batch
        print(labels.shape, tokens.shape, casing.shape, char.shape)
        model.train_on_batch([tokens, casing,char], labels)
        a.update(i)
    a.update(i+1)
    print(' ')

I printed labels.shape, tokens.shape, casing.shape and char.shape, and for a batch it gave me: (146, 1, 1) (146, 1) (146, 1) (146, 1, 52)

Below is the model summary:

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
char_input (InputLayer)         (None, None, 52)     0                                            
__________________________________________________________________________________________________
char_embedding (TimeDistributed (None, None, 52, 30) 2910        char_input[0][0]                 
__________________________________________________________________________________________________
dropout_1 (Dropout)             (None, None, 52, 30) 0           char_embedding[0][0]             
__________________________________________________________________________________________________
time_distributed_1 (TimeDistrib (None, None, 52, 30) 2730        dropout_1[0][0]                  
__________________________________________________________________________________________________
time_distributed_2 (TimeDistrib (None, None, 1, 30)  0           time_distributed_1[0][0]         
__________________________________________________________________________________________________
words_input (InputLayer)        (None, None)         0                                            
__________________________________________________________________________________________________
casing_input (InputLayer)       (None, None)         0                                            
__________________________________________________________________________________________________
time_distributed_3 (TimeDistrib (None, None, 30)     0           time_distributed_2[0][0]         
__________________________________________________________________________________________________
embedding_1 (Embedding)         (None, None, 300)    938100      words_input[0][0]                
__________________________________________________________________________________________________
embedding_2 (Embedding)         (None, None, 8)      64          casing_input[0][0]               
__________________________________________________________________________________________________
dropout_2 (Dropout)             (None, None, 30)     0           time_distributed_3[0][0]         
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, None, 338)    0           embedding_1[0][0]                
                                                                 embedding_2[0][0]                
                                                                 dropout_2[0][0]                  
__________________________________________________________________________________________________
bidirectional_1 (Bidirectional) (None, None, 400)    862400      concatenate_1[0][0]              
__________________________________________________________________________________________________
time_distributed_4 (TimeDistrib (None, None, 100)    40100       bidirectional_1[0][0]            
__________________________________________________________________________________________________
crf_1 (CRF)                     (None, None, 70)     12110       time_distributed_4[0][0]         
==================================================================================================
Total params: 1,858,414
Trainable params: 920,250
Non-trainable params: 938,164

I got the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Matrix size-incompatible: In[0]: [0,1], In[1]: [70,70]
     [[{{node loss/crf_1_loss/MatMul_1}}]]
     [[metrics/crf_viterbi_accuracy/strided_slice_10/_229]]
  (1) Invalid argument: Matrix size-incompatible: In[0]: [0,1], In[1]: [70,70]
     [[{{node loss/crf_1_loss/MatMul_1}}]]

Can anyone please help? Thanks!

xxlxx1 commented 4 years ago

You should use a one-hot target.
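A minimal sketch of what a one-hot target could look like here, assuming the labels are integer class indices shaped (batch, seq_len, 1), as in the shapes printed above, and that there are 70 labels (the CRF output size in the model summary). The `to_one_hot` helper is hypothetical and not part of keras-contrib; the random labels only stand in for a real batch.

```python
import numpy as np

def to_one_hot(labels, n_classes):
    """Turn integer labels of shape (batch, seq_len, 1) into
    one-hot vectors of shape (batch, seq_len, n_classes)."""
    # Drop the trailing singleton axis: (batch, seq_len, 1) -> (batch, seq_len)
    indices = np.asarray(labels).squeeze(-1).astype("int64")
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(n_classes, dtype="float32")[indices]

# Stand-in batch with the same shape as the labels printed in the question
rng = np.random.default_rng(0)
labels = rng.integers(0, 70, size=(146, 1, 1))

one_hot = to_one_hot(labels, n_classes=70)
print(one_hot.shape)
```

The result would then be passed to `model.train_on_batch` in place of the raw integer labels.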

xxlxx1 commented 4 years ago

Alternatively, using

 CRF(self.n_class, sparse_target=True)

works too; sparse_target doesn't need a one-hot target.
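A quick way to catch this kind of mismatch before calling `train_on_batch` is to check the label shape against the chosen mode. The sketch below assumes, as far as I understand the keras-contrib CRF layer, that sparse targets are integer indices shaped (batch, seq_len, 1) and dense targets are one-hot shaped (batch, seq_len, n_classes). `check_crf_targets` is a hypothetical helper, not a keras-contrib function.

```python
import numpy as np

def check_crf_targets(labels, n_classes, sparse_target):
    """Raise ValueError if labels do not match the assumed CRF target layout."""
    labels = np.asarray(labels)
    if labels.ndim != 3:
        raise ValueError("expected 3 dimensions, got shape %s" % (labels.shape,))
    # Assumed layout: last axis is 1 for sparse targets, n_classes for one-hot
    expected_last = 1 if sparse_target else n_classes
    if labels.shape[-1] != expected_last:
        raise ValueError(
            "last dimension is %d but sparse_target=%s expects %d"
            % (labels.shape[-1], sparse_target, expected_last)
        )
    # Sparse targets must be valid class indices
    if sparse_target and labels.size and (labels.min() < 0 or labels.max() >= n_classes):
        raise ValueError("label index out of range for %d classes" % n_classes)
    return labels.shape

# Labels shaped like the batch in the question: integer indices, (146, 1, 1)
sparse_labels = np.zeros((146, 1, 1), dtype="int32")

# Accepted when the layer is built with sparse_target=True
checked_shape = check_crf_targets(sparse_labels, 70, sparse_target=True)

# Rejected for the default (one-hot) mode, which is the setup in the question
try:
    check_crf_targets(sparse_labels, 70, sparse_target=False)
    mismatch = None
except ValueError as err:
    mismatch = str(err)
print(mismatch)
```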

AstralWatcher commented 4 years ago

 CRF(self.n_class, sparse_target=True)

Thanks, that helped a lot. After a few hours of trying to solve this, you saved me <3.

keras-team / keras-contrib

Invalid argument: Matrix size-incompatible: In[0]: [0,1], In[1]: [70,70] [[{{node loss/crf_1_loss/MatMul_1}}]] [[metrics/crf_viterbi_accuracy/strided_slice_10/_229]] #528