murthyrudra / NeuralNER

Implementation of Multilingual Neural NER
GNU General Public License v3.0
5 stars 2 forks

NeuralNERmono printing embeddings #7

Closed samarohith closed 5 years ago

samarohith commented 5 years ago

When I execute NeuralNer.py in the monolingual setting, it prints the entire embedding file. I don't understand why it does this. I found that it happens because of line 124 in the load_embeddings method. Please look into this.

murthyrudra commented 5 years ago

Hi, does it terminate after printing the word embeddings? If yes, it is due to lines 39 to 42 in utilsLocal.py, which fail on a dimension mismatch. Are the word embeddings in word2vec format (i.e., the first line contains the number of words and the dimension, followed by the word embeddings), or in GloVe format (i.e., the header line is absent and the file directly lists the word embeddings)?
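The format distinction above can be checked mechanically: a word2vec text file starts with a header line containing exactly two integers, while a GloVe file begins directly with `word v1 v2 ...` rows. A minimal sketch of such a check (the function name is hypothetical, not part of NeuralNER):

```python
def detect_embedding_format(path):
    """Heuristically distinguish word2vec text format from GloVe format.

    word2vec text files begin with a "<vocab_size> <dimension>" header line;
    GloVe files list "<word> <v1> <v2> ..." from the very first line.
    """
    with open(path, encoding="utf-8") as f:
        first = f.readline().split()
    # A word2vec header has exactly two fields, both non-negative integers.
    if len(first) == 2 and all(tok.isdigit() for tok in first):
        return "word2vec"
    return "glove"
```

A loader could call this first and skip the header line only when `"word2vec"` is detected, avoiding the mismatch described above.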

samarohith commented 5 years ago

Actually, the exit statement caused an error in the console (Google Colab), so I removed it. When I run the code, it prints the word embeddings. The word embeddings are in word2vec format. After a while, once all the embeddings have been printed, this error pops up:

---> 61 embedd_dict, embedding_vocab, reverse_word_vocab, vocabularySize, embeddingDimension = load_embeddings(embedding_path)
     62 print("Read Word Embedding of dimension " + str(embeddingDimension) + " for " + str(vocabularySize) + " number of words")
     63

/content/utilsLocal.py in load_embeddings(file_name)
     51
     52     vec = np.zeros(dimension)
---> 53     wordEmbedding = np.vstack([vec, wv_np, vec])
     54
     55     return wordEmbedding, dictionary, reverseDict, wordEmbedding.shape[0], dimension

/usr/local/lib/python3.6/dist-packages/numpy/core/shape_base.py in vstack(tup)
    281     """
    282     _warn_for_nonsequence(tup)
--> 283     return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
    284
    285

ValueError: all the input array dimensions except for the concatenation axis must match exactly
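This ValueError can be reproduced in isolation. `np.vstack` requires every stacked row to have the same width, so if the word2vec header line (two numbers) ends up parsed as if it were an embedding row, it cannot be stacked against the zero padding vectors built from the true dimension. A minimal sketch, with hypothetical shapes (a 2-element "row" from a misread header versus 100-dimensional padding):

```python
import numpy as np

wv_np = np.array([[10.0, 100.0]])  # header "10 100" misread as an embedding row
dimension = 100                    # the true embedding dimension
vec = np.zeros(dimension)          # padding row, shape (100,)

try:
    # Rows of width 100 and width 2 cannot be stacked into one matrix.
    wordEmbedding = np.vstack([vec, wv_np, vec])
except ValueError as e:
    print("ValueError:", e)
```

Any row whose width differs from `dimension` (a misread header, or a line split on unexpected whitespace) triggers the same failure.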

murthyrudra commented 5 years ago

Hi, could you convert it to GloVe format? You need to delete the first line, which specifies the number of words and the dimension of the word embeddings. Meanwhile, I will push changes that can handle both the word2vec and GloVe formats.
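The conversion described above amounts to dropping the header line. A minimal sketch of such a helper (the function name and paths are hypothetical, not part of NeuralNER):

```python
def word2vec_to_glove(src_path, dst_path):
    """Copy a word2vec text embedding file without its header line,
    so the result reads as GloVe format."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        header = src.readline().split()
        # Only skip the first line if it really is a "<vocab> <dim>" header;
        # otherwise rewind and copy the file unchanged.
        if not (len(header) == 2 and all(tok.isdigit() for tok in header)):
            src.seek(0)
        for line in src:
            dst.write(line)
```

Equivalently, on the command line, `tail -n +2 w2v.txt > glove.txt` removes the header.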

samarohith commented 5 years ago

Yeah, I converted it into GloVe format now, but the same error still persists.