Closed liyi193328 closed 7 years ago
Are you sure you get a NaN loss with categorical_crossentropy? What do your labels look like?
resulting some rows are all zeroes
You shouldn't have rows that are all zeros. LSTM takes as input a 3D tensor with shape (nb_samples, input_length, input_dim).
So, from the side view, your input data should look like this:
|0000xxxx|/
|00xxxxxx|/
|xxxxxxxx|/
|00000xxx|/
where / denotes the dimension of the word2vec vectors.
@jgc128 Thanks for your reply and help. My labels look like [[0,1,0], [1,0,0], ..., [0,0,1]], with shape (nb_samples, nb_classes); here nb_samples = 10475 and nb_classes = 3. But I'm not sure what your symbols mean. From my perspective, nb_samples is the number of all sequences, and every sequence is represented as a 2D array: every row is a word vector, and the number of columns is the word2vec dimension (here 600). If I don't pad with all-zero rows, how can I make sure every 2D array has the same input_length, considering that the sequences have different lengths? Thanks!
@liyi193328 sorry, I was unclear.
Yes, nb_samples is the number of all sequences (10475), input_length is the length of the longest sequence, and input_dim is the dimension of the word2vec vectors (600).
So the input matrix looks like this:
[figure not preserved]
where View 1 is the one shown in my post above. The 0s are the zeros used for padding (note that we pad from the left), and the x's are some vectors.
In the code it looks like this:
X = np.zeros((nb_samples, input_length, input_dim))
@jgc128 Thanks. Wonderful details about the 3D input array. More specifically, take three sentences like [[He, like, keras], [learning], [like, keras]], where each word vector (4 dims) is:
He -> [1,1,1,1], like -> [2,2,2,2], keras -> [3,3,3,3], learning -> [5,5,5,5]
Then after padding, the 3D array has shape (3,3,4), like:
[ [ [1,1,1,1], [2,2,2,2], [3,3,3,3] ],
  [ [0,0,0,0], [0,0,0,0], [5,5,5,5] ],
  [ [0,0,0,0], [2,2,2,2], [3,3,3,3] ] ]
Is this specific example right? Thanks.
@liyi193328 Yes, it's right. It should work.
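For reference, the three-sentence example above can be built with a few lines of NumPy. This is only a minimal sketch: the word_vectors dictionary and the left_pad helper are hypothetical names made up for illustration, and the toy 4-dimensional vectors stand in for real word2vec vectors.

```python
import numpy as np

# Toy embeddings copied from the example above (4 dimensions each).
word_vectors = {
    "He": [1, 1, 1, 1],
    "like": [2, 2, 2, 2],
    "keras": [3, 3, 3, 3],
    "learning": [5, 5, 5, 5],
}
sentences = [["He", "like", "keras"], ["learning"], ["like", "keras"]]


def left_pad(sentences, word_vectors, input_length, input_dim):
    """Return a zero-initialized (nb_samples, input_length, input_dim) array
    with each sentence's vectors right-aligned (zeros padded on the left)."""
    X = np.zeros((len(sentences), input_length, input_dim), dtype="float32")
    for i, tokens in enumerate(sentences):
        if not tokens:
            continue  # an empty sentence stays all zeros
        vectors = np.array([word_vectors[t] for t in tokens], dtype="float32")
        vectors = vectors[-input_length:]  # keep the last tokens if too long
        X[i, input_length - len(vectors):] = vectors
    return X


X = left_pad(sentences, word_vectors, input_length=3, input_dim=4)
print(X.shape)
print(X[1])
```

Because the array starts from np.zeros, every position that is not overwritten is guaranteed to be exactly 0.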
@jgc128 Thanks. After padding like the example, I still get a NaN loss. It's a little confusing.
model.add(LSTM(output_dim=300,input_length=200,input_dim=600))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', class_mode="categorical")
Are there other tricks?
Has your target array been formatted correctly, e.g. one class per column?
Can you show your code for creating the input data?
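A quick way to check the "one class per column" format is to build the target array by hand and verify that every row sums to 1. The sketch below is NumPy-only; the one_hot helper and the sample label values are hypothetical and only illustrate the expected (nb_samples, nb_classes) layout.

```python
import numpy as np


def one_hot(labels, nb_classes):
    """Convert integer class ids in [0, nb_classes) to a
    (nb_samples, nb_classes) array with a single 1 per row."""
    labels = np.asarray(labels, dtype="int64")
    if labels.min() < 0 or labels.max() >= nb_classes:
        raise ValueError("labels must lie in [0, nb_classes)")
    Y = np.zeros((len(labels), nb_classes), dtype="float32")
    Y[np.arange(len(labels)), labels] = 1.0
    return Y


labels = [1, 0, 2, 2]  # made-up class ids for illustration
Y = one_hot(labels, nb_classes=3)
print(Y)
print((Y.sum(axis=1) == 1).all())  # every row should contain exactly one 1
```

The range check matters: class ids outside [0, nb_classes) would otherwise index the wrong column or raise an IndexError.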
@dandxy89 @jgc128 Thanks. Yes, my target array's shape is (nsamples, nb_classes), with one class per column (0, 1, 2). I also tried your code in #853 (pad the word index array with zeros on the left, add an embedding layer). Everything goes well and I get 66.1% accuracy for 3 classes with 8583 sequences. But the embedding vectors cost a lot of memory, so the GPU can't allocate device memory (I can only use the CPU). So I want to make it faster with fixed word vectors.
My code for input data is:
maxlen=200
def init(textDatapath='./allData.txt', word2vecPath='./word2vec',maxlen=200,nb_classes=3,updated=False,vecDataPath='./trainVec(part).pickle',newvecDataPath='./trainVec(new).pickle'):
    Train = list()
    seqTokens = list()
    if os.path.isfile(vecDataPath) and updated == False:
        ft = codecs.open(vecDataPath,"rb")
        print("find ", vecDataPath)
        D = pickle.load(ft)
        Train = D['train']
        Label = D['label']
        ft.close()
    else:
        print("begin to update", newvecDataPath)
        f = codecs.open(textDatapath, "r", "utf-8") #every line is a sequence
        lines = f.readlines()
        Label = []
        Textokens = []
        for line in lines:
            t = line.split("\t") #t[0] is the target
            tokens = jieba.lcut(t[1]) #segment sequence, get a list of tokens
            vec = []
            existToken = []
            minum = 5 #sequences having least 5 tokens can be considered
            for token in tokens:
                try:
                    vector = word2vec[token] #get vector of token by word2vec(gensim)
                    vec.append(vector)
                    existToken.append(token)
                except KeyError:
                    continue
            if len(vec) <= 5:
                continue
            else:
                s = np.array(vec) #s is the sequence's 2D array
                Train.append(s)
                Label.append(int(t[0]))
                seqTokens.append(existToken)
        if updated == True:
            ft = codecs.open(newvecDataPath,"wb")
            print("dump to ",newvecDataPath)
            pickle.dump({'train':Train,'label':Label,'seqTokens':seqTokens},ft)
            ft.close()
    # Label and Train is a list, in Label every element is a scalar.
    #In my case, Label is -1,0,1, so it needs to plus 1 to become 0,1,2
    Label = np.array(Label,dtype='float32') + 1
    Train = np.array(Train) # in train every element is a numpy array.
    print("init finished!")
    return [Train,Label]
def padTrainData(Train,Label):
    print("pre train data...")
    Label = np.array(Label,dtype='float32')
    Label = np_utils.to_categorical(Label,nb_classes= 3)
    nsamples = Train.shape[0]
    train = np.empty((nsamples,maxlen,lstm_input_dim))
    for i in range(nsamples):
        t = Train[i]
        (tokens,dim)=t.shape
        if tokens < maxlen:
            #s is the empty array
            s = np.empty((maxlen-tokens, 600))
            #combine s and t
            train[i] = np.concatenate( (s,t),axis=0)
        else:
            train[i] = np.array(t[0:maxlen])
    Train = np.array(train,dtype='float64')
    return [Train,Label]
train = None
label = None
train,label = init(vecDataPath='./trainVec(new).pickle',updated=True)
train,label = padTrainData(train,label)
print("train shape:",train.shape)
print("label shape:",label.shape)
The result is:
begin to update ./trainVec(new).pickle
dump to ./trainVec(new).pickle
init finished!
pre train data...
train shape: (8583, 200, 600)
label shape: (8583, 3)
In my code, the first step is getting the training data, and the second is building the target label array and padding the training data with zeros. Thanks for helping me with such great patience.
One problem is that you are using np.empty, but it does not initialize the array with zeros (see the documentation). Try to use np.zeros instead.
It should not give the NaN loss though. Have you tried to see what the output of the network is?
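To make the difference concrete, here is a minimal sketch of left-padding a single sequence. The pad_left helper and its fill parameter are hypothetical names for illustration only. np.empty just allocates memory, so whatever values happen to be there end up in the padding rows.

```python
import numpy as np


def pad_left(seq, maxlen, fill=np.zeros):
    """Left-pad a (tokens, dim) array to (maxlen, dim), using `fill`
    to create the padding block; longer sequences are truncated."""
    seq = np.asarray(seq, dtype="float64")[:maxlen]
    tokens, dim = seq.shape
    pad = fill((maxlen - tokens, dim))
    return np.concatenate((pad, seq), axis=0)


seq = np.ones((2, 4))                           # made-up 2-token sequence
good = pad_left(seq, maxlen=5)                  # padding rows are exactly 0
risky = pad_left(seq, maxlen=5, fill=np.empty)  # padding rows are arbitrary

print(np.all(good[:3] == 0))  # True
print(risky[:3])              # uninitialized memory: any values may appear
```

The contents of the np.empty block are not defined, so they may look like zeros on one run and contain very large numbers on another.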
@jgc128 Thank you very much. Everything goes fine when I change np.empty to np.zeros! It was all my mistake, sorry. When np.empty is used to initialize an array, can the values be too large or too small, resulting in NaN? Another question: how do I check the output of the network? With model.predict_proba, a Theano function, or some other way? Thanks sincerely!
Excellent!
You can use something like this: classes = model.predict_classes(X_test, batch_size=32). See "Getting started: 30 seconds to Keras" for details.
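Once you have the network's output, you can also check it for NaN directly. The sketch below is NumPy-only and uses a hand-written probs array as a stand-in for the softmax output (for example, what model.predict would return); the values and the summarize_predictions helper are made up for illustration.

```python
import numpy as np


def summarize_predictions(probs):
    """Given softmax outputs of shape (nb_samples, nb_classes), return the
    predicted class ids and a boolean mask of rows containing NaN or inf."""
    probs = np.asarray(probs, dtype="float64")
    bad_rows = ~np.isfinite(probs).all(axis=1)
    classes = probs.argmax(axis=1)  # not meaningful for rows flagged as bad
    return classes, bad_rows


probs = np.array([
    [0.1, 0.7, 0.2],
    [np.nan, np.nan, np.nan],  # what a diverged network typically outputs
    [0.6, 0.3, 0.1],
])
classes, bad_rows = summarize_predictions(probs)
print(classes[~bad_rows])
print(bad_rows)
```

If every row is flagged, the problem is most likely in the inputs or in the weights rather than in a few individual samples.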
@jgc128 Thanks. I'll dive into it.
Why not use keras.preprocessing.sequence.pad_sequences?
data = list()
for individual in len( --):
    express_matrix = individual.express_individual_times()  # each sample returns a 2D matrix, N * 256
    data.append(express_matrix)
train_matrix = sequence.pad_sequences(data,padding='post', maxlen=40)
I checked the data and made sure it is padded correctly, but I still get "loss = NaN" for several samples. I wonder whether I should delete these samples, even though they look perfectly normal.
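Before deleting anything, it may help to find out which samples actually contain non-finite values. A minimal sketch, assuming the padded data is a NumPy array of shape (nb_samples, input_length, input_dim); the find_bad_samples helper is a hypothetical name, not part of Keras.

```python
import numpy as np


def find_bad_samples(X):
    """Return the indices of samples that contain NaN or inf anywhere.
    X is expected to have the sample axis first."""
    X = np.asarray(X)
    other_axes = tuple(range(1, X.ndim))
    finite_per_sample = np.isfinite(X).all(axis=other_axes)
    return np.flatnonzero(~finite_per_sample)


X = np.zeros((4, 3, 2))  # made-up data: 4 samples, 3 steps, 2 features
X[1, 0, 0] = np.nan
X[3, 2, 1] = np.inf
print(find_bad_samples(X))  # [1 3]
```

If this returns an empty array, the inputs themselves are finite and the NaN is more likely produced during training.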
Hi, I am using only np.zeros() but still get very low accuracy, around 37% on 900 samples for a 30-class classification. I used tanh as the activation function before the softmax layer. All suggestions are welcome.
My code is as follows:
def build_matrix(word_index):
    embedding_matrix = np.zeros((len(word_index) + 1, 100))
    unknown_words = []
    for word, i in word_index.items():
        try:
            embedding_matrix[i] = w2v_model[word]
        except KeyError:
            unknown_words.append(word)
    return embedding_matrix
embedding_matrix=build_matrix(tokenizer.word_index)
model = Sequential()
model.add(Embedding(max_features,embedding_matrix.shape[1], weights=[embedding_matrix],input_length=MAX_LEN,trainable=False))
model.add(SpatialDropout1D(0.3))
model.add(LSTM(LSTM_UNITS,activation='relu',return_sequences=True))
model.add(LSTM(LSTM_UNITS))
model.add(Dropout(0.5))
model.add(Dense(4*LSTM_UNITS,input_shape=(1000,),activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(4*LSTM_UNITS,activation='tanh'))
model.add(Dense(30, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
history=model.fit(x_train, y_train, nb_epoch=11, batch_size=64,validation_data=(x_test,y_test))
scores = model.evaluate(x_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))
I'm new to Theano and Keras and want to learn them; I find them very interesting and helpful. The following question has confused me for about a week, and I couldn't work it out after trying some of the approaches mentioned before. I want to do sentiment analysis on texts with three classes. I trained word2vec (dim = 600) with gensim. My training data is 10475 sequences of different lengths; the label shape is [10475, 3]. After setting the maximum sequence length to 200, every sequence is converted to a 200*600 2D array. If a sequence's length is less than 200, the remaining values are filled with 0 (padding), resulting in some rows being all zeros. Then I feed them into an LSTM.
The LSTM code is as follows:
model.fit(train,label,batch_size=100,nb_epoch=4,verbose=1,shuffle=True,validation_split=0.1,show_accuracy=True)
But Getting:
loss: nan
Train on 9430 samples, validate on 1048 samples
Epoch 1/4
9430/9430 [==============================] - 99s - loss: nan - acc: 0.2992 - val_loss: nan - val_acc: 0.1355
Epoch 2/4
9430/9430 [==============================] - 96s - loss: nan - acc: 0.2992 - val_loss: nan - val_acc: 0.1355
Epoch 3/4
9430/9430 [==============================] - 96s - loss: nan - acc: 0.2992 - val_loss: nan - val_acc: 0.1355
Epoch 4/4
1600/9430 [====>.........................] - ETA: 75s - loss: nan - acc: 0.3038
I tested different optimizers, increased the epsilon value, set clipnorm (in the optimizer above), and tried different loss functions ('mean_squared_error', 'categorical_crossentropy') and so on, but all failed.
The loss value is NaN in both CPU and GPU mode.
Even when I switch to Convolution2D:
The loss values remain NaN.
Ways to solve?
So I'm wondering what the real reason for the NaN loss value is. How can I solve or debug it? Is the word2vec data wrong, is the padding method wrong, or is it something else? If Keras can't solve this, I will have to choose another deep learning package. Or is Theano the cause? What can I do then? Please help.