Closed amineebenamor closed 5 years ago
Hello @amineebenamor. Actually, the output of the model is a probability distribution P(m_t = k_i | w), k_i ∈ K (i = 1, ⋯, n), computed from the final hidden state, and the label is a one-hot vector, as below:
criterion = nn.CrossEntropyLoss()
loss = criterion(output, label.to(device))
But when you calculate the CrossEntropyLoss as above, we get loss = -∑_k y_k log(p_k), where y is the one-hot label and p is the predicted distribution. Only the index where the one-hot vector is 1 contributes to the loss, because all the other terms are multiplied by 0. You can also refer to the API docs.
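To make this concrete, here is a minimal sketch (plain Python, no PyTorch dependency; the function names are my own) showing that cross-entropy against a one-hot label reduces to -log of the predicted probability at the true class, which is exactly what `nn.CrossEntropyLoss` computes from integer class indices:

```python
import math

def cross_entropy(logits, target):
    # -log(softmax(logits)[target]), as nn.CrossEntropyLoss does per sample
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -math.log(probs[target])

def cross_entropy_onehot(logits, one_hot):
    # The same loss written against an explicit one-hot label:
    # terms where the one-hot entry is 0 vanish, only the '1' survives.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs))

logits = [2.0, 0.5, -1.0]
# Both formulations give the same number for the same true class.
print(cross_entropy(logits, 0), cross_entropy_onehot(logits, [1.0, 0.0, 0.0]))
```

This is why PyTorch's `nn.CrossEntropyLoss` takes integer class indices for the *label* rather than a one-hot tensor: the one-hot multiplication is mathematically a no-op except at the true index.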
Thank you for your answer!
Hello @wuyifan18 & @amineebenamor,
It looks like @amineebenamor really wanted to figure out why the INPUT data is not in one-hot format in the code; it is not related to the output. Feeding the raw integers (the mapped log keys) suggests to the model that keys with large values are more important than keys with small values, i.e. it implies an ordering that doesn't exist. Obviously this is not what we want.
In my revised implementation, one-hot input gives much better results than integer input on my training data set (logs from an embedded system): the loss converges faster to a much smaller value, and I easily get zero false positives in training.
Hello @wuyifan18. First, thank you for your implementation! I'm wondering why you're not using one-hot encoding for the log keys in the input, since they are categorical variables (in addition, the DeepLog paper says they did this). By mapping each log key to an integer, you are doing ordinal encoding. Thank you for your answer!