pritamqu opened this issue 2 years ago
Hi, no worries, here is the explanation https://github.com/shchur/ifl-tpp#mistakes-in-the-old-version
You can find the original code used for experiments in the paper here https://github.com/shchur/ifl-tpp/tree/original-code
Hi, can you kindly explain a bit more why we cannot divide the NLL by the real number of events in the sequence? In the paper you said this number should not depend on the sequence.
"In the old code we used to normalize the NLL of each sequence by the number of events --- this was incorrect. When computing NLL for multiple TPP sequences, we are only allowed to divide the NLL by the same number for each sequence."
Thanks in advance.
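Just to make sure I understand the distinction, is the difference roughly the following (a simplified sketch I wrote to check my understanding, not the actual code from this repo)?

```python
# Simplified sketch of the two aggregation schemes, not the actual ifl-tpp code.
# seq_nlls[i] is the NLL of sequence i, seq_counts[i] its number of events.
import numpy as np

def loss_constant_norm(seq_nlls, seq_counts):
    """Divide the summed NLL by the same constant for every sequence
    (here, the total number of events in the batch)."""
    return np.sum(seq_nlls) / np.sum(seq_counts)

def loss_per_sequence_norm(seq_nlls, seq_counts):
    """Divide each sequence's NLL by its own event count
    (the old approach that the README says is incorrect)."""
    return np.mean(np.asarray(seq_nlls) / np.asarray(seq_counts))
```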
As a simple example, consider a homogeneous Poisson process with rate $\lambda$ on an interval $[0, T]$. Suppose we have observed two sequences generated by this TPP, the first containing $N_1$ events and the second containing $N_2$ events, and we want to estimate the parameter $\lambda$ using MLE.
Without normalization, the log-likelihood is $N_1\log \lambda - \lambda T + N_2\log \lambda - \lambda T = (N_1 + N_2) \log \lambda - 2 \lambda T$.
If we normalize the LL of each sequence by $T$, we get $N_1 / T \log \lambda - \lambda + N_2/T \log \lambda - \lambda = (N_1 + N_2) /T \log \lambda - 2 \lambda$. This is proportional to the unnormalized log-likelihood, so we get the same MLE of $\lambda$.
If, however, we normalize the LL of each sequence by its number of events, we get a different function: $\log \lambda - T/N_1 \lambda + \log \lambda - T/N_2 \lambda = 2 \log \lambda - (T/N_1 + T/N_2) \lambda$. Maximizing this gives $\hat\lambda = \frac{2 N_1 N_2}{T(N_1 + N_2)}$ instead of the correct $\hat\lambda = \frac{N_1 + N_2}{2T}$; the two coincide only if $N_1 = N_2$, as you can verify yourself.
This small example demonstrates that normalizing by the number of events leads to incorrect estimation of the TPP parameters.
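Here is a quick numerical check of this toy example (a sketch with made-up values for $T$, $N_1$, $N_2$; not part of the repo code):

```python
# Numerical check of the toy Poisson example above (made-up values, not repo code).
import numpy as np

T = 10.0         # length of the observation interval
N1, N2 = 5, 20   # event counts of the two observed sequences

lambdas = np.linspace(0.01, 5.0, 10_000)   # grid of candidate rates

# Sum of per-sequence log-likelihoods: (N1 + N2) log(lambda) - 2 lambda T
ll_raw = (N1 + N2) * np.log(lambdas) - 2 * lambdas * T

# Each sequence's log-likelihood divided by its own number of events
ll_per_event = ((N1 * np.log(lambdas) - lambdas * T) / N1
                + (N2 * np.log(lambdas) - lambdas * T) / N2)

print(lambdas[np.argmax(ll_raw)])        # ~1.25 = (N1 + N2) / (2 T)
print(lambdas[np.argmax(ll_per_event)])  # ~0.80 = 2 N1 N2 / (T (N1 + N2))
```

Dividing by any constant that is the same for every sequence (e.g., the total number of events in the dataset) only rescales the objective and leaves the maximizer unchanged, which is why that kind of normalization is fine.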
Thanks for your response. I get it.
Hi - I was trying your code on Hawkes-1, similar to the https://github.com/shchur/ifl-tpp/blob/master/code/interactive.ipynb notebook. The NLL on the test set is 43.7, but the result reported in the paper is 0.52. Could you please clarify whether an additional step is needed to get the final result?
My apologies if this is a naive question, I am very new to the area of TPPs!