shchur / ifl-tpp

Implementation of "Intensity-Free Learning of Temporal Point Processes" (Spotlight @ ICLR 2020)
https://openreview.net/forum?id=HygOjhEYDH
MIT License

NLL results #20

Open pritamqu opened 2 years ago

pritamqu commented 2 years ago

Hi - I was trying your code on Hawkes-1, similar to the https://github.com/shchur/ifl-tpp/blob/master/code/interactive.ipynb notebook. The NLL on the test set is 43.7, but the result reported in the paper is 0.52. Could you please clarify whether an additional step is needed to get the final result?

My apologies if this is a naive question, I am very new to the area of TPPs!

shchur commented 2 years ago

Hi, no worries, here is the explanation https://github.com/shchur/ifl-tpp#mistakes-in-the-old-version
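For anyone landing here later, the gist of that explanation in code form is below. This is a minimal sketch; the function and variable names are illustrative, not the repo's actual API.

```python
import torch

def aggregate_nll(per_seq_nll: torch.Tensor, num_events: torch.Tensor):
    """per_seq_nll: NLL of each sequence, shape [num_seqs]
    num_events:  number of events in each sequence, shape [num_seqs]"""
    # Old (incorrect) version: each sequence's NLL was divided by its own
    # event count, so the normalizer differed from sequence to sequence.
    old_metric = (per_seq_nll / num_events).mean()

    # Correct: divide every sequence's NLL by the same constant for all
    # sequences, e.g. the total number of events in the dataset.
    new_metric = per_seq_nll.sum() / num_events.sum()
    return old_metric, new_metric
```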

shchur commented 2 years ago

You can find the original code used for experiments in the paper here https://github.com/shchur/ifl-tpp/tree/original-code

iLampard commented 1 year ago

Hi, no worries, here is the explanation https://github.com/shchur/ifl-tpp#mistakes-in-the-old-version

Hi, can you kindly explain a bit more why we cannot divide the NLL by the actual number of events in the sequence? In the paper you said this number should not depend on the sequence.

"In the old code we used to normalize the NLL of each sequence by the number of events --- this was incorrect. When computing NLL for multiple TPP sequences, we are only allowed to divide the NLL by the same number for each sequence."

Thanks in advance.

shchur commented 1 year ago

As a simple example, consider a homogeneous Poisson process with rate $\lambda$ on an interval $[0, T]$. Suppose we have observed two sequences generated by this TPP, the first containing $N_1$ events and the second containing $N_2$ events, and want to estimate the parameter $\lambda$ using MLE.

Without normalization, the log-likelihood is $N_1 \log \lambda - \lambda T + N_2 \log \lambda - \lambda T = (N_1 + N_2) \log \lambda - 2 \lambda T$.

If we normalize the LL by $T$, we get $(N_1 / T) \log \lambda - \lambda + (N_2 / T) \log \lambda - \lambda = ((N_1 + N_2) / T) \log \lambda - 2 \lambda$. This is proportional to the unnormalized log-likelihood, so we get the same MLE of $\lambda$.

If, however, we normalize the LL of each sequence by its number of events, we get a different LL function and end up with the wrong MLE estimate, as you can verify yourself: $\log \lambda - (T/N_1) \lambda + \log \lambda - (T/N_2) \lambda = 2 \log \lambda - (T/N_1 + T/N_2) \lambda$.

This small example demonstrates that normalizing by # of events leads to incorrect estimation of the TPP parameters.
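To make this concrete, here is a small numerical check (a standalone sketch, not part of the repo; the values of $T$, $N_1$, $N_2$ are made up for illustration) comparing the maximizers of the three objectives:

```python
import numpy as np

T, N1, N2 = 10.0, 5, 20                  # observation window and event counts of the two sequences
lam_grid = np.linspace(0.01, 5.0, 2000)  # candidate values of lambda

# Unnormalized LL: (N1 + N2) * log(lam) - 2 * lam * T
ll_unnorm = (N1 + N2) * np.log(lam_grid) - 2 * lam_grid * T

# LL normalized by T (same constant for both sequences): proportional to the above
ll_by_T = ll_unnorm / T

# LL with each sequence normalized by its own event count:
# 2 * log(lam) - (T/N1 + T/N2) * lam
ll_by_events = 2 * np.log(lam_grid) - (T / N1 + T / N2) * lam_grid

print("MLE, unnormalized:    ", lam_grid[ll_unnorm.argmax()])     # ~ (N1+N2)/(2T) = 1.25
print("MLE, normalized by T: ", lam_grid[ll_by_T.argmax()])       # same argmax, ~1.25
print("MLE, per-event norm.: ", lam_grid[ll_by_events.argmax()])  # ~ 2/(T/N1 + T/N2) = 0.8
```

The first two objectives recover the same $\lambda$, while per-event normalization shifts the maximizer to a different value.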

iLampard commented 1 year ago


Thanks for your response. I get it.