Closed by davidlight2018 2 years ago
This will give all leaf elements the same surprise regardless of their usefulness, so it would not be very useful (`df['surprise']` will have the same value for all rows). If you take a look at the original Adtributor paper [1], the surprise for an element is computed as:
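For reference, here is a sketch of the surprise definition as I understand it from the paper [1]. The symbols are my own shorthand: $p$ is the element's share of the predicted total and $q$ is its share of the actual total for the measure.

$$
S = \frac{1}{2}\left( p \log\frac{2p}{p+q} + q \log\frac{2q}{p+q} \right)
$$

This is the single-element term of the Jensen-Shannon divergence between the predicted and actual distributions, so it differs from row to row whenever $p$ and $q$ differ from row to row.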
For the code implementation, in `adtributor` the sum over leaf elements is done within the for loop to obtain the total surprise for an element, i.e. here:
```python
for d in dimensions:
    elements = df.groupby(d).sum()
    elements = elements.sort_values('surprise', ascending=False)
    ...
```
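To make the per-leaf-then-sum idea concrete, here is a minimal, self-contained sketch. It is not the repository's actual code: the toy data, the column names `real` and `predict`, and the helper `js_term` are assumptions made only for illustration. Each leaf row gets its own surprise, and the `groupby` inside the loop adds them up per element.

```python
import numpy as np
import pandas as pd


def js_term(p, q):
    """Per-row JS divergence term: 0.5 * (p*log(2p/(p+q)) + q*log(2q/(p+q))).

    Hypothetical helper; terms with p == 0 or q == 0 are treated as 0.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        left = np.where(p > 0, p * np.log(2 * p / (p + q)), 0.0)
        right = np.where(q > 0, q * np.log(2 * q / (p + q)), 0.0)
    return 0.5 * (left + right)


# Toy leaf-level data (assumed layout: one row per leaf element).
df = pd.DataFrame({
    "country": ["US", "US", "DE", "DE"],
    "device": ["mobile", "desktop", "mobile", "desktop"],
    "predict": [40.0, 30.0, 20.0, 10.0],
    "real": [10.0, 30.0, 20.0, 10.0],
})
dimensions = ["country", "device"]

# Surprise per leaf, from each leaf's share of the predicted and actual totals.
p = df["predict"] / df["predict"].sum()
q = df["real"] / df["real"].sum()
df["surprise"] = js_term(p, q)

# Sum the leaf surprises per element of each dimension, then rank the elements.
surprise_by_dim = {}
for d in dimensions:
    elements = df.groupby(d)[["surprise"]].sum()
    elements = elements.sort_values("surprise", ascending=False)
    surprise_by_dim[d] = elements
```

With this layout `df['surprise']` varies across rows, and `surprise_by_dim['country']` holds one summed value per country.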
For `adtributor_new`, the sum is done at the beginning. There will therefore only be one element considered at a time, and there is no need to sum (as it has already been done).
[1]: Bhagwan, Ranjita, et al., "Adtributor: Revenue debugging in advertising systems." 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI 14), 2014.
thx, very helpful !
No problems :)
The calculation of the surprise value in adtributor does not seem correct to me.
The JS divergence formula should be:
So, the code should be:
What do you think? Thanks.