varepsilon / clickmodels

ClickModels is a small set of Python scripts for user click models, initially developed at Yandex. A Click Model is a probabilistic graphical model used to predict search engine clicks from past observations. This project deals with click models used in Information Retrieval (see the README.md) and is intended to be easy to read and easy to modify. If it's not, please let me know how to improve it :)
BSD 3-Clause "New" or "Revised" License

Problem About Parameter Estimation of UBM #9

Open jimmy-walker opened 5 years ago

jimmy-walker commented 5 years ago

Thanks for sharing the code.

When I tested the UBM model, I ran into a question. As we know, UBM uses the EM algorithm to estimate its parameters. For example, for a specific query q and item u, we estimate the attractiveness parameter Aqu. In the code, the numerator and denominator of Aqu are accumulated separately (An, Ad) and then combined as Aqu = An / Ad. The problem is that if the first item u1 has only a few clicks, its attractiveness can exceed that of the second item u2, which has many clicks. For example, for the first item u1, An1 = Ad1 = 100, so Aqu1 = 100 / 100 = 1.0; for the second item u2, An2 = 9000 and Ad2 = 10000, so Aqu2 = 9000 / 10000 = 0.9. Aqu1 therefore ends up larger than Aqu2.

But this doesn't seem right to me, because the second item has a much larger An. Have you faced a similar problem with the UBM model? Looking forward to any reply. Thanks.
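To make the An / Ad bookkeeping concrete, here is a minimal EM sketch for UBM in Python. It is not the clickmodels implementation: the session format, the function name `ubm_em`, the 0.5 initial values and the iteration count are assumptions for illustration, and no priors are added, so it reproduces the effect described in the question.

```python
from collections import defaultdict


def ubm_em(sessions, iterations=20, init=0.5, eps=1e-12):
    """Hypothetical EM sketch for the User Browsing Model (UBM).

    P(click at rank r) = alpha[(query, doc)] * gamma[(r, d)], where d is the
    distance from r to the rank of the previous click (r + 1 if none yet).
    Sessions are assumed to be (query, [docs in rank order], [0/1 clicks]).
    """
    alpha = defaultdict(lambda: init)  # attractiveness per (query, doc)
    gamma = defaultdict(lambda: init)  # examination per (rank, distance)
    for _ in range(iterations):
        # E-step: numerators and denominators accumulated separately (An, Ad).
        a_num, a_den = defaultdict(float), defaultdict(float)
        g_num, g_den = defaultdict(float), defaultdict(float)
        for query, docs, clicks in sessions:
            prev_click = -1
            for rank, (doc, clicked) in enumerate(zip(docs, clicks)):
                a_key, g_key = (query, doc), (rank, rank - prev_click)
                a, g = alpha[a_key], gamma[g_key]
                if clicked:
                    # A click means the doc was both examined and attractive.
                    post_a = post_g = 1.0
                    prev_click = rank
                else:
                    # Posteriors P(A=1 | C=0) and P(E=1 | C=0); eps avoids 0/0.
                    denom = max(1.0 - a * g, eps)
                    post_a = a * (1.0 - g) / denom
                    post_g = g * (1.0 - a) / denom
                a_num[a_key] += post_a
                a_den[a_key] += 1.0
                g_num[g_key] += post_g
                g_den[g_key] += 1.0
        # M-step: each parameter becomes numerator / denominator.
        alpha = defaultdict(lambda: init, {k: a_num[k] / a_den[k] for k in a_den})
        gamma = defaultdict(lambda: init, {k: g_num[k] / g_den[k] for k in g_den})
    return alpha, gamma


# Toy data mirroring the example: u1 clicked 100/100 times, u2 9000/10000.
sessions = (
    [("q", ["u1"], [1])] * 100
    + [("q", ["u2"], [1])] * 9000
    + [("q", ["u2"], [0])] * 1000
)
alpha, gamma = ubm_em(sessions)
print(alpha[("q", "u1")], alpha[("q", "u2")])
```

Because every observation of u1 is a click, each of its posteriors is 1 and its attractiveness is exactly 100 / 100 = 1.0 regardless of how few observations it has, while u2 stays below 1.0 because each unclicked impression contributes a posterior below 1.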

varepsilon commented 5 years ago

Thank you for your question, and sorry for the delayed reply.

You are right, in theory. In practice, though, the initial "priors" that we add to the numerator and denominator help to smooth things out. Also note that the model will never give an attractiveness value of exactly 100%; still, it may work well in practice.
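The effect of the priors can be sketched by treating them as pseudo-counts added to An and Ad before dividing. The function name `smoothed_ratio` and the pseudo-counts 1 and 2 below are assumptions for illustration, not values confirmed from the repository.

```python
def smoothed_ratio(num, den, prior_num=1.0, prior_den=2.0):
    """Attractiveness estimate with pseudo-counts added to An and Ad.

    prior_num / prior_den = 1 / 2 is an illustrative choice (prior mean 0.5),
    not necessarily what clickmodels uses.
    """
    return (num + prior_num) / (den + prior_den)


raw_u1, raw_u2 = 100 / 100, 9000 / 10000
print(raw_u1, raw_u2)                # 1.0 0.9
print(smoothed_ratio(100, 100))      # 101 / 102, just below 1.0
print(smoothed_ratio(9000, 10000))   # 9001 / 10002
print(smoothed_ratio(1, 1))          # 2 / 3: a single click no longer gives 1.0
```

Since prior_num < prior_den, the estimate always stays strictly below 1.0, which matches the remark that the model never returns 100% attractiveness. How strongly the priors pull a low-evidence item down depends on their size relative to the counts: with these values, 1 click out of 1 impression drops to about 0.67, while 100 out of 100 still ranks above 9000 out of 10000 (about 0.990 vs. 0.900).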