thunlp / EntityDuetNeuralRanking

Entity-Duet Neural Ranking Model
MIT License

Freezing embedding layer #4

Closed pommedeterresautee closed 4 years ago

pommedeterresautee commented 6 years ago

I am running K-NRM and Conv-KNRM on our own query log. I learned the embeddings on my own dataset (the real documents) with fastText.

As in your experiments (and other papers), I get a nice boost from Conv-KNRM compared to K-NRM. However, the result is approximately the same when the first layer is frozen. Moreover, the maximum performance on dev is reached very rapidly (during the first epoch) and then stays approximately the same for several epochs (whether the embedding layer is frozen or not).

[figure: dev in blue, test in red]

Because of a large bias in our logs, I needed to limit them to 100K queries (with 20 docs per SERP).

Have you noticed the same behaviour on your log datasets (rapid reach of the maximum, and no effect when freezing the first layer)? Have you tried adding more regularization? (I tried 50% dropout on the embedding layer without any real effect.)
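
For reference, here is a minimal sketch of what I mean by freezing the first layer and adding embedding dropout, in PyTorch; the file name and the 0.5 dropout rate are just my illustrative values:

```python
import numpy as np
import torch
import torch.nn as nn

# fastText vectors trained on our own documents, exported to a numpy matrix
# (the file name is illustrative).
pretrained = torch.from_numpy(np.load("fasttext_vectors.npy")).float()

# freeze=True keeps the embedding weights fixed during training;
# switch to freeze=False to fine-tune them with the rest of the model.
embedding = nn.Embedding.from_pretrained(pretrained, freeze=True, padding_idx=0)

# Dropout applied on top of the (frozen or trainable) embeddings.
embedding_dropout = nn.Dropout(p=0.5)

def embed(token_ids: torch.Tensor) -> torch.Tensor:
    """token_ids: (batch, seq_len) -> (batch, seq_len, dim) with dropout applied."""
    return embedding_dropout(embedding(token_ids))
```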

EdwardZH commented 6 years ago

I think you are right. We used pre-trained word embeddings in Conv-KNRM and got the same result on the Sogou query log; it is not as stable as K-NRM. I think it may not work well in many scenarios, and Conv-KNRM may learn more popularity information. So I think utilizing entities will improve your performance.

pommedeterresautee commented 6 years ago

I have also noticed the instability of Conv-KNRM on my dataset.

I kind of "fixed" it by offsetting the bias in my logs towards the first results (... by removing the first 3 results... not ideal, I know, but very effective). Before, the CNN performance was very low and unstable; after, I always reach high performance.
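
Concretely, something like this minimal sketch (the log format is hypothetical), which just drops the first three positions from every SERP before building training pairs:

```python
# Hypothetical log format: one SERP is a list of (position, doc_id, clicked) tuples.
def drop_top_results(serp, n_skip=3):
    """Remove the first n_skip positions, where position bias dominates the click signal."""
    return [(pos, doc, clicked) for (pos, doc, clicked) in serp if pos > n_skip]
```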

My feeling is that the CNN adds more capacity to capture everything and is therefore very sensitive to bias.

Regarding using entities, unfortunately, in my case I can't use a knowledge base because in my industry there is none that covers our client queries. And reading the Entity-Duet paper I was thinking: OK, it's just slightly better than the CNN but requires a knowledge base and more computation, so the CNN was the obvious choice.

EdwardZH commented 6 years ago

Thank you for your comment. I do not really understand "I kind of 'fixed' it by offsetting the bias in my logs towards the first results"; could you describe it more specifically? A short SIGIR paper also demonstrates that training K-NRM several times and combining the runs can reach better performance. I think you are right and Conv-KNRM amplifies the bias.
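
A rough sketch of that kind of combination (just averaging the scores of several independently trained models; the model list and the scoring signature are my assumptions, not the paper's exact method):

```python
import torch

def ensemble_score(models, query_ids, doc_ids):
    # Average the relevance scores of several independently trained K-NRM models.
    with torch.no_grad():
        scores = torch.stack([model(query_ids, doc_ids) for model in models])
    return scores.mean(dim=0)
```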

pommedeterresautee commented 6 years ago

[figure: click count per position on our logs for one of our search engines] Obviously the first position is clicked much more than the second, and so on. The interesting point is that in our case our search engine just performs boolean search without any traditional relevance scoring like BM25 (only a few rules to unboost some kinds of content we don't think most of our users are interested in).

My point is that in this specific scenario I only have position bias. With raw clicks, the DL models were performing very poorly (swamped by the position bias). So I removed the first 3 results from our logs and the model performed better. Since then, I found something else which seems intellectually sounder and makes the model perform even better: I only keep searches where the user has clicked at least once on the second half of the SERP. Because there is no relevance scoring in our search engine, these queries are especially harder than the others; it just means the user has looked at least at 50% of the results shown. Not perfect, but the model performs 20 absolute points of MAP better than without doing anything, so I imagine the signal is of better quality after these filters.
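
For illustration, a minimal sketch of that second filter, assuming each log entry records the SERP size and the clicked positions (the field names are hypothetical):

```python
# Toy log: each entry is one query with its SERP size and 1-based clicked positions.
raw_log = [
    {"query": "q1", "serp_size": 20, "clicked_positions": [1, 2]},   # dropped: only top clicks
    {"query": "q2", "serp_size": 20, "clicked_positions": [3, 14]},  # kept: click at position 14
]

def keep_query(entry):
    """True if at least one click landed in the second half of the SERP."""
    return any(pos > entry["serp_size"] / 2 for pos in entry["clicked_positions"])

filtered_log = [entry for entry in raw_log if keep_query(entry)]
print([entry["query"] for entry in filtered_log])  # ['q2']
```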

EdwardZH commented 6 years ago

In the Sogou query log, the click rates do not show much bias. So I think the max-margin loss function could be rewritten for your task, maybe by weighting each margin or just using a sigmoid function. I had not thought about this idea before, thank you very much! There is a lot of work we still have to complete.
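
As a rough sketch of what I mean, assuming we have scores for the clicked and non-clicked documents plus a per-pair weight (for example from an inverse-propensity estimate of the position bias):

```python
import torch
import torch.nn.functional as F

def weighted_hinge_loss(pos_score, neg_score, pair_weight, margin=1.0):
    # Pairwise max-margin loss where each pair is reweighted, e.g. to
    # down-weight pairs whose clicks are dominated by position bias.
    return (pair_weight * F.relu(margin - pos_score + neg_score)).mean()

def sigmoid_pairwise_loss(pos_score, neg_score):
    # Smooth alternative: -log(sigmoid(pos - neg)) = softplus(neg - pos).
    return F.softplus(neg_score - pos_score).mean()

# Toy usage with made-up scores and weights.
pos = torch.tensor([2.0, 0.5])
neg = torch.tensor([1.0, 1.5])
w = torch.tensor([1.0, 0.3])
print(weighted_hinge_loss(pos, neg, w).item(), sigmoid_pairwise_loss(pos, neg).item())
```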

EdwardZH commented 4 years ago

Hi, for more on neural IR training, data augmentation, and EDRM, please refer to our WWW 2020 paper, Selective Weak Supervision for Neural Information Retrieval. Thank you for your attention. I will close this issue.