Hello, I am currently using this plugin for my project, and I have a question about normalization that has bothered me for quite a while.
As shown in your demo, once a model has been trained using RankLib, it can be loaded into Elasticsearch to rank documents for a given query. However, RankLib accepts many training parameters, such as `-norm zscore` for feature normalization, and these normalization parameters are not saved in the model. So after the model is uploaded to Elasticsearch, how does Elasticsearch know whether to normalize the features, and which normalization method to use, when ranking documents for a new query?
Thanks a lot!
Thanks @lzhlynn - the short answer is that we don't perform feature normalization. I agree that we would have to pass some kind of normalization hints describing how the features are expected to be normalized during model evaluation (for example, that M is the mean and S is the standard deviation of each feature).
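A minimal sketch of what such hints might look like, assuming a hypothetical format in which each feature's training-time mean and standard deviation are stored alongside the model (the feature names and statistics below are invented; this is not an existing plugin API):

```python
# Hypothetical "normalization hints" stored alongside the uploaded model:
# for each feature, the mean (M) and standard deviation (S) observed at
# training time. Names and numbers are invented for illustration.
hints = {
    "title_bm25": {"mean": 4.2, "std": 1.1},
    "recency":    {"mean": 0.5, "std": 0.2},
}

def apply_hints(features):
    """z-score each raw feature value using the stored training-time stats."""
    return {
        name: (value - hints[name]["mean"]) / hints[name]["std"]
        for name, value in features.items()
    }

raw = {"title_bm25": 5.3, "recency": 0.9}
print(apply_hints(raw))  # what the model would score instead of the raw values
```

With something like this persisted next to the model, evaluation could reproduce the same z-score transform that `-norm zscore` applied at training time.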
Which model type are you using? Generally I haven't seen much advantage to normalization in tree-based models.
Thanks a lot @softwaredoug for the reply! I have tested all 8 models provided in RankLib, so some are tree-based (such as LambdaMART) and some are not (such as linear regression).

> Generally I haven't seen much advantage to normalization in tree-based models

I have also compared performance with and without normalization, and there is a difference in the final evaluation metric, but not a big one; the numbers are still comparable.
Another reason I raise this question is that there are other parameters, such as `-gmax`. Although they concern feature preprocessing or metric calculation rather than the model itself, the current model evaluation still processes features and metrics differently from the training step.
Do you have any suggestions for this case? Thanks a lot!
The current model evaluation indeed expects the features to be "untouched" between when they are retrieved for model training and when the trained model is later evaluated. Normalization could certainly be added, but so far it has been deemed unnecessary. Conceptually, a single step of LambdaMART training takes all values of a feature, puts them on an ordered line, and then splits the line into two pieces. Because a z-score is a monotone transformation, it preserves that ordering, so applying normalization makes no difference to the set of observations that fall on each side of the split. For this reason, my suggestion in this case is to not apply any normalization.
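A minimal sketch of that argument, with invented feature values: z-score a feature and check that any split threshold on the raw values has an equivalent threshold on the normalized values producing the exact same partition.

```python
import numpy as np

values = np.array([3.0, 10.0, 4.5, 7.0, 1.0])  # one feature; values invented
z = (values - values.mean()) / values.std()     # z-score normalization

threshold = 5.0                 # a candidate split point on the raw scale
left_raw = values < threshold   # which observations fall left of the split

# Express the same split on the normalized scale by normalizing the
# threshold with the same mean/std.
z_threshold = (threshold - values.mean()) / values.std()
left_norm = z < z_threshold

assert (left_raw == left_norm).all()  # identical partition on both scales
```

Because the standard deviation is positive, the transform is strictly increasing, so every split available on the raw scale has a counterpart on the normalized scale, and the trees that can be learned are the same.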
My experience so far has been that while RankLib includes many different implementations, LambdaMART always outperforms the others. If other algorithms do well, we could consider adding additional support, such as feature normalization, to allow them to work.
Thanks for the question @lzhlynn - don't hesitate to ping us on http://o19s.com/slack to discuss any additional support questions