pierreandrieu / corankco

https://pypi.org/project/corankco/
GNU General Public License v2.0

weighted consensus #1

Open nikozoe opened 1 year ago

nikozoe commented 1 year ago

Great package, thank you! I was wondering whether it would be possible to weight the rankings in the dataset that forms the basis for the consensus ranking. Could you point me to where in the code I would have to make changes to achieve that? A use case would be aggregating rankings given by experts and weighting them according to their expertise (which might be based on confidence or prior performance).

pierreandrieu commented 1 year ago

Hello! Thank you for your feedback about the package! Supporting weights may be possible, but could you please formalize the concept? For example, will the weights be integers or real values? If they are integers, is a) c1 with weight 2 + c2 with weight 4 the same as b) twice c1 + 4 times c2? If they can be real values like 0.3, it may become necessary to redefine each rank aggregation algorithm; otherwise, it will be confusing!

nikozoe commented 1 year ago

Thanks for your answer. The weights would be real values. I see: if the weights were integers, one could simply include a ranking with, for example, weight=2 twice in the dataset. Unfortunately this is not possible with real values. What I would like is to weight the individual distances to the rankings in the Kemeny score that is minimized to find the consensus ranking, i.e. $S(c,R,w) = \sum_{r \in R} w_r K(c,r)$.
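
To make the formula concrete, here is a minimal sketch in plain Python. It is only an illustration under simplifying assumptions: rankings are complete and contain no ties, K is taken to be the Kendall tau distance (the number of element pairs ordered differently), and the consensus is found by brute force over all permutations, so it only works for a handful of elements. The names `kendall_tau`, `weighted_kemeny_score` and `brute_force_consensus` are made up for this sketch and are not part of corankco.

```python
from itertools import combinations, permutations


def kendall_tau(c, r):
    """Number of element pairs that c and r order differently (no ties)."""
    pos_c = {x: i for i, x in enumerate(c)}
    pos_r = {x: i for i, x in enumerate(r)}
    return sum(
        1
        for x, y in combinations(c, 2)
        if (pos_c[x] < pos_c[y]) != (pos_r[x] < pos_r[y])
    )


def weighted_kemeny_score(c, rankings, weights):
    """S(c, R, w) = sum over r in R of w_r * K(c, r)."""
    return sum(w * kendall_tau(c, r) for r, w in zip(rankings, weights))


def brute_force_consensus(rankings, weights):
    """Try every permutation; exponential, for tiny examples only."""
    elements = sorted(rankings[0])
    return min(
        permutations(elements),
        key=lambda c: weighted_kemeny_score(c, rankings, weights),
    )


rankings = [("A", "B", "C"), ("B", "A", "C"), ("B", "C", "A")]
weights = [0.9, 0.3, 0.3]  # real-valued, e.g. expert confidence (toy values)

consensus = brute_force_consensus(rankings, weights)
print(consensus, weighted_kemeny_score(consensus, rankings, weights))
```

With these toy weights the first expert dominates and the consensus is A, B, C; with equal weights the same three rankings give B, A, C instead. For integer weights the score is identical to duplicating the ranking, which is the equivalence asked about above.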

pierreandrieu commented 1 year ago

OK! Thank you for the clarification. Meeting your requirement does not need a substantial amount of new code, but many classes will need minor adjustments, and this may have numerous unintended consequences. Depending on how urgent this is for you, I propose two options: you could modify the code yourself, in which case I can point you to the specific sections that need adjusting (with the risk that I may not anticipate all the unintended consequences), or you could wait a month, and I'll release a new version that includes the weighting feature. I'm on holiday at the moment.

nikozoe commented 1 year ago

Hi @pierreandrieu, it would be great if you could eventually integrate this functionality. Since I only needed the ExactAlgorithmPulp algorithm, I gave it a try myself. I mainly modified the _pairwise_cost_matrixgeneric method in the PairwiseBasedAlgorithm class. The main bit was:

                # array and vdot are the NumPy functions (from numpy import array, vdot)
                if weights is not None:
                    # count frequencies weighted by ranking importance
                    el1_in_r = (mem != -1)
                    el2_in_r = (positions[el_2] != -1)
                    el1_l_el2 = (mem < positions[el_2])
                    el2_l_el1 = (positions[el_2] < mem)
                    # x < y, x > y, x and y are tied, x is the only ranked, y is the only ranked, x and y are non-ranked
                    relative_positions = array([
                        vdot(weights, el1_l_el2 & el1_in_r),
                        vdot(weights, el2_l_el1 & el2_in_r),
                        # tied: both ranked and neither strictly before the other
                        vdot(weights, ~(el1_l_el2 | el2_l_el1) & el1_in_r & el2_in_r),
                        vdot(weights, el1_in_r & ~el2_in_r),
                        vdot(weights, el2_in_r & ~el1_in_r),
                        vdot(weights, ~(el1_in_r | el2_in_r))
                    ])

plus the changes needed to parse the weights parameter. It's probably not the most elegant solution and I haven't tested it much, but it seems to do what I want for now.
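
For anyone who wants to check the idea outside the package, here is a dependency-free sketch of the same weighted counting for a single pair of elements. It assumes, as the snippet above appears to, that each element has one position per ranking and that -1 marks an element missing from a ranking. The function name `weighted_pair_counts` and the toy data are made up for the example; the snippet above does the equivalent with NumPy boolean masks and vdot.

```python
def weighted_pair_counts(pos_x, pos_y, weights):
    """Weighted counts of the six relative positions of x and y.

    pos_x, pos_y: one position per ranking, -1 if the element is unranked.
    weights: one real-valued weight per ranking.
    Returns [x<y, x>y, tied, only x ranked, only y ranked, neither ranked].
    """
    counts = [0.0] * 6
    for px, py, w in zip(pos_x, pos_y, weights):
        x_in, y_in = px != -1, py != -1
        if x_in and y_in:
            if px < py:
                idx = 0  # x strictly before y
            elif px > py:
                idx = 1  # y strictly before x
            else:
                idx = 2  # both ranked at the same position
        elif x_in:
            idx = 3
        elif y_in:
            idx = 4
        else:
            idx = 5
        # Add the ranking's weight instead of counting it as 1.
        counts[idx] += w
    return counts


# Five rankings: x before y, y before x, tied, only x ranked, neither ranked.
pos_x = [0, 2, 1, 0, -1]
pos_y = [1, 0, 1, -1, -1]
weights = [0.5, 1.0, 0.25, 2.0, 0.75]

counts = weighted_pair_counts(pos_x, pos_y, weights)
print(counts)
```

Each ranking falls into exactly one of the six cases, so the counts always add up to the total weight, which makes a quick sanity check for any weighted variant.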