Open amcox opened 4 years ago
Hi;
I'm glad you're using the package. I dug up the code for the Harville expected ranks that I had written before I discovered the closed form formula. I have pasted it below. (There is a performance hack in there to bail after a certain depth.) You would have to modify the definition of the subprob
to follow the Henery model. As I noted in the documentation I am unlikely to add it to the package because the performance is terrible, but I am tempted to put it in a branch of the package. You could then install from that github branch.
/*
Compute the (approximate) expected rank given probabilities.
Hard to do as it is a big multivariate integration.
We approximate it.
Created: 2018.09.26
Copyright: Steven E. Pav, 2018
Author: Steven E. Pav <steven@gilgamath.com>
Comments: Steven E. Pav
*/
#ifndef __DEF_ERANK__
#define __DEF_ERANK__
#include <math.h>
#include <string.h>
#endif /* __DEF_ERANK__ */
#include <Rcpp.h>
using namespace Rcpp;
// we polish the estimate: make sure the sum of
// ranks equals the sum of 1:length(prob) by
// projecting proportional to the variance of a geometric
// random variable on each element. this is two hacks, not one.
NumericVector polish_it(NumericVector prob,NumericVector erer) {
int nel = prob.length();
double tgt = nel * (nel + 1.0) / 2.0;
double curv = sum(erer);
NumericVector vars(nel);
for (int iii=0;iii < nel;iii++) { vars[iii] = (1.0 - prob[iii]) / (prob[iii] * prob[iii]); }
double denom = sum(vars);
double lambda = (tgt - curv) / denom;
NumericVector out(nel);
for (int jjj=0;jjj < nel;jjj++) {
out[jjj] = erer[jjj] + (lambda * vars[jjj]);
}
return out;
}
// basically the geometric distribution dumb approximation
NumericVector dumb_approx(NumericVector prob) {
int nel = prob.length();
NumericVector out(nel);
for (int iii=0;iii < nel;iii++) {
out[iii] = (1.0 - pow((1.0 - prob[iii]),nel))/prob[iii];
}
return out;
}
//' @title
//' Expected (approximate) rank.
//'
//' @description
//'
//' Compute the expected rank of a bunch of entries based on their probability
//' of winning.
//'
//' @details
//'
//' Given the vector \eqn{prob}, we compute the expected rank of each
//' entry, under the assumption that probabilities of winning
//' remain proportional. This is a hard problem, and we approximate
//' to a given depth.
//'
//' @param prob a vector giving the probabilities. Should sum to one.
//' @param max_depth the maximum depth to compute, after which we
//' perform an approximation. A higher number gives a more accurate
//' computation, but is slower.
//'
//' we polish the estimate: make sure the sum of
//' ranks equals the sum of 1:length(prob) by
//' projecting proportional to the variance of a geometric
//' random variable on each element. this is two hacks, not one.
//'
//' @return The approximate expected ranks, a vector.
//' @examples
//' # a garbage example
//' set.seed(12345)
//' probs <- runif(12)
//' probs <- probs / sum(probs)
//' erank(probs,6)
//' # see the effect of depth
//' erank(probs,7)
//' erank(probs,8)
//'
//' @template etc
//' @rdname erank
//' @export
// [[Rcpp::export]]
NumericVector erank(NumericVector prob,
int max_depth=6) {
int nel = prob.length();
if (max_depth > nel) { max_depth = nel; }
if (max_depth < 0) { stop("need positive max_depth"); }
if (max_depth == 1) { return polish_it(prob,dumb_approx(prob)); }
NumericVector out(nel);
NumericVector subprob(nel-1);
NumericVector tailcall(nel);
for (int iii=0;iii < nel;iii++) {
out[iii] += prob[iii];
for (int jjj=0;jjj < nel;jjj++) {
if (jjj < iii) {
subprob[jjj] = prob[jjj] / (1.0 - prob[iii]);
} else if (jjj > iii) {
subprob[jjj-1] = prob[jjj] / (1.0 - prob[iii]);
}
}
tailcall = erank(subprob,max_depth-1);
for (int jjj=0;jjj < nel;jjj++) {
if (jjj < iii) {
out[jjj] += prob[iii] * (1.0 + tailcall[jjj]);
} else if (jjj > iii) {
out[jjj] += prob[iii] * (1.0 + tailcall[jjj-1]);
}
}
}
return polish_it(prob,out);
}
//for vim modeline: (do not edit)
// vim:ts=2:sw=2:tw=79:fdm=marker:fmr=FOLDUP,UNFOLD:cms=//%s:tags=.c_tags;:syn=cpp:ft=cpp:mps+=<\:>:ai:si:cin:nu:fo=croql:cino=p0t0c5(0:
You mentioned in the documentation that, while computationally intensive, it might be possible to add the ability to predict expected rank under the Henry model. I would find that quite useful. Thank you for your consideration and very helpful package.