shabbychef / ohenery

Modeling of Ordinal Random Variables via Softmax
GNU Lesser General Public License v3.0
6 stars 0 forks source link

Add ability to predict expected rank under Henry model #2

Open amcox opened 4 years ago

amcox commented 4 years ago

You mentioned in the documentation that, while computationally intensive, it might be possible to add the ability to predict expected rank under the Henry model. I would find that quite useful. Thank you for your consideration and very helpful package.

shabbychef commented 4 years ago

Hi;

I'm glad you're using the package. I dug up the code for the Harville expected ranks that I had written before I discovered the closed form formula. I have pasted it below. (There is a performance hack in there to bail after a certain depth.) You would have to modify the definition of the subprob to follow the Henery model. As I noted in the documentation I am unlikely to add it to the package because the performance is terrible, but I am tempted to put it in a branch of the package. You could then install from that github branch.


/*

    Compute the (approximate) expected rank given probabilities.
    Hard to do as it is a big multivariate integration.
    We approximate it.

  Created: 2018.09.26
  Copyright: Steven E. Pav, 2018
  Author: Steven E. Pav <steven@gilgamath.com>
  Comments: Steven E. Pav
*/

#ifndef __DEF_ERANK__
#define __DEF_ERANK__

#include <math.h>
#include <string.h>

#endif /* __DEF_ERANK__ */

#include <Rcpp.h>
using namespace Rcpp;

// we polish the estimate: make sure the sum of
// ranks equals the sum of 1:length(prob) by
// projecting proportional to the variance of a geometric
// random variable on each element. this is two hacks, not one.
NumericVector polish_it(NumericVector prob,NumericVector erer) {
    int nel = prob.length();
    double tgt = nel * (nel + 1.0) / 2.0;
    double curv = sum(erer);
    NumericVector vars(nel);
    for (int iii=0;iii < nel;iii++) { vars[iii] = (1.0 - prob[iii]) / (prob[iii] * prob[iii]); }
    double denom = sum(vars);
    double lambda = (tgt - curv) / denom;
    NumericVector out(nel);
    for (int jjj=0;jjj < nel;jjj++) { 
        out[jjj] = erer[jjj] + (lambda * vars[jjj]);
    }
    return out;
}

// basically the geometric distribution dumb approximation
NumericVector dumb_approx(NumericVector prob) {
    int nel = prob.length();
    NumericVector out(nel);
    for (int iii=0;iii < nel;iii++) {
        out[iii] = (1.0 - pow((1.0 -  prob[iii]),nel))/prob[iii];
    }
    return out;
}

//' @title
//' Expected (approximate) rank.
//'
//' @description
//'
//' Compute the expected rank of a bunch of entries based on their probability
//' of winning.
//'
//' @details
//'
//' Given the vector \eqn{prob}, we compute the expected rank of each
//' entry, under the assumption that probabilities of winning
//' remain proportional.  This is a hard problem, and we approximate
//' to a given depth.
//' 
//' @param prob a vector giving the probabilities. Should sum to one.
//' @param max_depth  the maximum depth to compute, after which we
//' perform an approximation. A higher number gives a more accurate
//' computation, but is slower.
//'
//' we polish the estimate: make sure the sum of
//' ranks equals the sum of 1:length(prob) by
//' projecting proportional to the variance of a geometric
//' random variable on each element. this is two hacks, not one.
//'
//' @return The approximate expected ranks, a vector.
//' @examples
//' # a garbage example
//' set.seed(12345)
//' probs <- runif(12)
//' probs <- probs / sum(probs)
//' erank(probs,6)
//' # see the effect of depth
//' erank(probs,7)
//' erank(probs,8)
//'
//' @template etc
//' @rdname erank
//' @export
// [[Rcpp::export]]
NumericVector erank(NumericVector prob,
                                        int max_depth=6) {
    int nel = prob.length();
    if (max_depth > nel) { max_depth = nel; }
    if (max_depth < 0) { stop("need positive max_depth"); }
    if (max_depth == 1) { return polish_it(prob,dumb_approx(prob)); }

    NumericVector out(nel);
    NumericVector subprob(nel-1);
    NumericVector tailcall(nel);
    for (int iii=0;iii < nel;iii++) {
        out[iii] += prob[iii];
        for (int jjj=0;jjj < nel;jjj++) {
            if (jjj < iii) {
                subprob[jjj] = prob[jjj] / (1.0 - prob[iii]);
            } else if (jjj > iii) {
                subprob[jjj-1] = prob[jjj] / (1.0 - prob[iii]);
            }
        }
        tailcall = erank(subprob,max_depth-1);
        for (int jjj=0;jjj < nel;jjj++) {
            if (jjj < iii) {
                out[jjj] += prob[iii] * (1.0 + tailcall[jjj]);
            } else if (jjj > iii) {
                out[jjj] += prob[iii] * (1.0 + tailcall[jjj-1]);
            }
        }
    }
    return polish_it(prob,out);
}

//for vim modeline: (do not edit)
// vim:ts=2:sw=2:tw=79:fdm=marker:fmr=FOLDUP,UNFOLD:cms=//%s:tags=.c_tags;:syn=cpp:ft=cpp:mps+=<\:>:ai:si:cin:nu:fo=croql:cino=p0t0c5(0: