Open BenoitLondon opened 9 months ago
While not well documented, you want the erank option for predict (that is, type='erank'). Here is example usage:
# fit model based on pre-1990 data
library(ohenery)
library(dplyr)     # mutate, arrange, filter
library(magrittr)  # for the %<>% assignment pipe
data(best_picture)
best_picture %<>%
  mutate(place=ifelse(winner,1,2)) %>%
  mutate(weight=ifelse(winner,1,0)) %>%
  mutate(in_sample=year < 1990) %>%
  arrange(year,place)
train_data <- best_picture %>%
  filter(in_sample)
test_data <- best_picture %>%
  filter(!in_sample)
fmla <- place ~ nominated_for_BestDirector + nominated_for_BestActor +
  nominated_for_BestActress + nominated_for_BestFilmEditing +
  Drama + Romance + Comedy
# fit in sample
mod <- harsm(fmla,data=train_data,group=year,weights=weight)
# predict out of sample
expected_place <- predict(mod,test_data,type='erank',group=year)
# plot them.
library(ggplot2)
test_data %>%
  mutate(ep=expected_place) %>%
  ggplot(aes(ep,place)) +
  geom_point()
The expected ranks are exactly that: expected values. If you want to convert them to integers, you can do that with data wrangling: group by the race, then call rank. Note that rank has to guess how to deal with ties. Continuing the example above:
test_data %>%
  mutate(ep=expected_place) %>%
  group_by(year) %>%
  mutate(int_place=rank(ep)) %>%
  ungroup()
The int_place values here are not all integers, because there is a tie in the forecasts and rank averages tied values by default. You will have to decide what to do in case of a tie.
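To make the tie handling concrete, here is a small base R sketch. The expected ranks in ep are made-up numbers for a single race, not output from the model; the ties.method argument of rank is what controls the behaviour.

```r
# Hypothetical expected ranks for one race; the last two runners are tied.
ep <- c(1.8, 2.6, 2.8, 2.8)

# Default ties.method = "average": tied runners share the mean of their ranks,
# which is why non-integer places can appear.
rank_avg <- rank(ep)

# "min": tied runners all get the better (smaller) place.
rank_min <- rank(ep, ties.method = "min")

# "first": ties are broken by order of appearance, so places are distinct.
rank_first <- rank(ep, ties.method = "first")

print(rank_avg)
print(rank_min)
print(rank_first)
```

Passing ties.method to the rank call inside the grouped mutate above would give integer places; which method is appropriate depends on how you want to treat dead heats.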
I think this is already implemented, but feel free to reopen.
Sorry, I forgot there is a wrinkle with expected ranks under the Henery model. The code will spit out somewhat nonsensical numbers for expected ranks when fit with hensm. (The above example with harsm is fine.) Follow that thread at issue #2.
Hi! Thanks, I saw erank, but AFAIU the expected rank doesn't give the probability of finishing at each rank. Say I want the probability of finishing in the top 2?
Ah, I see. Currently you can get back the probability of first place via:
hmod <- hensm(fmla,data=train_data,group=year,weights=weight)
eprob <- predict(hmod,test_data,type='mu',group=year)
But that is not helpful for place or show bets. Reopening.
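In the meantime, a rough workaround under the plain Harville model is to compute top-two probabilities by hand from the win probabilities. The sketch below is not part of ohenery: harville_top2 is a hypothetical helper name, and the probabilities in p are made up. It assumes the Harville assumption holds, namely that once a winner is removed the remaining runners' win probabilities are just renormalized. For a hensm fit this would only be an approximation, since I would expect the Henery gammas to change the conditional probabilities for the later places.

```r
# Probability of finishing in the top two under the Harville model,
# given the win probabilities p for a single race (they must sum to 1).
# harville_top2 is a hypothetical helper, not an ohenery function.
harville_top2 <- function(p) {
  stopifnot(all(p >= 0), all(p < 1), abs(sum(p) - 1) < 1e-8)
  n <- length(p)
  # P(i is second) = sum over j != i of P(j wins) * P(i wins among the rest)
  second <- vapply(seq_len(n), function(i) {
    sum(p[-i] * p[i] / (1 - p[-i]))
  }, numeric(1))
  # P(i in top two) = P(i wins) + P(i is second)
  p + second
}

p <- c(0.5, 0.3, 0.2)   # made-up win probabilities for a three-runner race
top2 <- harville_top2(p)
print(top2)
print(sum(top2))        # two of the runners finish in the top two
```

With predictions in a data frame, the same function could be applied per race inside a group_by(year) followed by mutate, in the same way as the rank example earlier in the thread.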
It would be handy to be able to predict place/show and other partial-ranking probabilities (top 2/3 in order or in any order, matchups) for horse racing betting, from the fitted model and a new dataset.