shabbychef / ohenery

Modeling of Ordinal Random Variables via Softmax
GNU Lesser General Public License v3.0

Add predictions of partial orders #5

Open BenoitLondon opened 9 months ago

BenoitLondon commented 9 months ago

It would be handy to be able to predict place/show and other partial-ranking probabilities (e.g. top 2 or top 3, in exact order or in any order, and head-to-head matchups) for horse racing betting, given the model output and a new dataset.

shabbychef commented 9 months ago

Although it is not well documented, you want the type='erank' option of predict. Here is example usage:

# fit model based on pre-1990 data
library(ohenery)
library(dplyr)     # mutate, filter, arrange, group_by
library(magrittr)  # %<>% assignment pipe
data(best_picture)
best_picture %<>%
  mutate(place=ifelse(winner,1,2)) %>%
  mutate(weight=ifelse(winner,1,0)) %>%
  mutate(in_sample=year < 1990) %>%
  arrange(year,place)

train_data <- best_picture %>%
  filter(in_sample)
test_data <- best_picture %>%
  filter(!in_sample)

fmla <- place ~ nominated_for_BestDirector + nominated_for_BestActor + nominated_for_BestActress + nominated_for_BestFilmEditing + Drama + Romance + Comedy

# fit in sample
mod <- harsm(fmla,data=train_data,group=year,weights=weight) 
# predict out of sample
expected_place <- predict(mod,test_data,type='erank',group=year)

# plot them.
library(ggplot2)
test_data %>%
  mutate(ep=expected_place) %>%
  ggplot(aes(ep,place)) +
  geom_point()

shabbychef commented 9 months ago

The expected ranks are exactly that: expected values. If you want to convert them to integers, you can do that with data wrangling. That is, group by the race then call rank. Note that rank has to guess how to deal with ties. Continuing the example above:

test_data %>%
  mutate(ep=expected_place) %>%
  group_by(year) %>%
    mutate(int_place=rank(ep)) %>%
  ungroup() 

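The int_place values here are not all integers, because there is a tie in the forecasts and rank averages tied values by default (ties.method='average'). You will have to decide how to handle ties, for example via the ties.method argument of rank.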
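
As a minimal sketch of what that choice looks like, here is base R's rank applied to a made-up vector of expected places for a single race. The numbers are illustrative only and are not taken from the best_picture forecasts:

```r
# Made-up expected places for one race; the 2nd and 3rd entries are tied.
ep <- c(1.8, 2.6, 2.6, 3.0)

# Default: tied entries share the average of the places they occupy.
r_avg <- rank(ep)                          # 1.0 2.5 2.5 4.0

# Tied entries all get the better (lower) place.
r_min <- rank(ep, ties.method = "min")     # 1 2 2 4

# Ties are broken by order of appearance in the vector.
r_first <- rank(ep, ties.method = "first") # 1 2 3 4
```

Any of these can be dropped into the mutate call above; which one is appropriate depends on how ties should be treated downstream.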

shabbychef commented 9 months ago

I think this is already implemented, but feel free to reopen.

shabbychef commented 9 months ago

Sorry, I forgot there is a wrinkle with expected ranks under the Henery model. The code will spit out somewhat nonsensical numbers for expected ranks when the model is fit with hensm. (The above example with harsm is fine.) Follow that thread at issue #2.

BenoitLondon commented 9 months ago

Hi! Thanks, I saw erank, but AFAIU the expected rank doesn't give the probability of finishing at each rank. Say I want the probability of finishing in the top 2?

shabbychef commented 9 months ago

Ah, I see. Currently you can get back the probability of first place via:

hmod <- hensm(fmla,data=train_data,group=year,weights=weight) 
eprob <- predict(hmod,test_data,type='mu',group=year)

But that is not helpful for place or show bets. Reopening.
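
Until something like that exists in the package, a rough workaround can be sketched by hand for the Harville case. The helper below is hypothetical and not part of ohenery. It assumes the Harville model, under which the chance that entry i finishes second given that entry j won is mu[i] / (1 - mu[j]). That assumption matches win probabilities from a harsm fit; the Henery model treats the later places differently, so this would not carry over as-is to the hensm output above.

```r
# Hypothetical helper, not part of ohenery: probability that each entry
# finishes in the top two of a single race, ASSUMING the Harville model.
# mu: win probabilities for one race; must sum to 1, each strictly below 1.
harville_top2 <- function(mu) {
  stopifnot(all(mu >= 0), all(mu < 1), abs(sum(mu) - 1) < 1e-8)
  # cond[j, i] = P(i second | j first) = mu[i] / (1 - mu[j]) for i != j
  cond <- outer(1 / (1 - mu), mu)
  diag(cond) <- 0
  # P(i second) = sum over j of P(j first) * P(i second | j first)
  p_second <- as.vector(mu %*% cond)
  # top two = first or second
  mu + p_second
}

# Made-up win probabilities for a three-entry race.
mu <- c(0.5, 0.3, 0.2)
top2 <- harville_top2(mu)
```

The returned probabilities sum to 2 within a race, since exactly two entries finish in the top two. To use it on model output, the function would have to be applied separately within each group (here, each year), and the same idea extends to show bets by summing over the first two finishers.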