Open BenoitLondon opened 1 year ago
If I remember correctly, the weights input to harsm and hensm can be used as importance weights as well. The only caveat is that the weight for the last-place participant in a group has no bearing on the outcome. At the very least, the following code runs:
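library(ohenery)
library(dplyr)     # mutate
library(magrittr)  # %<>% and %>%
data(best_picture)
# winners get place 1 and weight 1; all other nominees tie for place 2 with weight 0
best_picture %<>%
  mutate(place=ifelse(winner,1,2)) %>%
  mutate(weight=ifelse(winner,1,0)) %>%
  mutate(down_weight1=weight * ifelse(year < 1960,0.5,1)) %>%
  mutate(down_weight2=weight * ifelse(year < 1960,0,1))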
fmla <- place ~ nominated_for_BestDirector + nominated_for_BestActor + nominated_for_BestActress + nominated_for_BestFilmEditing + Drama + Romance + Comedy
mod0 <- harsm(fmla,data=best_picture,group=year,weights=weight)
mod1 <- harsm(fmla,data=best_picture,group=year,weights=down_weight1)
mod2 <- harsm(fmla,data=best_picture,group=year,weights=down_weight2)
(Checking this for semantic correctness...)
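The caveat about the last-place weight can be read off the Harville form of the likelihood. The following is a sketch under an assumption: that the weights enter the log-likelihood as one multiplier per finishing place, which I have not verified against the package source. The symbols $\eta_i$ (linear predictor), $w_i$ (weight) and $n$ (group size) are my notation, with participants sorted so that $i = 1$ is the winner.

```latex
\ell(\eta)
  = \sum_{i=1}^{n} w_i \left[ \eta_i - \log \sum_{j=i}^{n} \exp(\eta_j) \right],
\qquad
\text{last term: } w_n \left[ \eta_n - \log \exp(\eta_n) \right] = 0 .
```

Under that form the last-place term vanishes whatever $w_n$ is. With weight 1 for the winner and 0 for everyone else, as in the code above, only the $i = 1$ term survives, which is the plain softmax likelihood of picking the winner.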
At the very least, using zero weights for pre-1960 Oscar awards gives the same results as not including that data in the fit:
# check if 0 weights are akin to missing the whole group
library(ohenery)
library(dplyr)     # mutate, filter
library(magrittr)  # %<>% and %>%
data(best_picture)
best_picture %<>%
  mutate(place=ifelse(winner,1,2)) %>%
  mutate(weight=ifelse(winner,1,0)) %>%
  mutate(cutoff=ifelse(year < 1960,0,1)) %>%
  mutate(down_weight=weight * cutoff)
fmla <- place ~ nominated_for_BestDirector + nominated_for_BestActor + nominated_for_BestActress + nominated_for_BestFilmEditing + Drama + Romance + Comedy
# include the data but zero weight.
mod1 <- harsm(fmla,data=best_picture,group=year,weights=down_weight)
# do not include the data.
mod2 <- harsm(fmla,data=best_picture %>% filter(cutoff > 0),group=year,weights=weight)
print(mod1)
print(mod2)
I get the same summaries for the two fits.
Hmm, I am not able to demonstrate that the weights really act as replication weights, although they are close:
# check if weights are really replication weights.
# give weight 1 to pre-1960 and weight 2 to 1960 and forward
# check if that is the same as including the post 1960 data twice.
library(ohenery)
library(dplyr)     # mutate, filter, arrange, bind_rows
library(magrittr)  # %<>% and %>%
data(best_picture)
best_picture %<>%
  mutate(place=ifelse(winner,1,2)) %>%
  mutate(weight=ifelse(winner,1,0)) %>%
  mutate(multiplier=ifelse(year < 1960,1,2)) %>%
  mutate(down_weight=weight * multiplier) %>%
  arrange(year,place)
# duplicate the 1960-and-later rows
bp <- bind_rows(best_picture,
                best_picture %>%
                  filter(multiplier > 1) %>%
                  mutate(year=year+200)) %>% # shift the year so the copies form distinct groups
  arrange(year,place)
fmla <- place ~ nominated_for_BestDirector + nominated_for_BestActor + nominated_for_BestActress + nominated_for_BestFilmEditing + Drama + Romance + Comedy
# include the data and down weight
mod1 <- harsm(fmla,data=best_picture,group=year,weights=down_weight)
# duplicate the data.
mod2 <- harsm(fmla,data=bp,group=year,weights=weight)
print(mod1)
print(mod2)
These give slightly different results, which is annoying.
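As a sanity check that does not depend on ohenery, here is a minimal base-R sketch of a weighted Harville log-likelihood for a single group. It assumes the weights multiply the per-place terms; `harville_loglik` is a hypothetical helper written for this comment, not a package function, and the inputs are made up. Under that assumption, doubling the weights is exactly the same as counting the group twice, so a difference between the two fits would have to come from somewhere else (weight scaling or optimizer convergence, for instance).

```r
# Weighted Harville log-likelihood for one group, ASSUMING the form
#   sum_i w_i * ( eta_i - log( sum_{j >= i} exp(eta_j) ) )
# with entries sorted by finishing place (best first).
# harville_loglik is a hypothetical helper, not part of ohenery.
harville_loglik <- function(eta, place, wt) {
  ord <- order(place)
  mu  <- exp(eta[ord])
  wt  <- wt[ord]
  # denominator for place i: total mu over entries finishing i-th or worse
  denom <- rev(cumsum(rev(mu)))
  sum(wt * (log(mu) - log(denom)))
}

set.seed(1)
eta   <- rnorm(5)           # made-up linear predictors
place <- c(3, 1, 5, 2, 4)   # made-up finishing order
wt    <- rep(1, 5)

# doubling every weight equals counting the group twice
ll_once   <- harville_loglik(eta, place, wt)
ll_double <- harville_loglik(eta, place, 2 * wt)
all.equal(ll_double, 2 * ll_once)

# changing only the last-place weight leaves the value unchanged
wt_alt <- wt
wt_alt[place == max(place)] <- 100
all.equal(harville_loglik(eta, place, wt_alt), ll_once)
```

Both comparisons should come back TRUE. This only shows that the likelihood values agree under the assumed form; it says nothing about how harsm normalizes weights internally.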
Oh OK, thank you. I didn't see that they could be used like that as well. Maybe the difference is due to scaling (if any)?
For example, to discount old races, we could weight the likelihood for each group/race, as implemented in lm/glm etc.
(It is different from the "censoring" weights you already have)
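To make the lm/glm comparison concrete, here is a small sketch on simulated data (the data frame `d` and its columns are made up for illustration). In lm, a weight of 2 on a row gives the same coefficient estimates as including that row twice, which is the behaviour I would hope for per race.

```r
set.seed(42)
d <- data.frame(x = rnorm(20))
d$y   <- 1 + 2 * d$x + rnorm(20)
d$old <- rep(c(TRUE, FALSE), each = 10)   # stand-in for "old races"
d$w   <- ifelse(d$old, 1, 2)              # recent rows count double

# weight the recent rows by 2 ...
fit_weighted   <- lm(y ~ x, data = d, weights = w)
# ... versus literally including the recent rows twice
fit_duplicated <- lm(y ~ x, data = rbind(d, d[!d$old, ]))

all.equal(coef(fit_weighted), coef(fit_duplicated))
```

The coefficients agree; the standard errors do not, because the duplicated fit has more residual degrees of freedom. Fractional weights (say 0.5 for old rows) work the same way for discounting.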