philmikejones / rakeR

Tools for easy spatial microsimulation (raking) in R
http://philmikejones.github.io/rakeR/

Add repopulate() function? #30

Open philmikejones opened 8 years ago

philmikejones commented 8 years ago

Sometimes the zone populations for different constraint tables don't match. For example, zone a might have a population of 4 according to one variable but 5 according to a different variable.

This is common with census tables where anonymisation means that some people might be 'swapped'.
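As a rough sketch of how such a mismatch could be spotted, the snippet below sums each table by zone and compares the totals. The tables `age` and `car`, their column names, and the counts are made up for illustration, and it assumes both tables list the zones in the same order.

```r
# Hypothetical constraint tables; the first column is the zone code
age <- data.frame(code = c("a", "b"), young = c(2, 3), old = c(2, 4))
car <- data.frame(code = c("a", "b"), car = c(3, 4), no_car = c(2, 3))

# Zone population implied by each table (assumes identical zone order)
pops <- data.frame(
  code    = age[["code"]],
  age_pop = rowSums(age[, -1]),
  car_pop = rowSums(car[, -1])
)

# Keep only the zones where the two tables disagree
mismatched <- pops[pops[["age_pop"]] != pops[["car_pop"]], ]
mismatched
```

Here only zone a is flagged, with a population of 4 in the age table and 5 in the car table.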

More often than not several variables agree and only one or two do not, so it's obvious which population is correct. The counts in the mismatched tables are then recalculated (imputed) in proportion, so that they sum to the actual population.
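A minimal worked example of that recalculation for a single zone, assuming simple proportional scaling followed by rounding (the counts and the names `car_counts` and `actual_pop` are illustrative only):

```r
# Hypothetical zone: the car table sums to 5, but the agreed population is 4
car_counts <- c(car = 3, no_car = 2)
actual_pop <- 4

# Scale each category by actual / observed, then round to whole people
rescaled <- round(car_counts * actual_pop / sum(car_counts))
rescaled
sum(rescaled)
```

Rounding does not guarantee that the rescaled counts add up exactly to the target in every case, so checking the new total afterwards is probably worthwhile.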

Should I be creating a repopulate() function to handle this?

philmikejones commented 8 years ago

An early attempt at a function:

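repopulate <- function(df, actual_pop) {

  # Constraint columns: everything except the zone code (column 1)
  # and the column holding the actual population
  vars <- setdiff(names(df)[-1], actual_pop)

  # Population implied by this table and its gap from the actual population
  df[["pop"]]  <- rowSums(df[, vars, drop = FALSE])
  df[["diff"]] <- df[[actual_pop]] - df[["pop"]]

  # Rescale every constraint column in the mismatched zones so that the
  # rounded counts sum (approximately) to the actual population
  wrong <- df[["diff"]] != 0
  scale <- df[wrong, actual_pop] / df[wrong, "pop"]
  df[wrong, vars] <- round(df[wrong, vars] * scale)

  df

}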

View(repopulate(census_car, "age_pop"))