Open philmikejones opened 8 years ago
An early attempt at a function:
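library(magrittr)  # provides %>%

repopulate <- function(df, actual_pop) {
  # Assumes column 1 is the zone code and the rest are category counts plus
  # actual_pop, so subtract actual_pop to get the total implied by the categories
  df[["pop"]] <- rowSums(df[, 2:ncol(df)]) - df[[actual_pop]]
  df[["diff"]] <- df[[actual_pop]] - df[["pop"]]
  # Rows where the table total disagrees with the actual population
  wrong <- df[["diff"]] != 0
  # Rescale to the actual population (only the first category column so far)
  df[[2]][wrong] <- round(df[[2]][wrong] / df[["pop"]][wrong] * df[[actual_pop]][wrong])
  df  # return the whole data frame, not just the rescaled values
}
repopulate(census_car, "age_pop") %>% View()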
Sometimes the zone populations in different constraint tables don't match. For example, the population of zone a might be 4 according to one variable but 5 according to another.
This is common with census tables where anonymisation means that some people might be 'swapped'.
More often than not, several variables agree and only one or two do not, so it's obvious which population is correct. The populations that disagree are then recalculated (imputed) by rescaling their counts to the actual population.
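To make the idea concrete, here is a minimal sketch rather than a finished implementation. The toy tables `age`, `sex` and `car` and the helpers `modal_total()` and `rescale_table()` are hypothetical names made up for illustration. It assumes each table has the zone code in column 1 and category counts in the remaining columns, takes the most common zone total across tables as the "actual" population, and rescales every category column in the tables that disagree.

```r
# Hypothetical toy constraint tables: one row per zone, one column per category.
age <- data.frame(zone = c("a", "b"), young = c(2, 3), old = c(2, 2))
sex <- data.frame(zone = c("a", "b"), male = c(2, 2), female = c(2, 3))
car <- data.frame(zone = c("a", "b"), car = c(3, 3), no_car = c(2, 2))
tables <- list(age = age, sex = sex, car = car)

# Zone totals per table: one row per zone, one column per table.
totals <- sapply(tables, function(tbl) rowSums(tbl[, -1, drop = FALSE]))

# Most common total for a zone. Ties fall to the smallest value, which is an
# arbitrary choice; real data would need an explicit rule.
modal_total <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}
reference <- apply(totals, 1, modal_total)  # 4 for zone a, 5 for zone b

# Rescale each row's category counts so they sum (before rounding) to the
# reference population. Rows that already match get a scale factor of 1.
rescale_table <- function(tbl, reference) {
  cats <- names(tbl)[-1]
  m <- as.matrix(tbl[, cats, drop = FALSE])
  current <- rowSums(m)
  scale <- ifelse(current > 0, reference / current, 1)
  tbl[, cats] <- round(m * scale)  # vector recycles down columns, i.e. per row
  tbl
}

fixed <- lapply(tables, rescale_table, reference = reference)
fixed$car  # zone a is rescaled from (3, 2) to (2, 2); zone b is unchanged
```

One caveat with this approach: rounding each category independently does not guarantee the rescaled counts sum exactly to the reference population, so a real `repopulate()` would probably want to check the totals again afterwards.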
Should I be creating a repopulate() function to handle this?