`adjust_for_inflation()` is slow because it has to do a fair bit of work (looking up rows in the `inflation_dataframe` and multiplying them) for each element of its vector inputs.
Here are some approximate times:
```r
start_time <- Sys.time()

number_of_rows <- 10000
nominal_prices <- rnorm(number_of_rows, mean = 10, sd = 3)
years <- round(rnorm(number_of_rows, mean = 2006, sd = 5))
df <- data.frame(years, nominal_prices)

df$in_2008_dollars <- adjust_for_inflation(nominal_prices, years, "US", to_date = 2008)

end_time <- Sys.time()
end_time - start_time

# rows      runtime
# 100       6.2 seconds
# 200       10.46 seconds
# 1000      36 seconds
# 2000      1.1 minutes
# 10000     6 minutes
```
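The scaling looks roughly linear once past the fixed overhead (36 seconds for 1,000 rows and 6 minutes for 10,000 rows both work out to about 36 ms per row), which is consistent with a per-element lookup. For reference, a sketch to reproduce the timings at several sizes, reusing the same code as above:

```r
# Re-run the benchmark above at a few input sizes
for (n in c(100, 1000, 10000)) {
  prices <- rnorm(n, mean = 10, sd = 3)
  yrs <- round(rnorm(n, mean = 2006, sd = 5))
  t0 <- Sys.time()
  res <- adjust_for_inflation(prices, yrs, "US", to_date = 2008)
  print(difftime(Sys.time(), t0))
}
```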
`adjust_for_inflation()` can be made to run roughly twice as fast if extrapolation isn't required, e.g.:
```r
library(priceR)
library(dplyr)

country <- "US"
inflation_dataframe <- retrieve_inflation_data(country)
inflation_dataframe

fast_inflate <- function(price, from, to) {

  # Compound the annual inflation rates between the two years,
  # inverting the multiplier when adjusting backwards in time
  make_multiplier <- function(from_input, to_input) {
    inflation_dataframe %>%
      filter((date > from_input & date <= to_input) |
               (date < from_input & date >= to_input)) %>%
      .$value %>% { . / 100 } %>% { . + 1 } %>%
      { if (from_input < to_input) prod(.) else 1 / prod(.) }
  }

  # One multiplier per (from, to) pair, applied element-wise
  multipliers <- mapply(make_multiplier, from_input = from, to_input = to)
  price * multipliers
}
```
This gives the same results in ~3.25 seconds, about half the time:
```r
library(tictoc)
tic()
fast_inflate(df$nominal_prices, df$years, 2008)
toc()
```
This example runs very quickly with 10 rows. However, when the same is attempted with 10,000 rows, it takes a very long time, and it is not clear why. At the very least, the user should receive a message giving some expectation of the runtime; but ideally, if possible, it should be refactored to be more performant.
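One plausible cause of the slowdown is that `mapply()` invokes `make_multiplier()` once per element, so `inflation_dataframe` gets filtered 10,000 times even though the inputs contain only a few dozen distinct years. A minimal sketch of a memoised variant (the name `memoised_inflate` is hypothetical; it assumes the same `inflation_dataframe` with `date` and `value` columns used above):

```r
library(dplyr)

# Sketch only: compute each distinct (from, to) multiplier once,
# then look the result up for every row
memoised_inflate <- function(price, from, to) {
  pairs <- distinct(data.frame(from = from, to = to))

  # One pass over inflation_dataframe per *unique* year pair
  pairs$multiplier <- mapply(function(f, t) {
    vals <- inflation_dataframe %>%
      filter((date > f & date <= t) | (date < f & date >= t)) %>%
      pull(value)
    m <- prod(1 + vals / 100)
    if (f < t) m else 1 / m
  }, pairs$from, pairs$to)

  # Match each input row back to its precomputed multiplier
  idx <- match(paste(from, to), paste(pairs$from, pairs$to))
  price * pairs$multiplier[idx]
}

tic()
memoised_inflate(df$nominal_prices, df$years, 2008)
toc()
```

With `years` drawn from a normal distribution and rounded, the 10,000-row input above contains only around 40 distinct years, so if this is indeed the bottleneck, the sketch performs ~40 filters instead of 10,000.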