stevecondylios / priceR

Economics and Pricing in R
https://stevecondylios.github.io/priceR/

adjust_for_inflation() takes a long time for large vector inputs #39

Open stevecondylios opened 2 years ago

stevecondylios commented 2 years ago

This example (with 10 rows) runs very quickly:

set.seed(123)
nominal_prices <- rnorm(10, mean=10, sd=3)
years <- round(rnorm(10, mean=2006, sd=5))
df <- data.frame(years, nominal_prices)

df$in_2008_dollars <- adjust_for_inflation(nominal_prices, years, "US", to_date = 2008)

However, when the same is attempted with 10000 rows, it takes a very long time:

set.seed(123)
nominal_prices <- rnorm(10000, mean=10, sd=3)
years <- round(rnorm(10000, mean=2006, sd=5))
df <- data.frame(years, nominal_prices)

df$in_2008_dollars <- adjust_for_inflation(nominal_prices, years, "US", to_date = 2008)

It is not clear why. At the least, the user should receive a message giving some expectation of runtime. Ideally, the function should be refactored to be more performant.

stevecondylios commented 2 years ago

adjust_for_inflation() is slow because it has to do a fair bit of work for each element of the input vectors: it looks up the relevant rows in the inflation_dataframe and multiplies them together.

Here are some approximate times:

start_time <- Sys.time()
number_of_rows <- 10000

nominal_prices <- rnorm(number_of_rows, mean=10, sd=3)
years <- round(rnorm(number_of_rows, mean=2006, sd=5))
df <- data.frame(years, nominal_prices)

df$in_2008_dollars <- adjust_for_inflation(nominal_prices, years, "US", to_date = 2008)

end_time <- Sys.time()
end_time - start_time

# number_of_rows   elapsed time
# 100              6.2 seconds
# 200              10.46 seconds
# 1000             36 seconds
# 2000             1.1 minutes
# 10000            6 minutes
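
As a rough illustration of the runtime message suggested earlier, here is a minimal sketch. The helper name message_expected_runtime() and the threshold argument are hypothetical (not part of priceR), and the per-row cost is only an approximation taken from the 10000-row timing above. The smaller runs suggest some fixed overhead, so treat the estimate as indicative only.

```r
# Approximate per-row cost implied by the 10000-row timing (6 minutes).
# Measured on one machine only; treat it as a ballpark figure.
seconds_per_row <- 360 / 10000

# Hypothetical helper: tell the user roughly how long a large input may take.
message_expected_runtime <- function(n_rows, threshold = 1000) {
  estimate <- n_rows * seconds_per_row
  if (n_rows >= threshold) {
    message(sprintf(
      "Adjusting %d values; this may take roughly %.0f seconds.",
      n_rows, estimate
    ))
  }
  # Return the estimate invisibly so callers can reuse it if they want
  invisible(estimate)
}

message_expected_runtime(10000)
```

A call like this could sit at the top of adjust_for_inflation(), before the per-element work starts.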

adjust_for_inflation() can run roughly twice as fast when extrapolation isn't required, e.g.:

library(priceR)
library(dplyr)  # provides %>% and filter()

country <- "US"
inflation_dataframe <- retrieve_inflation_data(country)

inflation_dataframe

fast_inflate <- function(price, from, to) {

  # Cumulative inflation multiplier between two years (no extrapolation)
  make_multiplier <- function(from_input, to_input) {
    inflation_dataframe %>%
      filter((date > from_input & date <= to_input) | (date < from_input & date >= to_input)) %>%
      .$value %>% {. / 100} %>% {. + 1} %>%
      { ifelse(from_input < to_input, prod(.), 1 / prod(.)) }
  }

  # One multiplier per (from, to) pair
  multipliers <- mapply(make_multiplier, from_input = from, to_input = to)

  real_price <- price * multipliers

  real_price
}

# Gives same results but in ~3.25 seconds - about half the time
library(tictoc)
tic()
fast_inflate(df$nominal_prices, df$years, 2008)
toc()
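
One further possibility, untested against priceR itself: a column of years usually contains only a few dozen distinct values, so the multiplier could be computed once per unique year and then mapped back onto the full vector with match(). The sketch below is base R only. It uses a made-up inflation_dataframe (the column names date and value are assumed to match the real one), assumes to is a single year, and the name fast_inflate_unique() is hypothetical.

```r
# Made-up stand-in for retrieve_inflation_data("US"): one row per year,
# `value` is annual inflation in percent. These numbers are illustrative only.
inflation_dataframe <- data.frame(
  date  = 2000:2010,
  value = c(3.4, 2.8, 1.6, 2.3, 2.7, 3.4, 3.2, 2.9, 3.8, -0.4, 1.6)
)

# Same multiplier logic as make_multiplier() in fast_inflate(), in base R.
make_multiplier <- function(from_input, to_input) {
  d <- inflation_dataframe$date
  in_range <- (d > from_input & d <= to_input) | (d < from_input & d >= to_input)
  growth <- prod(1 + inflation_dataframe$value[in_range] / 100)
  if (from_input < to_input) growth else 1 / growth
}

# Hypothetical variant: compute each distinct year's multiplier once,
# then look it up for every row. Assumes `to` is a single year.
fast_inflate_unique <- function(price, from, to) {
  unique_years <- unique(from)
  unique_multipliers <- vapply(unique_years, make_multiplier, numeric(1),
                               to_input = to)
  price * unique_multipliers[match(from, unique_years)]
}

set.seed(123)
years <- sample(2000:2010, 10000, replace = TRUE)
nominal_prices <- rnorm(10000, mean = 10, sd = 3)
head(fast_inflate_unique(nominal_prices, years, 2008))
```

With this approach the number of lookups depends on the number of distinct years rather than the number of rows, so the cost should grow much more slowly with input size. Whether it fits adjust_for_inflation() once extrapolation is involved would need checking.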