insightsengineering / tern

Table, Listings, and Graphs (TLG) library for common outputs used in clinical trials
https://insightsengineering.github.io/tern/

Solution for data and library loading in `testthat` #703

Closed Melkiades closed 2 years ago

Melkiades commented 2 years ago

At the moment, data from scda is loaded multiple times (~9.9 MB each time), and libraries that tern only suggests are loaded again at the top of each long test file. I strongly suggest following this post and the issue I filed in rtables#393. In other words, I think the best way to go is to update the setup file in testthat/ with the libraries needed across tests and to add a helper file for data loading (also in testthat/), so that both run once before the tests. This would cut down test time as well as needless complexity and repetition across tests. We also need to consider whether we should be using withr, but imo that belongs to a later stage, when we have a more reasonable data caching design (as discussed here). A tentative list of things to do here:

What do you think @cicdguy @shajoezhu ?

cicdguy commented 2 years ago

Based on the blog you mentioned,

I think these are novel approaches to prevent repetitive data loading and therefore improve test performance. So tests/testthat/setup-data.R would look something like this?

library(scda)
library(scda.2022)
test_data <- scda::synthetic_cdisc_data("rcd_2022_02_28")

Melkiades commented 2 years ago

Exactly. I also changed testthat.R in rtables to reflect the "do-not-touch" policy. I think adding setup-libraries.R and setup-data.R is a nice possible solution: it removes the repetition without requiring too many modifications.
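For illustration, a setup-libraries.R file along these lines might attach each suggested package once, skipping any that are not installed. This is only a sketch: `attach_if_installed` is a hypothetical helper, and `stats` and `utils` are placeholders, not tern's actual list of suggested packages.

```r
# Hypothetical contents of tests/testthat/setup-libraries.R.
# The package names below are placeholders, not tern's real Suggests list.
suggested_pkgs <- c("stats", "utils")

# Attach every package that is installed; report which ones were found.
attach_if_installed <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  for (pkg in pkgs[ok]) {
    suppressPackageStartupMessages(library(pkg, character.only = TRUE))
  }
  ok
}

attached <- attach_if_installed(suggested_pkgs)
```

Individual test files could then drop their own `library()` calls, since the setup file would have run before them.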

nikolas-burkoff commented 2 years ago

Need to be careful, though: test_data will be shared by all tests, so if something changes it you could be in a world of pain.
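A small base-R sketch of that risk, using made-up data and `local()` as a rough stand-in for a test block. Ordinary assignment inside a block works on a local copy, but a superassignment (`<<-`) rewrites the shared object for everything that runs afterwards.

```r
# Stand-in for the object a setup file would create once (hypothetical data).
test_data <- list(adsl = data.frame(id = 1:3, age = c(30, 40, 50)))

# Block 1: plain assignment modifies a local copy only (copy-on-modify).
local({
  test_data$adsl$age <- test_data$adsl$age + 1
})
age_after_local <- test_data$adsl$age

# Block 2: superassignment changes the shared object itself.
local({
  test_data$adsl$age <<- test_data$adsl$age * 2
})
age_after_global <- test_data$adsl$age
```

After the first block the shared ages are unchanged; after the second they are doubled, and every later block would see the doubled values. Objects with reference semantics (environments, for example) can be changed in the same way without `<<-`.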

You may want to have


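# Load the synthetic CDISC data for a given snapshot date.
load_data <- function(date) {
  scda::synthetic_cdisc_data(date)
}

# Cache results so repeated calls with the same date load the data only once.
memoise_load_data <- memoise::memoise(load_data)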
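To show what the memoised loader buys, here is a self-contained sketch that avoids both scda and memoise. `memoise_simple` is a hypothetical hand-rolled cache standing in for `memoise::memoise()`, and this `load_data` returns made-up data while counting how often it actually runs.

```r
# Counter to observe how many times the underlying loader is executed.
n_loads <- 0

# Hypothetical loader standing in for scda::synthetic_cdisc_data().
load_data <- function(date) {
  n_loads <<- n_loads + 1
  list(date = date, adsl = data.frame(id = 1:3))
}

# Minimal memoiser for a one-argument function with a character key.
memoise_simple <- function(f) {
  cache <- new.env(parent = emptyenv())
  function(key) {
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(key), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

memoised_load_data <- memoise_simple(load_data)

# Two calls with the same date: the loader body runs only for the first.
first <- memoised_load_data("rcd_2022_02_28")
second <- memoised_load_data("rcd_2022_02_28")
```

Each test would call the memoised function and receive its own value to work with, so the expensive load happens once without every test sharing a single mutable `test_data` binding.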