Open OJWatson opened 1 year ago
Feel free to ping me if anything's unclear. I'll do my best to remember how it all works.
Feel free to ping me if anything's unclear. I'll do my best to remember how it all works.
Thank you! That's very kind. For now i'm doing the same (remembering/learning) how it all works but it's hopefully a small change we are making.
@dolivera23 and @slwu89 - I think I might need some help and may have a simpler solution.
@slwu89 - The code changes that Daniela made, when translate to C++ would want to have infection_process_vaccine_cpp_internal
return sometimes a NumericVector
and sometimes a Rcpp::List
(or similar). However, while writing this it was starting to become quite complicated and I wasn't sure this would be a good approach/advised. (perhaps read on as well for my question to Daniela, which might help more).
@dolivera23 - in the code changes you have made, is there a way to not have make_calculate_nat
return a list sometimes and at other times return a vector? I.e. if we also passed parameters
and day
to make_calculate_nat
, would you be able to rewrite this in a way that then used parameters$vp_time
and an if
statement similar to how you have done in your altered vaccine_efficacy_<>
functions to generate a vector of neutralising antibody titres? If you could, that would be a lot simpler, as I think we would then just have to change infection_process_vaccine_cpp_internal
here: https://github.com/mrc-ide/safir/blob/main/src/process_infection_vaccine.cpp#L57-L82) to have that same if statement.
Does that make some/any sense? Apologies if it doesn't - my Rcpp/Cpp is pretty rusty and out of date.
Hi @OJWatson, oh yes I see. We definitely want that function to return a matrix, with 1 row if we're only returning a vector and 2 rows if we're returning a "list". That would be my suggestion as a way to generalize those two return types. The other approach is to just design the code around the most complicated return object (List) and then for the one that returns a vector, just leave the unused slots of the List uninitialized. But I think the first approach will definitely be faster.
We can get away with some flexibility in C++ without templates but we do need to have "type stable" return values from functions.
Hi @slwu89,
We are returning to this work and I am still struggling to figure out the integration between the modified R code (efficacy_nat.R) and downstream C++ code . Per yours and OJ's suggestion I have updated the code so the functions will return a matrix. I'm still quite uncertain about the next steps required to get this fully integrated into safir's infrastructure.
I suspect this probably doesn't require much in the way of extensive code modifications, but not being super familiar with C++, I can't get my head round of the changes required to the codebase downstream of the R code that I've modified. Is there any chance you can help? Ideally if you'd be willing to make (what I hope are) minor changes to the underlying C++ code that'd be fantastic, but recognising your time is probably quite constrained at the moment, even some pointers or pseudocode on how to approach this would be fantastic (I don't even know where to start at the moment).
We're (unfortunately) working on quite a tight, externally imposed deadline at the moment, and OJ's off this month, so any help you could give I'd be eternally grateful!
Thanks
Daniela
Hi @dolivera23, no problem, I can take a look.
I made a small change to get tests running (still failing, but we can start to work on those later). The only real important one is here: https://github.com/mrc-ide/safir/pull/82/commits/46454512193180ec53fd1019c84a5ddacea12a86#diff-927936f8b8184167f8e396ea30361a14f116ddc7ac4e183f026ceb56796f1dc8, when there is only one type of NAT we want to use the matrix
function to return a 1 row matrix, using as.matrix
returns instead a 1 column matrix.
Is it intended for all the functions in R/efficacy_nat.R to only work with the version of make_calculate_nat
that returns a 2 row matrix? Currently they all assume that there will be 2 rows in the input. If in the future only models with 2 types of NAT are going to be used, it will be easiest to do away with any logic that checks for 1 vs 2 and simplify make_calculate_nat
so that it will always return a matrix of dims 2 x number of people.
Once these questions are answered we can get to modifying the C++ code, and then reimplement the tests in testthat/ to make sure things are internally consistent, at least. We will need to modify this function that makes the ab titre values: https://github.com/mrc-ide/safir/blob/main/src/process_infection_vaccine.cpp#L57-L82 as well as all the function in this file: https://github.com/mrc-ide/safir/blob/main/src/efficacy_nat.cpp which generate efficacy profiles from those ab titre values.
By the way, in the new code in R/efficacy_nat.R, it looks like nt_vec
is used but not defined. Once that is fixed I can take another look and start to think about how the C++ should change.
Can you also explain what parameters$vp_time
is doing? Just to help make sure the C++ code can smoothly follow the new R code.
Thank you @slwu89 for looking into this.
I have updated the code considering your suggestions and trying to simplify the problem. I have eliminated the use of the make_calculate_nat function for the efficacy functions. Before it was useful to have a function that added the ab_titres values but now we actually want them independently and it works best if we read them from the variables argument. This change could only work if:
Do you consider this change would make more sense and would be easier to implement? We do not need to fiddle with data structures.
Regarding the parameters$vp_time, this is a new parameter that will need to be included in the main run function. It is a vector of 0s and 1s, the size is the total number of days simulated. It can be initialized as a vector of 0s. I am not sure where to declare this new parameter and include in the main run function. Some help with this new parameters would be hugely appreciated
Thank you again for all your help with this
@dolivera23 I'll be able to look at the code again later in the week, but in the meantime, to check where things are used, in RStudio I usually like to use "Find in Files" (command/ctrl + shift + f) to search all files for a specific query. You can also search for that query in GitHub (they now have some great features for searching within a repo! https://github.com/search?q=repo%3Amrc-ide%2Fsafir+variables%24ab_titre_inf&type=code)
Hi @dolivera23. I had a look at the C++ code and realized we actually need to keep the function that calculates the NAT for the selected population of individuals separate from the functions that compute the various efficacies, otherwise the structure of the C++ code won't work. I've reformatted the R code in efficacy_nat.R to do that, can you double check for me that it is doing exactly what you want? See here: https://github.com/mrc-ide/safir/blob/fe8be41e5e47fa869fb020b8241024cfb44666e9/R/efficacy_nat.R
Assume that the argument nat
will be the return value from the make_calculate_nat
function. Specifically can you make sure that everything is being done on the correct scale (i.e. linear vs log)? I'm always a bit concerned about getting that wrong. The default null value on the log scale is -Inf
so on the linear scale it should be 0
.
Oh, one more thing, for the sake of simplifying the code by removing options we won't use the in future anyway, is it ok to assume that all future safir model runs will use the version with 2 different types of NAT values? (i.e. infection and vaccine derived). That will help make the code more maintainable if we can remove some of the extra bits that make it work if we are only considering 1 of the 2.
After that I can make corresponding changes in the C++ code. We also will need to write tests to make sure that this is all working as expected.
Hi @slwu89 & @dolivera23
On the scale, ab_titres are I believe are being stored on the natural log scale throughout the code so that it is easy to implement the decay on that scale. Your new calculate_nat function is correct in reading these in, transposing to a linear scale and then adding the infection and vaccines (and the vfr scaling is correct here too). The new functionality for the variant-proof is also correct. Assuming this returns nat (now on a linear scale) that is then read in by the vaccine_efficacy functions then they are correct too.
Agree that we should simplify the code by assuming that we will always run with two NAT from now on.
If you can share any tests that were on this piece of code previously then we can think about how to update them.
Thanks for looking into this @slwu89.
The changes look great for the efficacy functions. In the original code calculate_nat() function used to return the values on the log scale. This function is implemented in other processes but all seem related to the efficacy functions (https://github.com/search?q=repo%3Amrc-ide%2Fsafir+make_calculate_nat&type=code), so the change to linear scale should work well.
Thanks @azraghani and @dolivera23! I'll have a chance to look at the C++ hopefully today or tomorrow.
Tests for efficacy functions are included here https://github.com/mrc-ide/safir/blob/main/tests/testthat/test-efficacy-naturalimmunity.R and here https://github.com/mrc-ide/safir/blob/main/tests/testthat/test-efficacy-vaccination.R
Hi @dolivera23 and @azraghani, I've updated the C++ code to match functionality of the R code and re-implement tests for both them. Note that strictly speaking, the new vp_on
feature is "untested", although the functionality is now there. Please check the commit https://github.com/mrc-ide/safir/pull/82/commits/73ad015a80b691b2459ffe78b0f197e89b085d8e to see the necessary changes that were made (relevant ones to files in R/ and src/).
Also note that I modified the make_vaccine_parameters
function in R/parameters_vaccine.R; the R code as written seemed to assume parameters$vp_time
was a vector of length equal to number of timesteps but the parameter was just a single numeric value. I updated the function to correctly set it to a vector.
Patch coverage is 97.84%
of modified lines.
:exclamation: Current head 18c4e9a differs from pull request most recent head fd1a02a. Consider uploading reports for the commit fd1a02a to get more accurate results
Files Changed | Coverage |
---|---|
R/efficacy_nat.R | 95.45% |
src/efficacy_nat.cpp | 98.50% |
R/distribution_vaccines.R | 100.00% |
R/events_exposure_vaccine.R | 100.00% |
R/parameters_vaccine.R | 100.00% |
R/process_infection_vaccine.R | 100.00% |
src/process_infection_vaccine.cpp | 100.00% |
:loudspeaker: Thoughts on this report? Let us know!.
@slwu89 Thanks for making these changes! The code changes all make sense; we'll run some local tests to ensure this is doing what we expect.
Great! Just tag me if anything strange comes up.
@azraghani did the local tests look ok? If they are looking reasonable I'd be keen to merge this PR into main so it's easier to make further changes to it later on in smaller chunks of work.
Hi @slwu89,
We are still conducting the local tests, will update you later this week, once we are sure it is safe to merge this branch into the main one
Ok!
Hi @slwu89,
Firstly, thanks again for all the work getting these changes in and sorry for not having time previously to help write these in.
Daniela and Azra have done some testing and the code works (runs fine) but we have found it's not implementing the variant proof vaccine as intended. But we can see why from these code lines:
calculate_nat <- function(index, day) {
# Get VFR parameter, if null assign 1
if (is.null(parameters$vfr)) {
vfr <- 1
} else {
vfr <- parameters$vfr[day]
}
nat_vaccine <- exp(variables$ab_titre$get_values(index))
nat_infection <- exp(variables$ab_titre_inf$get_values(index))
vp_on <- parameters$vp_time[day]
if(vp_on == 0 ){ # If there is not a variant proof vaccine
nt <- nat_vaccine + nat_infection
nt <- pmax(.Machine$double.eps, nt / vfr)
} else {
nat_infection <- pmax(.Machine$double.eps, nat_infection / vfr) # Only infection induced antibodies are scaled down by vfr
nt <- nat_vaccine + nat_infection
}
return(nt)
}
In the new calculate_nat
function inside make_calculate_nat
function, we need some changes so the decision to not apply the vfr
to nat_vaccine
also takes into consideration when an individual was vaccinated.
If an individual was last vaccinated when vp_on
was not equal to 0 (i.e. they were vaccinated when the variant proof vaccine was being used), then the vfr
should not be applied to nat_vaccine
.
However, if they were vaccinated when vp_on == 0
, then the vfr should also be applied to nat_vaccine
.
I think this should be easy enough to do as you already have a variable for dose_time
and instead of having the if
statement you could do it something like:
calculate_nat <- function(index, day) {
# Get VFR parameter, if null assign 1
if (is.null(parameters$vfr)) {
vfr <- 1
} else {
vfr <- parameters$vfr[day]
}
nat_vaccine <- exp(variables$ab_titre$get_values(index))
nat_infection <- exp(variables$ab_titre_inf$get_values(index))
# nat_infection should always be scaled by vfr regardless
nat_infection <- pmax(.Machine$double.eps, nat_infection / vfr)
# find individuals who were vaccinated when variant proof vaccine was on
dose_times <- variables$dose_time$get_values(index)
# N.B. Sean - this may not be right if dose_times are different indexed to the day (e.g. might be on off by one error)
non_vp_index <- which(parameters$vp_time[dose_times] == 0)
# apply vfr to those that were vaccinated not during variant proof window
nat_vaccine[non_vp_index] <- pmax(.Machine$double.eps, nat_vaccine[non_vp_index] / vfr)
# and combine these for overall NAT
nt <- nat_vaccine + nat_infection
return(nt)
}
Does this make sense?
Hopefully this is not too big a change to add in (I would have made it myself but wanted to check the dose time would be correctly grabbed in this way and if the day indexing for dose times etc is correct).
Thanks again for all the help
Hey @OJWatson, thanks for checking things so thoroughly! Yeah I think everything should work great with your suggested changes, except for one thing. I checked how does_times
are stored and unfortunately it's storing the timestep that person got a dose rather than the day they got a dose (see https://github.com/mrc-ide/safir/blob/vp_vaccine/R/events_vaccination.R#L40). I guess in hindsight we should have decided to only resolve anything vaccine related to the nearest day rather than timestep. Anyway I'll implement the changes you made plus changes to go from timestep to day in calculate_nat
and push.
[EDIT] might take longer than I thought...I forgot calculate NAT is actually implemented in C++ for the real simulation (https://github.com/mrc-ide/safir/blob/vp_vaccine/src/process_infection_vaccine.cpp#L67). The tests in testthat will check that the R and C++ versions agree.
@OJWatson, I think I got everything working in the R version. I implemented your suggestions in both the R and C++, see here for the R (https://github.com/mrc-ide/safir/blob/vp_vaccine/R/efficacy_nat.R#L78). Please note that I modified it such that the indexing vector been_vaccinated_non_vp
for nat_vaccine
now only updates the NAT for persons who were vaccinated at all and were vaccinated outside the variant proof window. That's the reason for the extra logic involving dose_num
is doing.
I factored out the C++ version of make_calculate_nat
into its own function here and the corresponding C++ code is here: https://github.com/mrc-ide/safir/blob/vp_vaccine/src/efficacy_nat.cpp#L37 It mirrors the R code but does it in a single for loop over all individuals in the input index
bitset, as its much more natural (and fast) to write that way.
There are now 2 tests, one of which is working [EDIT] after https://github.com/mrc-ide/safir/pull/82/commits/fcc9df3dece61afe72212ecf5f0a446d93f18438 both are passing. The first one is here: https://github.com/mrc-ide/safir/blob/vp_vaccine/tests/testthat/test-vfr.R#L478 and checks merely that the NAT calculation is consistent between R and C++ when using vp_time
or not. This one is working fine.
The second is here: https://github.com/mrc-ide/safir/blob/vp_vaccine/tests/testthat/test-vfr.R#L545 and runs a small simulation to make sure it is working properly. It is not passing, and indicates there is some issue with the larger integration into the The R version of that process is here https://github.com/mrc-ide/safir/blob/vp_vaccine/R/process_infection_vaccine.R#L16 and C++ is here https://github.com/mrc-ide/safir/blob/vp_vaccine/R/process_infection_vaccine.R#L16infection_process_vaccine
function that samples S -> E infections, accounting for NAT values.
I'll look at why that's going on as I have time, in the meanwhile if you need to run immediately I'd suggest using the R version infection_process_vaccine
.
[EDIT] update: after latest commit https://github.com/mrc-ide/safir/pull/82/commits/fcc9df3dece61afe72212ecf5f0a446d93f18438 I believe both the R and C++ versions are functioning correctly. In short, I think we are good to go, if you agree with the logic for NAT updating I described in the first paragraph.
Hi Sean, thanks very much this looks amazing.
We are just running some tests now to check it then produces sensible outputs for the scenarios (just in case we haven't thought about something else) and will get back to you.
Thanks again - this is great and the tests and explanation are super helpful
Great! I was uneasy about it still so I included some more extensive integration testing between the R/C++. I think it's working as I understand it (so remaining problems will be in my incorrect implementation, not a bug per se).
@slwu89 Thanks again for the work. We were running through the tests and found some odd things when running it through our sims (when vp_time was in use and vfr was != 1, we found that there was a difference in sims before vp_time started.
So I was checking through the tests and found the last one was not checking R vs C++ so corrected that and it then errored. I found a small bug in the nat calculation in the R side (incorrect indexing of those who have been vaccinated). I've fixed that and the last test was now working locally so I pushed it up.
I don't however think this is the cause of the odd things we were finding though, so I'll keep looking but thought i'd do this one while I found it.
@slwu89 I have added a new test case that I think shows the odd behaviour I was talking about:
https://github.com/mrc-ide/safir/blob/vp_vaccine/tests/testthat/test-vfr.R#L665-L696
We simulate vfr occuring on day 10 and vp vaccines coming in on day 15. And then compare that against a sim with no vp vaccine. At day 11 the two sims depart, whereas we would perhaps expect them to only depart after day 15. I'll keep having a look but hopefully now knowing from the previous commits that the R and C++ versions are the same it should be quick to find what is causing the issue.
While playing with that last test I also noticed that the vaccine allocation had a small bug where it may overallocate doses. The last commit 9075fee fixes this.
Hi @slwu89,
Sorry for possibly confusing set of previous commits/comments. Just shout if they don't make sense but I think we are there now. But the changes in summary are:
https://github.com/mrc-ide/safir/pull/82/commits/aec4c806d96e0a033266ed8b9b1d3b34cd52226b : Correction to vfr test case so that the R and C++ implementations are being compared.
https://github.com/mrc-ide/safir/pull/82/commits/9075feed6ce56392ed20c6535773475bc084cc78: Correction to assign_doses
, where the use of rmultinom
may over allocate.
Remaining commits have corrected the R code in make_calculate_nat
which was not quite right (our fault as we provided this before) and now includes the addition of a test that checks that the R implementation when VP is in place and is not in place are doing the same thing prior to VP being switched on.
I have also had a go at the C++ version of efficacy_nat
and the tests that you have for checking that the C++ and R are doing the same thing are now passing.
The only thing is the lintr tests (can I leave those to you as I don't want to muck the style up and I can't understand why it is complaining about some vertical white space issues when there are white spaces elsewhere in the C++, which I think are great - I like the white space).
Lastly, below is a small demo of why the previous R code was wrong.
There were 2 issues.
First and below is what the original safir
code was doing when calculating NAT (i.e. what is in main
), and then the following function is what we previously provided to you to alter to account for VP vaccines:
> original_nat <- function(nat_vaccine, nat_infection) {
nat_vaccine <- exp(nat_vaccine)
nat_infection <- exp(nat_infection)
nt <- nat_vaccine + nat_infection
nt <- pmax(.Machine$double.eps, nt / vfr)
return(nt)
}
> previous_nat <- function(nat_vaccine, nat_infection, non_vp_indexes) {
nat_vaccine <- exp(nat_vaccine)
nat_infection <- exp(nat_infection)
# nat_infection should always be scaled by vfr regardless
nat_infection <- pmax(.Machine$double.eps, nat_infection / vfr)
# apply vfr to those that were vaccinated not during variant proof window
nat_vaccine[non_vp_indexes] <- pmax(.Machine$double.eps, nat_vaccine[non_vp_indexes] / vfr)
# and combine these for overall NAT
nt <- nat_vaccine + nat_infection
return(nt)
}
> nat_vaccine <- c(-Inf, -1.4105236, 0.8388213)
> nat_infection <- c(-Inf, -1.4105236, 0.8388213)
> vfr <- 10
> original_nat(nat_vaccine, nat_infection)
[1] 2.220446e-16 4.880310e-02 4.627277e-01
> previous_nat(nat_vaccine, nat_infection, 1:3)
[1] 4.440892e-16 4.880310e-02 4.627277e-01
When individuals have no NAT (both infection and vaccine) we get the two functions differing, which was because we would do the pmax(.Machine$double.eps,...)
comparison twice in previous_nat
and then sum, which gives these individuals more protection.
In the new R function, we find the index of those vaccinated with VP vaccine, and then apply vfr to all other individuals. i.e.:
new_nat <- function(nat_vaccine, nat_infection, non_vp_indexes) {
nat_vaccine <- exp(nat_vaccine)
nat_infection <- exp(nat_infection)
# nat_infection should always be scaled by vfr regardless
nat_infection <- nat_infection / vfr
# apply vfr to only those that were vaccinated not during variant proof window
nat_vaccine[non_vp_indexes] <- nat_vaccine[non_vp_indexes]/vfr
# and combine these for overall NAT
nt <- pmax(.Machine$double.eps, nat_vaccine + nat_infection)
return(nt)
}
> new_nat(nat_vaccine, nat_infection, 1:3)
[1] 2.220446e-16 4.880310e-02 4.627277e-01
The second issue is we were only applying the VFR reduction for nat_vaccine
to individuals who had been vaccinated, whereas the original code was applying the VFR reduction to everyone.
@OJWatson nice! That's some pretty serious code detective work. The multinomial/multivariate hypergeometric confusion has tripped me up before, thanks for finding and fixing it.
I think we can let the CodeFactor tests slide. I also find the white space aids readability and also it complains about :::
but that's required to get rmhyper
out of odin. I'm not sure where those rules are anyway, I think all the mrc-ide repos have them applied? Not sure.
Everything runs here too...shall we merge?
That's great - I also just found out you can ignore at codefactor.io and don't need to always push linters etc in .lintr.
We are just running a couple full simulations now to check that this has corrected the issue in our larger more complex simulation, but once these look good we can start merging
Just starting a PR here now to see the changes easily and for CI