Any interest of adding prediction discrepancy (PD) to the nlmixr output

diabloyg commented 2 years ago

Hi @mattfidler, Through the NPDE has been discussed extensively and has been added to standard output in several softwares including nlmixr, recently I found the intermediate output, prediction discrepancy (PD), would be simple and straightforward compared with NPDE, and could be considered as an alternative approach besides VPC and pcVPC. The paper discuss this topic could be found at: https://accp1.onlinelibrary.wiley.com/doi/abs/10.1177/0091270010390040.

Although some dispute has been raised regarding insufficient citation, personally I think PD is straightforward and suitable for graphical analysis. Since PD is not in the standard output, I manually did the calculation (with low efficiency) using RxODE as follows:

library(dplyr)
library(nlmixr) 
library(ggplot2)
library(RxODE)

##########################Data and Model result loading#########################

dat_POPPK<-read.csv("POPPK dataset.csv")
# Load the poppk dataset;

dat_subject<-summarize(group_by(dat_POPPK,ID),Cmax=max(DV))
dat_subject<-filter(dat_subject,Cmax>0)
# Select the subject with quantifiable observation;

dat_POPPK_INMODEL<-filter(dat_POPPK,(ID %in% dat_subject$ID)&(EVID==0))
dat_POPPK_INMODEL<-arrange(dat_POPPK_INMODEL,ID,TIME)
# Exclude the subject without quantifiable observation from poppk dataset;
# To be aligned with model fitting result;

load("fit_model_saem.RData")
dat_model<-data.frame(fit_model_saem)
# Load the model fitting result;

dat_model<-arrange(dat_model,ID,TIME)
dat_model$DOSE_LEVEL<-dat_POPPK_INMODEL$DOSE_LEVEL
dat_model$DOSE_N<-dat_POPPK_INMODEL$DOSE_N
# Add the dose level information to model fitting result for further analysis;

lst_subject<-as.numeric(levels(dat_model$ID))
num_subject<-length(lst_subject)
# Identify the subject list and number included in the model fitting;

dat_cor<-summarize(group_by(dat_model,ID),
                   BW=mean(BW),
                   ALB=mean(ALB),
                   IMP_OLD=mean(IMP_OLD))

dat_cor<-data.frame(dat_cor)
# Extract the covariant information for each subject;

############################Model reconstruction################################

mod <- RxODE({

  vc <- exp(theta_vc+eta_vc+theta_vc_BW*log(BW/60)+theta_vc_ALB*log(ALB/40)+theta_vc_IMP*IMP_OLD)
  cl <- exp(theta_cl+eta_cl+theta_cl_BW*log(BW/60)+theta_cl_ALB*log(ALB/40))
  cld <- exp(theta_cld)
  vp <- exp(theta_vp+eta_vp+theta_vc_IMP*IMP_OLD)
  vmax <-exp(theta_vmax+eta_vmax+theta_vmax_BW*log(BW/60))
  km <-exp(theta_km)

  cc=central/vc
  ke=cl/vc
  k12=cld/vc
  k21=cld/vp

  d/dt(central) = -ke*central-k12*central+k21*peripheral-vmax*cc/(km+cc)
  d/dt(peripheral) = k12*central-k21*peripheral
  d/dt(AUC) = cc

})

# Model used for simulation; the same as the model used for fitting;

######################Definition of Model omega matrix##########################

omega <- lotri({eta_cl+eta_vc+eta_vp ~ c(0.05,
                                         0.01,0.05,
                                         0.01,0.01,0.05;
                eta_vmax ~ 0.05})

#####calculation of quantile for each observation in population simulation######

dat_final<-data.frame()

n_sum<-1000
# Designate the amount of simulation to be performed for each subject;

dat_POPPK_INMODEL_OBS<-filter(dat_POPPK_INMODEL,MDV==0)
#Exclude those record without observation;

dat_EVENT_all<-subset(dat_POPPK,select = c(ID,TIME,AMT,RATE,EVID))
# Derive the event table for simulation;

i<-1

for (i in 1:length(lst_subject))

  {dat_EVENT<-filter(dat_EVENT_all,ID==lst_subject[i])
   # Derive the individual event table dataset for simulation;

   dat_POPPK_INMODEL_indiv<-filter(dat_POPPK_INMODEL_OBS,ID==lst_subject[i])
   # Obtain the individual dosing and sampling record;

   num_row<-length(dat_POPPK_INMODEL_indiv)

   dat_EVENT$Subject<-dat_EVENT$ID
   dat_EVENT$ID<-1
   dat_EVENT_ADD<-dat_EVENT

   for (j in 2:n_sum){
     dat_EVENT_ADD$ID<-j
     dat_EVENT<-rbind(dat_EVENT,dat_EVENT_ADD)}
   # Create the event table dataset for repeated simulation;

   dat_et<-et(dat_EVENT)
   # Convert to event table format;

   dat_theta<- c(theta_vc=log(3000),theta_cl=log(10),theta_vp=log(1000),theta_cld=log(20),theta_vmax=log(900000),theta_km=log(3000),
                        theta_vc_IMP=0.169,theta_vc_BW=0.843,theta_vc_ALB=-0.419,theta_cl_BW=0.615,theta_cl_ALB=-1.09,
                        theta_vmax_BW=1.14,BW=dat_cor[i,"BW"],ALB=dat_cor[i,"ALB"],IMP_OLD=dat_cor[i,"IMP_OLD"])

   # Create model parameter and add covariant information for subject(i);

   sim_individual<-rxSolve(mod,dat_theta,dat_et,omega=omega,nsub=n_sum)
   # Simulation;

   sim_individual$Subject<-dat_EVENT[1,"Subject"]
   # Designate the subject ID in original dataset;

   k<-1

   for (k in 1:nrow(dat_POPPK_INMODEL_indiv)){
     sim_individual_TIME<-filter(sim_individual,time==dat_POPPK_INMODEL_indiv[k,"TIME"])
     # Select the simulation results according to relative time;

     sim_individual_TIME_LOWER<-filter(sim_individual_TIME,cc<=dat_POPPK_INMODEL_indiv[k,"DV"])
     # Select those simulated record with DV value lower than original observation; 

     dat_POPPK_INMODEL_indiv[k,"Quantile"]<-nrow(sim_individual_TIME_LOWER)/n_sum}
     # Calculate the quantile of each observation;

     dat_final<-rbind(dat_final,dat_POPPK_INMODEL_indiv)}
     # Combine the current subject data to the overall dataset;

write.csv(dat_final,"SVPC result.csv")

^{Created on 2022-06-01 by the reprex package (v2.0.1)}

I think that would be helpful especially for dose escalation stage analysis.

mattfidler commented 2 years ago

I think it is already there as pde though I would have to get to a computer ro check.

I would recommend them over npde myself. To add to the ambiguity Monolix reports npde when they actually are pde values...

On Tue, May 31, 2022, 6:34 PM Guang Yang @.***> wrote:

Hi @mattfidler https://github.com/mattfidler, Through the NPDE has been discussed extensively and has been added to standard output in several softwares including nlmixr, recently I found the intermediate output, prediction discrepancy (PD), would be simple and straightforward compared with NPDE, and could be considered as an alternative approach besides VPC and pcVPC. The paper discuss this topic could be found at: https://accp1.onlinelibrary.wiley.com/doi/abs/10.1177/0091270010390040.

Although some dispute has been raised regarding insufficient citation, personally I think PD is straightforward and suitable for graphical analysis. Since PD is not in the standard output, I manually did the calculation (with low efficiency) using RxODE as follows:

library(dplyr) library(nlmixr) library(ggplot2) library(RxODE) ##########################Data and Model result loading######################### dat_POPPK<-read.csv("POPPK dataset.csv")# Load the poppk dataset; dat_subject<-summarize(group_by(dat_POPPK,ID),Cmax=max(DV))dat_subject<-filter(dat_subject,Cmax>0)# Select the subject with quantifiable observation; dat_POPPK_INMODEL<-filter(dat_POPPK,(ID %in% dat_subject$ID)&(EVID==0))dat_POPPK_INMODEL<-arrange(dat_POPPK_INMODEL,ID,TIME)# Exclude the subject without quantifiable observation from poppk dataset;# To be aligned with model fitting result;

load("fit_model_saem.RData")dat_model<-data.frame(fit_model_saem)# Load the model fitting result; dat_model<-arrange(dat_model,ID,TIME)dat_model$DOSE_LEVEL<-dat_POPPK_INMODEL$DOSE_LEVELdat_model$DOSE_N<-dat_POPPK_INMODEL$DOSE_N# Add the dose level information to model fitting result for further analysis; lst_subject<-as.numeric(levels(dat_model$ID))num_subject<-length(lst_subject)# Identify the subject list and number included in the model fitting; dat_cor<-summarize(group_by(dat_model,ID), BW=mean(BW), ALB=mean(ALB), IMP_OLD=mean(IMP_OLD)) dat_cor<-data.frame(dat_cor)# Extract the covariant information for each subject; ############################Model reconstruction################################ mod <- RxODE({

vc <- exp(theta_vc+eta_vc+theta_vc_BWlog(BW/60)+theta_vc_ALBlog(ALB/40)+theta_vc_IMPIMP_OLD) cl <- exp(theta_cl+eta_cl+theta_cl_BWlog(BW/60)+theta_cl_ALBlog(ALB/40)) cld <- exp(theta_cld) vp <- exp(theta_vp+eta_vp+theta_vc_IMPIMP_OLD) vmax <-exp(theta_vmax+eta_vmax+theta_vmax_BW*log(BW/60)) km <-exp(theta_km)

cc=central/vc ke=cl/vc k12=cld/vc k21=cld/vp

d/dt(central) = -kecentral-k12central+k21peripheral-vmaxcc/(km+cc) d/dt(peripheral) = k12central-k21peripheral d/dt(AUC) = cc

})

Model used for simulation; the same as the model used for fitting;

######################Definition of Model omega matrix########################## omega <- lotri({eta_cl+eta_vc+eta_vp ~ c(0.05, 0.01,0.05, 0.01,0.01,0.05; eta_vmax ~ 0.05})

calculation of quantile for each observation in population simulation

dat_final<-data.frame() n_sum<-1000# Designate the amount of simulation to be performed for each subject; dat_POPPK_INMODEL_OBS<-filter(dat_POPPK_INMODEL,MDV==0)#Exclude those record without observation; dat_EVENT_all<-subset(dat_POPPK,select = c(ID,TIME,AMT,RATE,EVID))# Derive the event table for simulation; i<-1 for (i in 1:length(lst_subject))

{dat_EVENT<-filter(dat_EVENT_all,ID==lst_subject[i])

Derive the individual event table dataset for simulation;

dat_POPPK_INMODEL_indiv<-filter(dat_POPPK_INMODEL_OBS,ID==lst_subject[i])

Obtain the individual dosing and sampling record;

num_row<-length(dat_POPPK_INMODEL_indiv)

dat_EVENT$Subject<-dat_EVENT$ID dat_EVENT$ID<-1 dat_EVENT_ADD<-dat_EVENT

for (j in 2:n_sum){ dat_EVENT_ADD$ID<-j dat_EVENT<-rbind(dat_EVENT,dat_EVENT_ADD)}

Create the event table dataset for repeated simulation;

dat_et<-et(dat_EVENT)

Convert to event table format;

dat_theta<- c(theta_vc=log(3000),theta_cl=log(10),theta_vp=log(1000),theta_cld=log(20),theta_vmax=log(900000),theta_km=log(3000), theta_vc_IMP=0.169,theta_vc_BW=0.843,theta_vc_ALB=-0.419,theta_cl_BW=0.615,theta_cl_ALB=-1.09, theta_vmax_BW=1.14,BW=dat_cor[i,"BW"],ALB=dat_cor[i,"ALB"],IMP_OLD=dat_cor[i,"IMP_OLD"])

Create model parameter and add covariant information for subject(i);

sim_individual<-rxSolve(mod,dat_theta,dat_et,omega=omega,nsub=n_sum)

Simulation;

sim_individual$Subject<-dat_EVENT[1,"Subject"]

Designate the subject ID in original dataset;

k<-1

for (k in 1:nrow(dat_POPPK_INMODEL_indiv)){ sim_individual_TIME<-filter(sim_individual,time==dat_POPPK_INMODEL_indiv[k,"TIME"])

Select the simulation results according to relative time;
 sim_individual_TIME_LOWER<-filter(sim_individual_TIME,cc<=dat_POPPK_INMODEL_indiv[k,"DV"])
 # Select those simulated record with DV value lower than original observation;

 dat_POPPK_INMODEL_indiv[k,"Quantile"]<-nrow(sim_individual_TIME_LOWER)/n_sum}
 # Calculate the quantile of each observation;

 dat_final<-rbind(dat_final,dat_POPPK_INMODEL_indiv)}
 # Combine the current subject data to the overall dataset;
write.csv(dat_final,"SVPC result.csv")

Created on 2022-06-01 by the reprex package https://reprex.tidyverse.org (v2.0.1)

I think that would be helpful especially for dose escalation stage analysis.

— Reply to this email directly, view it on GitHub https://github.com/nlmixrdevelopment/nlmixr/issues/611, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAD5VWQWT7PU5J2ZVR6H2R3VM3R5ZANCNFSM5XPV2V7A . You are receiving this because you were mentioned.Message ID: @.***>

diabloyg commented 2 years ago

I think there has been nomination issues regarding this. From my understanding, PD is the one I mentioned, and PDE is the decorrelated version of PD, which should be in uniform distribution. For NPD and NPDE, these are the transformed variables using PD and PDE with the inversed CDF function from a standard normal distribution, which should then be in normal distribution. In these 4 variables, PD would be the most straightforward one.

mattfidler commented 2 years ago

I could be wrong here since I havent lookd at thos for awhil. I reall that we never used uniform for pde. We implemented it baesed on how it was implemented in the npde cran package. We based our terms on what was used in the npde package. I believe NONMEMs terms match here. You can check youself since we are open source

On Tue, May 31, 2022, 8:23 PM Guang Yang @.***> wrote:

I think there has been nomination issues regarding this. From my understanding, PD is the one I mentioned, and PDE is the decorrelated version of PD, which should be in uniform distribution. For NPD and NPDE, these are the transformed variables using PD and PDE with the inversed CDF function from a standard normal distribution, which should then be in normal distribution. In these 4 variables, PD would be the most straightforward one.

— Reply to this email directly, view it on GitHub https://github.com/nlmixrdevelopment/nlmixr/issues/611#issuecomment-1143166194, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAD5VWWORZPNVH7EOJNBOV3VM36XLANCNFSM5XPV2V7A . You are receiving this because you were mentioned.Message ID: @.***>

diabloyg commented 2 years ago

I could be wrong here since I havent lookd at thos for awhil. I reall that we never used uniform for pde. We implemented it baesed on how it was implemented in the npde cran package. We based our terms on what was used in the npde package. I believe NONMEMs terms match here. You can check youself since we are open source … On Tue, May 31, 2022, 8:23 PM Guang Yang @.> wrote: I think there has been nomination issues regarding this. From my understanding, PD is the one I mentioned, and PDE is the decorrelated version of PD, which should be in uniform distribution. For NPD and NPDE, these are the transformed variables using PD and PDE with the inversed CDF function from a standard normal distribution, which should then be in normal distribution. In these 4 variables, PD would be the most straightforward one. — Reply to this email directly, view it on GitHub <#611 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAD5VWWORZPNVH7EOJNBOV3VM36XLANCNFSM5XPV2V7A . You are receiving this because you were mentioned.Message ID: @.>

I followed your recommendation and found that in the npde package, the pde and npde is calculated by: npd<-qnorm(pd), and npde<-qnorm(pde), both are transformed and should follow normal distribution. For pd and pde, those are the percentage points derived from simulation, which should follow uniform distribution. For straight graphical analysis, I recommended pd instead of npd/pde/npde.

mattfidler commented 2 years ago

I will think on it. I will not add it to nlmixr but may add to nlmixr2.

I do have a couple of concerns here but we can discuss it later when I get back to the office

diabloyg commented 2 years ago

whoops......Sorry for interrupt your holidays :(

mattfidler commented 2 years ago

No problem. You have no idea, of course.

mattfidler commented 1 year ago

In nlmixr2est now

nlmixrdevelopment / nlmixr

Any interest of adding prediction discrepancy (PD) to the nlmixr output #611

Model used for simulation; the same as the model used for fitting;

calculation of quantile for each observation in population simulation

Derive the individual event table dataset for simulation;

Obtain the individual dosing and sampling record;

Create the event table dataset for repeated simulation;

Convert to event table format;

Create model parameter and add covariant information for subject(i);

Simulation;

Designate the subject ID in original dataset;

Select the simulation results according to relative time;