Feature Request: Improved VPC Simulation Datasets

billdenney commented 5 years ago

One of the limitations that I always want to overcome with VPCs is the discretized nature of the independent variable axis. VPCs have historically been limited because simulations are performed on the original dataset rather than an augmented dataset. I have always interpreted the reason for this as simplicity: running a VPC on the original dataset is easier than generating a new one, running the simulation on that, and then summarizing.

With that preamble...

Summary: At least for single-endpoint models, I think that an expanded VPC dataset would improve the visualization and precision of VPCs. For multi-endpoint models, it may be difficult or (nearly) impossible to include all endpoints.

I would love it if nlmixr had an option to expand the dataset so that all individuals have all independent variable values. (For example, with actual time post dose, some subjects will have samples at 0.99, 1, and 1.01 hours, so ensure that all subjects have all three of those time points.)
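A minimal sketch of the expansion described above, written in plain Python rather than against any nlmixr or RxODE function. The column names (`id`, `time`, `dv`), the `original` flag, and the helper `expand_times` are all hypothetical and chosen only for illustration.

```python
# Hypothetical records: one row per observation (names are illustrative only).
observed = [
    {"id": 1, "time": 0.99, "dv": 4.1},
    {"id": 2, "time": 1.00, "dv": 3.8},
    {"id": 3, "time": 1.01, "dv": 4.4},
    {"id": 1, "time": 2.00, "dv": 2.9},
]


def expand_times(rows):
    """Return one row per (subject, time) over the union of all observed times.

    Rows that were not in the original data get dv=None and original=False,
    so they can be simulated but are easy to tell apart from real samples.
    """
    all_times = sorted({r["time"] for r in rows})
    ids = sorted({r["id"] for r in rows})
    seen = {(r["id"], r["time"]): r for r in rows}
    expanded = []
    for subject in ids:
        for t in all_times:
            row = seen.get((subject, t))
            if row is not None:
                expanded.append({**row, "original": True})
            else:
                expanded.append(
                    {"id": subject, "time": t, "dv": None, "original": False}
                )
    return expanded


expanded = expand_times(observed)
```

Keeping the `original` flag means the observed data can still be overlaid on the plot while the simulation uses the full grid.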

There are some complexities that immediately come to mind:

How do you handle covariate values?
- I can imagine two simple options: interpolate between the original points (with first observation carried backward [FOCB] before the first point and last observation carried forward [LOCF] after the last), or use simple FOCB and LOCF without interpolation.
How do you handle the fact that a given independent variable value may not be appropriate for some subjects?
- That is more complex, and I think that it would involve specifying the stratification and doing the independent variable expansion only within a stratification level.
What if there are multiple endpoints with either different independent variables, different strata, or different covariates of interest?
- For different independent variables (e.g. time for PK and concentration for PD), I think that either separate VPCs should be run by endpoint or it may simply be impossible. (It could be impossible where concentration is the predictor, because with a simultaneous PK/PD model you can't make concentration independent. In that situation, a fallback to the current standard may have to suffice.)
  - An alternative, which would be difficult to automate, would be a multi-stage VPC where the PK model is simulated first and the PD model is then simulated from the concentration data of the PK model with full expansion of the dataset. I would not advocate for this as part of the first solution.
- For different strata, I think that each endpoint should optionally be expanded separately by all strata. This one may be hard to fix automatically, but I would guess that it is a relatively uncommon problem for which the user could fall back to the current method.
- For different covariates of interest, I think that would be handled by just having all covariates extrapolated throughout the dataset (regardless of the endpoint).
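The two covariate options and the within-stratum expansion discussed in this list could be sketched as follows. This is plain Python under stated assumptions, not nlmixr or RxODE code: `fill_covariate`, `times_by_stratum`, and the row keys are hypothetical names.

```python
from bisect import bisect_right


def fill_covariate(times, values, new_time, interpolate=True):
    """Covariate value at new_time from one subject's observed (times, values).

    times must be sorted ascending. Before the first observation the first
    value is carried backward (FOCB); after the last, the last value is
    carried forward (LOCF). In between, either interpolate linearly or
    carry the previous value forward.
    """
    if new_time <= times[0]:
        return values[0]
    if new_time >= times[-1]:
        return values[-1]
    hi = bisect_right(times, new_time)  # first observed time > new_time
    lo = hi - 1                         # last observed time <= new_time
    if not interpolate or times[lo] == new_time:
        return values[lo]
    frac = (new_time - times[lo]) / (times[hi] - times[lo])
    return values[lo] + frac * (values[hi] - values[lo])


def times_by_stratum(rows, stratum_key):
    """Union of observed times within each stratum level.

    Expanding each subject only to the grid of its own stratum avoids
    giving a subject time points that never occur in its group.
    """
    grid = {}
    for r in rows:
        grid.setdefault(r[stratum_key], set()).add(r["time"])
    return {level: sorted(ts) for level, ts in grid.items()}
```

For example, with weights 70, 72, and 76 observed at times 0, 2, and 4, a new point at time 1 would get 71 with interpolation and 70 without.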

mattfidler commented 5 years ago

Hi @billdenney

This should be doable; however, I don't know if it is a VPC any longer.

To me you simply need to:

- Extract the data (getData), make the data modifications suggested above, and then send it to the simulate routine. You could also simply add the variability to make a prediction interval of sorts.
- Note that you do not need an event table; you can simply use a NONMEM-compatible data frame.
- This code is what we covered briefly at ACoP and covers what is needed to use the nlmixr simulation engine.
- Since plotting the simulation gives a ggplot2 object, it would be simple to add the data on top of the new plot.
- Last, of course, are the tests of the data expansion macros you suggested.

I will keep it on a feature request list. I am also open to pull requests.

mattfidler commented 5 years ago

With RxODE, FOCB (first observation carried backward) is the same as NOCB (next observation carried backward).

You would simply have to insert NA for these new observations; RxODE can then apply it without any further modifications.
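To illustrate what the fill would be expected to produce once NA is inserted on the new rows, here is a small plain-Python sketch. It is not RxODE's implementation; `fill_missing` is a hypothetical helper, and `None` stands in for NA.

```python
def fill_missing(values, method="nocb"):
    """Fill None entries in a time-ordered list of covariate values.

    method="nocb": take the next non-missing value
                   (next observation carried backward).
    method="locf": take the previous non-missing value
                   (last observation carried forward).
    Entries with no donor on the chosen side stay None.
    """
    if method not in ("nocb", "locf"):
        raise ValueError("method must be 'nocb' or 'locf'")
    # NOCB walks from the end toward the start; LOCF walks forward.
    if method == "nocb":
        order = range(len(values) - 1, -1, -1)
    else:
        order = range(len(values))
    filled = list(values)
    donor = None
    for i in order:
        if filled[i] is None:
            filled[i] = donor
        else:
            donor = filled[i]
    return filled


# Weight observed at the 2nd and 5th rows; the other rows are new (NA).
wt = [None, 70.0, None, None, 72.0, None]
nocb = fill_missing(wt, "nocb")
locf = fill_missing(wt, "locf")
```

Note that neither method alone fills both ends, which is why the FOCB-plus-LOCF combination comes up earlier in the thread.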

mattfidler commented 5 years ago

For multiple endpoints, the vpc already works, so simply extracting the vpc RxODE model should suffice. The new data is the key.

You can also use LOCF with RxODE.

billdenney commented 5 years ago

This is something that I often struggle with because I tend to dislike automatic binning algorithms (I do understand that it's hard to make good bins automatically). To me, this takes the concept of a VPC from a discrete, binned group, which could have bias at bin edges, to a continuous, unbinned group (since every subject has every independent variable observation, it is technically binned at all unique values). It is like going from 8-bit to 64-bit processing: still the same thing in concept, but you can do a lot more.
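A minimal sketch of "binned at all unique values": once every subject has every time point, the simulated values can be summarized directly at each unique time with no binning step. This is plain Python with hypothetical names (`percentile`, `summarize_by_time`, and the `time`/`sim` keys), and it pools all simulated values per time as a simplification; a full VPC would compute percentiles within each replicate and then summarize those across replicates.

```python
def percentile(sorted_vals, p):
    """Linear-interpolation percentile (p in [0, 100]) of a sorted list."""
    if not sorted_vals:
        raise ValueError("need at least one value")
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def summarize_by_time(sim_rows, probs=(5, 50, 95)):
    """Percentiles of simulated values at each unique time (no bins)."""
    by_time = {}
    for r in sim_rows:
        by_time.setdefault(r["time"], []).append(r["sim"])
    return {
        t: {p: percentile(sorted(vals), p) for p in probs}
        for t, vals in sorted(by_time.items())
    }


# Hypothetical simulated values at two of the expanded time points.
sim_rows = (
    [{"time": 1.0, "sim": v} for v in (3.0, 1.0, 5.0, 2.0, 4.0)]
    + [{"time": 0.99, "sim": v} for v in (10.0, 20.0)]
)
summary = summarize_by_time(sim_rows)
```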

My fingers are crossed that I'll be able to make a PR for it. (But I won't be foolish and promise a time scale for that.)

Thanks for the RxODE pointers which will help with implementation.

mattfidler commented 5 years ago

I also dislike binning.

I suppose that my solution didn't even include it (whoops)

mattfidler commented 5 years ago

If you want lower-level information, each nlmixr object has $simInfo, which holds the RxODE model, data, and options needed to continue.

mattfidler commented 5 years ago

One note: augPred may actually already do this with the data, so perhaps this is closer to being complete. It would now have to be separated into another function.

nlmixrdevelopment / nlmixr

Feature Request: Improved VPC Simulation Datasets #247