kaskr / adcomp

AD computation with Template Model Builder (TMB)
Other
176 stars 80 forks source link

Issue with integer and numeric data class or type for obj$simulate(complete=TRUE) #391

Closed timjmiller closed 6 months ago

timjmiller commented 6 months ago

Description:

When data inputs are integer type (DATA_INTEGER, DATA_IVECTOR, DATA_IARRAY) on C++ side and integer class is specified for these in R the model object can be created by MakeADFun without issue. However, when the compiled model is used to simulate and return all data inputs (simulated or not), the data types that are supposed to be integer types have class numeric on R side. Changing the appropriate returned class to integer and creating a new model object with MakeADFun results in an error.

Current Output:

Error in getParameterOrder(data, parameters, new.env(), DLL = DLL) : NOT A VECTOR!

It appears to be thrown by the asVector function which is called by objective_function in getParameterOrder.

The error goes away when the "check.passed" attribute of the list returned by obj$simulate(complete=TRUE) is removed.

Expected Output:

It seems like the class of the data types returned by obj$simulate(complete=TRUE) should be consistent with the type called for in the TMB model and that check.passed = TRUE should not throw an error when they are consistent.

Reproducible Steps:

Below is a simple modification to the linreg example to illustrate the issue.

C++ ` // Simple linear regression.

include

template Type objective_function::operator() () { DATA_INTEGER(n); DATA_VECTOR(Y); DATA_VECTOR(x); PARAMETER(a); PARAMETER(b); PARAMETER(logSigma); ADREPORT(exp(2logSigma)); Type nll = 0; for(int i = 0; i < n; i++) nll += -dnorm(Y(i), a+bx(i), exp(logSigma), true); SIMULATE { for(int i = 0; i < n; i++) Y(i) = rnorm(a+b*x(i), exp(logSigma)); REPORT(Y); } return nll; } `

R ` library(TMB) compile("linreg.cpp") dyn.load(dynlib("linreg")) set.seed(123) data <- list(n = as.integer(10), Y = rnorm(10) + 1:10, x=1:10) class(data$n) #integer

parameters <- list(a=0, b=0, logSigma=0) obj <- MakeADFun(data, parameters, DLL="linreg") #works obj$hessian <- TRUE opt <- do.call("optim", obj) opt opt$hessian ## <-- FD hessian from optim obj$he() ## <-- Analytical hessian sdreport(obj)

temp <- obj$simulate(complete=T) class(temp$n) #numeric? obj2 <- MakeADFun(temp, parameters, DLL="linreg") #works

temp$n <- as.integer(temp$n) class(temp$n) #integer, as it should be

Below gives:

Error in getParameterOrder(data, parameters, new.env(), DLL = DLL) :

NOT A VECTOR!

obj3 <- MakeADFun(temp, parameters, DLL="linreg")

attributes(temp) #check.passed = TRUE attr(temp, "check.passed") <- NULL obj4 <- MakeADFun(temp, parameters, DLL="linreg") #works `

TMB Version:

1.9.10

R Version:

4.2.2

Operating System:

This issue occurs in both Windows and Linux for me.

kaskr commented 6 months ago

The behaviour may seem weird, but it is part of the design. Here's the explanation:

To keep things simple, it was decided early on, to convert all user data to the same storage mode (double) before passing it to C++. That way the user can safely pass e.g. x=1:10 (int) or x=1:10+.5 (double) via DATA_VECTOR(x) without thinking about storage modes. The check.passed attribute is set to signify that the conversion has been carried out and it's safe to pass the structure to C++.

When simulating using simulate(complete=TRUE), you'll get an object with the check.passed attribute set. This is a feature, that avoids unnecessary conversions when the simulated data is passed back to MakeADFun. If you change the storage modes of the simulated data, the check.passed is misleading which explains the obj3 error. If you remove the attribute you can freely change the storage modes, because MakeADFun knows that it has to apply a conversion. That explains why obj4 works.

timjmiller commented 6 months ago

Thank you for this explanation, Kasper.

Two suggestions then for your consideration: 1) Perhaps the check.passed T/F attribute could be documented somewhere? I couldn't find anything. 2) A better error message than "NOT A VECTOR" would be helpful. As is, it implies it is a parameter that is an issue. Maybe report that it is a data or parameter object and, even better, the name of the offending object.

In the situation where this arose for me, there are numerous data and parameter elements and it took a while to figure out which was the problem and it was unclear what the problem was.

As an example where this can come up: simulating data from one model and using it to fit a different model where some elements have to be modified.

kaskr commented 6 months ago

Think it should be fixed now.