kaskr / adcomp

AD computation with Template Model Builder (TMB)
Other
178 stars 80 forks source link

"posfun" command for cpp declaration of model #7

Closed ghost closed 10 years ago

ghost commented 10 years ago

Hi all,

Just a quick note that many state-space implementations require a function similar to "posfun" in ADMB, where a scalar of class Type is returned unchanged if above a certain threshold, but is returned at the threshold if below the threshold, and where the objective function is modified to penalize instances where it is below the threshold.

For example, an implementation of a state-space theta-logistic model requires posfun if using a lognormal process error (rather than the normal process error here: https://groups.nceas.ucsb.edu/non-linear-modeling/projects/theta/ADMB/theta.tpl)

Cheers, Jim

kaskr commented 10 years ago

Something along the lines of this:

template<class Type>
Type posfun(Type x, Type eps, Type &pen){
  pen += CppAD::CondExpLt(x,eps,Type(0.01)*pow(x-eps,2),Type(0));
  return CppAD::CondExpGe(x,eps,x,eps/(Type(2)-x/eps));
}

should do what you request - not yet tested.

smartell commented 10 years ago

I suspect that eps is likely to be a different type (.e.g, double), so you might want to try:

template<class Type, class Type2> Type posfun(Type x, Type2 eps, Type &pen){ etc.}

Steve On Mar 7, 2014, at 2:29 PM, kaskr notifications@github.com wrote:

Something along the lines of this:

template Type posfun(Type x, Type eps, Type &pen){ pen += CppAD::CondExpLt(x,eps,Type(0.01)*pow(x-eps,2),Type(0)); return CppAD::CondExpGe(x,eps,x,eps/(Type(2)-x/eps)); } should do what you request - not yet tested.

— Reply to this email directly or view it on GitHub.

kaskr commented 10 years ago

Best practice IMO would be to pass eps as

DATA_SCALAR(eps);

In which case eps is a Type.

mebrooks commented 9 years ago

I wrote an example where I approximate binary data with a normal distribution without bounding parameter p. It doesn't work well. Using posfun makes the error go away, but the parameter estimate is wrong. p should be 0.1 but it goes to -6.822958e-05

#include <TMB.hpp>
template<class Type>
Type posfun(Type x, Type eps, Type &pen){
  pen += CppAD::CondExpLt(x,eps,Type(0.01)*pow(x-eps,2),Type(0));
  return CppAD::CondExpGe(x,eps,x,eps/(Type(2)-x/eps));
}

template<class Type>
Type objective_function<Type>::operator() ()
{
    DATA_SCALAR(eps);
    DATA_VECTOR(x);

    PARAMETER(p);
    Type pen;
    pen=0;
//  Type eps;
//  eps=1e-6;

    Type nll;
    nll=0;

    Type var;
    var=p*(1-p);
    var=posfun(var, eps, pen);
    nll+=pen;

    nll += -sum(dnorm(x, p, sqrt(var)));
    return nll;
}

with R code

library(TMB)
compile("posfun2.cpp") 
dyn.load("posfun2.so")  
n=1000
set.seed(2)
p=.1
x=rbinom(n, size=1, prob=p)
mean(x)
var(x)
p
p*(1-p)

mod=MakeADFun(list(x=x, eps=1e-6), list(p=0), DLL="posfun2")
o <- nlminb(mod $par, mod $fn, mod $gr)
mebrooks commented 9 years ago

I forgot to say log=true

nll += -sum(dnorm(x, p, sqrt(var), true));

Now I get better estimate of p= 0.1249981, but even with n=100000, it doesn't get better.

mebrooks commented 9 years ago

Increasing eps from 1e-6 to 1e-3 improved the fit.

mod=MakeADFun(list(x=x, eps=1e-3), list(p=0), DLL="posfun2")

Now I get p=0.10125, the true mean. So I'm satisfied with this test.

It also gives the same result if I define eps on the fly

//  DATA_SCALAR(eps);
    DATA_VECTOR(x);

    PARAMETER(p);
    Type pen;
    pen=0;
    Type eps;
    eps=1e-3;
James-Thorson commented 9 years ago

Thanks Mollie!

I'm still having some weird issues regarding posfun() as suggested above when I have a mixed-effects model.

Specifically modifying Mollie's example, with R code:

library(TMB)
compile("test.cpp")
dyn.load( dynlib("test") )
n=1000
set.seed(2)
p=.1
x=rbinom(n, size=1, prob=p)

and TMB code

#include <TMB.hpp>
template<class Type>
Type posfun(Type x, Type eps, Type &pen){
  pen += CppAD::CondExpLt(x,eps,Type(0.01)*pow(x-eps,2),Type(0));
  return CppAD::CondExpGe(x,eps,x,eps/(Type(2)-x/eps));
}

template<class Type>
Type objective_function<Type>::operator() ()
{
    DATA_SCALAR(eps);
    DATA_VECTOR(x);

    PARAMETER(p);
    PARAMETER(Dummy);
    Type pen;
    pen=0;

    Type nll;
    nll=0;

    Type var;
    var = p*(1-p);
    var = posfun(var, eps, pen);
    nll += pen;

    nll += -sum( dnorm(x, p, sqrt(var),true) );
    nll += -dnorm(Dummy, Type(0.0), Type(1.0), true);
    return nll;
}

It works well in a model with only fixed effects:

mod = MakeADFun(data=list(x=x, eps=1e-3), parameters=list(p=0,Dummy=0), random=NULL)
o <- nlminb(mod$par, mod$fn, mod$gr)

but doesn't work when "p" (the parameter governing the variable that enters posfun) is random:

mod = MakeADFun(data=list(x=x, eps=1e-3), parameters=list(p=0,Dummy=0), random="p")
o <- nlminb(mod$par, mod$fn, mod$gr)

Any thoughts on how to define posfun where it can work for variables derived from random effects? I also have a non-trivial issue with this in a different model (which is less suitable as an example).

bbolker commented 9 years ago

Can we please re-open this? It still seems to be a bit of a problem with TMB version 1.1.1 ...

@James-Thorson : can you comment on what "doesn't work" means in your context? Was it a segfault, or was the answer wrong (we haven't spent enough time staring at your code to know what answer it should be giving ...)?

James Thorson's code got a bit mangled so I'm re-posting it here with better formatting ...

#include <TMB.hpp>

template<class Type>
Type posfun(Type x, Type eps, Type &pen){
    pen += CppAD::CondExpLt(x,eps,Type(0.01)*pow(x-eps,2),Type(0));
    return CppAD::CondExpGe(x,eps,x,eps/(Type(2)-x/eps));
}

template<class Type>
Type objective_function<Type>::operator() ()
{
DATA_SCALAR(eps);
 DATA_VECTOR(x);

 PARAMETER(p);
 PARAMETER(Dummy);
 Type pen;
 pen=0;

 Type nll;
 nll=0;

 Type var;
 var = p*(1-p);
 var = posfun(var, eps, pen);
 nll += pen;

 nll += -sum( dnorm(x, p, sqrt(var),true) );
 nll += -dnorm(Dummy, Type(0.0), Type(1.0), true);
 return nll;

}
kaskr commented 9 years ago

@bbolker I have trouble reproducing the segfault on 32 bit ubuntu with R-3.0.1. Could you confirm that you get a crash by 1. checking out the branch 'posfun'. 2. install TMB from that branch. 3. Run posfun example from tmb_examples ?

bbolker commented 9 years ago

hmm, can't reproduce now ... under

R Under development (unstable) (2015-07-29 r68752)
Platform: i686-pc-linux-gnu (32-bit)
Running under: Ubuntu precise (12.04.5 LTS)

[snip]

other attached packages:
[1] TMB_1.1-1

loaded via a namespace (and not attached):
[1] compiler_3.3.0  Matrix_1.2-2    tools_3.3.0     grid_3.3.0     
[5] lattice_0.20-33
samueldnj commented 6 years ago

Was this ever resolved? I'm having trouble with this myself in state-space Schafer model context similar to Jim Thorson's problem (my working code is not suitable as an example, if this spurs a discussion I'll create a minimal example).

I don't get a segfault, just NA/NaN gradient evaluations that stop the optimisation during the integration of my process error terms, which posfun() is supposed to avoid.

Also, I'm not sure we want to use the denominator in the expression eps/(Type(2)-x/eps) that is returned for the case where x < eps, as this will return a value lower than the threshold eps.

bbolker commented 4 years ago

Coming back here to post a link to the ADMB posfun documentation, and to explain a little bit more about what posfun() does/is supposed to do (since I had to figure it out for myself). The point of eps is not to guarantee that all values are greater than eps; it's to guarantee that values above eps are unchanged, and values between eps and -infinity are smoothly adjusted to be between 0 and eps.

May I suggest that this function be added as built-in utility in TMB? (Fancier versions could include (i) a bound other than zero (ii) a 'sign' (greater than/less than) switch (iii) an adjustable penalty (instead of a hard-coded value of 0.01) ...) I could submit a PR if requested ...

 curve(ifelse(x<eps,eps/(2-x/eps),x),-2e-3,2e-3)
abline(v=c(0,eps),lty=2)
abline(a=0,b=1,lty=3)

eps

kaskr commented 4 years ago

@bbolker I'm sceptical about posfun. It seems to cause trouble when used to transform random effects (based on user reports) - and I'm not surprised: The second order derivative of posfun is not continuous. If used with the Laplace approximation one would expect that each call to posfun adds new discontinuities to the marginal likelihood. So, no - I don't think posfun should be in the main TMB distribution.

bbolker commented 4 years ago

Fair enough. I suppose there might be a way to do this with continuous second derivatives ...

kaskr commented 4 years ago

Yes, there should be many ways to construct something similar to posfun with better smoothness properties. A quick shot:

f <- function(x)eps*log(exp(x/eps)+1)
bbolker commented 4 years ago

I'm being super-lazy, but it seems there might be a typo in your new version?

source_gist("gist.github.com/bbolker/fea73a45b2f8c8e183663af566337b53")

This shows the function, first and second derivative ...

posfun

kaskr commented 4 years ago

@bbolker I should have been more precise with 'something similar to posfun'. I'm assuming the purpose of posfun is to approximate the positive part max(0,x) by a smooth monotone function with an error controlled by eps. For this purpose my f above works just as well as posfun. However, it does not hold exact that f(x)=x for all x>eps (in contrast to posfun).

fishfollower commented 4 years ago

I think the new suggestion looks interesting and when implementing it, it could be made more robust for large values by using logspace_add (the function already in TMB).

Example in R:

logspace_add <- function(a,b) pmax(a, b) + log1p(exp(pmin(a,b)-pmax(a, b))) f <- function(x, eps=0.001)eps*logspace_add(x/eps,0) library(numDeriv) Df <- function(x)grad(f,x) DDf <- function(x)grad(Df,x) par(mfrow=c(1,3)) plot(f,-0.01,0.01, main=0) plot(Df,-.01,0.01, main=1) plot(DDf,-.01,0.01, main=2)

johnrsibert commented 4 years ago

There are indeed many ways. If your are willing to tolerate a third order polynomial between eps and zero, you can determine the values of the necessary coefficients by assuming the second derivative is zero at eps.

On 6/10/20 10:15 AM, kaskr wrote:

Yes, there should be many ways to construct something similar to posfun with better smoothness properties. A quick shot:

|f <- function(x)eps*log(exp(x/eps)+1) |

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/kaskr/adcomp/issues/7#issuecomment-642145449, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABJQG4WHUWHOEE5CVLGYTKLRV65TLANCNFSM4AM7H5MQ.

bbolker commented 4 years ago

Just back to mention that a TMB version of @fishfollower's suggestion above

template<class Type>
Type posfun(Type x, Type eps, Type &pen) {
  pen += CppAD::CondExpLt(x,eps,Type(0.01)*pow(x-eps,2),Type(0));
  return CppAD::CondExpGe(x,eps,x,eps*logspace_add(x/eps,Type(0)));
}

works fine with @James-Thorson's example above, even with random effects (presumably because the second derivative is now smooth).

fishfollower commented 4 years ago

I understood @kaskr's suggestion as a way to avoid the conditional expressions, so we could simply use:

template<class Type>
Type posfunx(Type x, Type eps){
  return eps*logspace_add(x/eps,Type(0));
}

Does that not work?

bbolker commented 4 years ago

OK, I misunderstood. I'm looking for a function with the following properties:

  1. f(x) > 0 for all x
  2. f'(x) > for all x
  3. f(x) = x for x > eps
  4. f'(x) and f''(x) are continuous everywhere (including at x=eps)

I now realize that @fishfollower was just refining @kaskr's suggestion, which instead of criterion 3 substitutes " f(x) ≈ x for x > eps (or, f(x) → eps as x → infinity)".

A more general question: where is everyone's intuition/knowledge coming from on these topics? I imagine that if I thought about it hard enough I could generalize the original 'posfun' expression (presumably originally from Dave Fournier?) by writing down a quadratic analogue, or I could figure out a cubic polynomial on the log scale that interpolated between f(eps)=x, f'(eps)=1, f''(eps)=0 and f(0) = ?, f(0) = -1, f''(0)=0 ? (The rational-function solution might be preferable since it would have a slower rate of decrease ... ?)

bbolker commented 4 years ago

OK, I worked it out. If we use eps / sum( (-1)^i*(x-eps)^i/eps^i) for x<eps and x for x>eps, if the summation runs from i=0 to i=n, then we have continuous derivatives up to order n. Is there any point to going higher than n=2 ... ?

kaskr commented 4 years ago

Strictly speaking, the gradient of the Laplace approximation requires n=3 and the Hessian requires n=4.

bbolker commented 4 years ago

This function has continuous fifth derivatives. Could drop the last term if we only want n=4. I implemented it this way because I wanted to be able to use R's D() function (which wouldn't work on a summation), but there are obviously plenty of more efficient ways to implement this for use (if desired) within TMB ...

f <- function(x,eps=0.001) {
    eps*(1/(1-(x-eps)/eps + (x-eps)^2/eps^2 - (x-eps)^3/eps^3
        + (x-eps)^4/eps^4 - (x-eps)^5/eps^5))
}

see Rmd and pdf for more details ...

bbolker commented 4 years ago

I might go ahead and make an example implementation of this. Should I do the whole TMB_ATOMIC_VECTOR_FUNCTION thing, or just code it as a simple function for now?

kaskr commented 4 years ago

@bbolker OK, that looks like an improvement of the original posfun.

Reading your pdf document I was reminded of the fact that the original posfun adds a penalty to nll. That makes sense when box-constraining model parameters. However, for random effects it's not that simple... We must make sure that our specified model density (corresponding to nll) without data integrates to one, right? However, it seems to me that adding penalties messes up the normalizing constant.

Before considering implementation details of posfun, perhaps it's worth demonstrating on a simple simulation-reestimation example that you can make box constraints work on random effects using posfun? (I doubt it will work... Am I missing something?)

bbolker commented 4 years ago

Getting back to this ... what kind of example do you suggest? If the constraints aren't binding at the optimal solution (i.e. if the main purpose of posfun is to get us through/around bad patches of parameter space), then it should be OK ... @James-Thorson's example above "works", but I think the main purpose of that example was to show that the old posfun was grossly broken for random effects (because of the derivative characteristics), not that it would give a good/correct answers when the constraints were binding ...

kaskr commented 4 years ago

@bbolker That is exactly my problem: I cannot imagine any random effect examples for which posfun is useful!

If the constraints aren't binding at the optimal solution (i.e. if the main purpose of posfun is to get us through/around bad patches of parameter space), then it should be OK ...

Example: Consider integrating dnorm(x) * (x > -1) using the Laplace approximation corresponding to a 'bad patch' x < -1 coded up using posfun. The constraint is non-binding so the Laplace approximation is the same as that of dnorm(x) - i.e. obviously wrong. I could have picked any other 'bad patch' on negative reals and the result would be the same. How do you decide whether the patch is insignificant in general? And what did I gain in my example above by using posfun to handle the constraint?

MarkMaunder commented 4 years ago

Did any solve this issue for the state-space theta-logistic model using posfun with lognormal process error? I am using it and having convergence issues. The reason posfun is needed is because it is catch conditioned (catch is removed as known). Is using posfun with random effects just a bad idea and you have to estimate F as a random effect and fit to catch.

kaskr commented 4 years ago

@MarkMaunder

Is using posfun with random effects just a bad idea

Yes!

The reason posfun is needed is because it is catch conditioned (catch is removed as known)

If I understand the problem correctly, there's a constraint on the state variable log(B) in that current catch must be smaller than current biomass C < B. Couldn't that be handled by transforming the state variable? U := log(B-C)

(BTW: This discussion is more appropriate for tmb-users list...)