Closed ghost closed 10 years ago
Something along the lines of this:
template<class Type>
Type posfun(Type x, Type eps, Type &pen){
pen += CppAD::CondExpLt(x,eps,Type(0.01)*pow(x-eps,2),Type(0));
return CppAD::CondExpGe(x,eps,x,eps/(Type(2)-x/eps));
}
should do what you request - not yet tested.
I suspect that eps is likely to be a different type (.e.g, double), so you might want to try:
template<class Type, class Type2> Type posfun(Type x, Type2 eps, Type &pen){ etc.}
Steve On Mar 7, 2014, at 2:29 PM, kaskr notifications@github.com wrote:
Something along the lines of this:
template
Type posfun(Type x, Type eps, Type &pen){ pen += CppAD::CondExpLt(x,eps,Type(0.01)*pow(x-eps,2),Type(0)); return CppAD::CondExpGe(x,eps,x,eps/(Type(2)-x/eps)); } should do what you request - not yet tested. — Reply to this email directly or view it on GitHub.
Best practice IMO would be to pass eps as
DATA_SCALAR(eps);
In which case eps is a Type.
I wrote an example where I approximate binary data with a normal distribution without bounding parameter p. It doesn't work well. Using posfun
makes the error go away, but the parameter estimate is wrong. p should be 0.1 but it goes to -6.822958e-05
#include <TMB.hpp>
template<class Type>
Type posfun(Type x, Type eps, Type &pen){
pen += CppAD::CondExpLt(x,eps,Type(0.01)*pow(x-eps,2),Type(0));
return CppAD::CondExpGe(x,eps,x,eps/(Type(2)-x/eps));
}
template<class Type>
Type objective_function<Type>::operator() ()
{
DATA_SCALAR(eps);
DATA_VECTOR(x);
PARAMETER(p);
Type pen;
pen=0;
// Type eps;
// eps=1e-6;
Type nll;
nll=0;
Type var;
var=p*(1-p);
var=posfun(var, eps, pen);
nll+=pen;
nll += -sum(dnorm(x, p, sqrt(var)));
return nll;
}
with R code
library(TMB)
compile("posfun2.cpp")
dyn.load("posfun2.so")
n=1000
set.seed(2)
p=.1
x=rbinom(n, size=1, prob=p)
mean(x)
var(x)
p
p*(1-p)
mod=MakeADFun(list(x=x, eps=1e-6), list(p=0), DLL="posfun2")
o <- nlminb(mod $par, mod $fn, mod $gr)
I forgot to say log=true
nll += -sum(dnorm(x, p, sqrt(var), true));
Now I get better estimate of p= 0.1249981, but even with n=100000, it doesn't get better.
Increasing eps
from 1e-6 to 1e-3 improved the fit.
mod=MakeADFun(list(x=x, eps=1e-3), list(p=0), DLL="posfun2")
Now I get p=0.10125
, the true mean. So I'm satisfied with this test.
It also gives the same result if I define eps
on the fly
// DATA_SCALAR(eps);
DATA_VECTOR(x);
PARAMETER(p);
Type pen;
pen=0;
Type eps;
eps=1e-3;
Thanks Mollie!
I'm still having some weird issues regarding posfun() as suggested above when I have a mixed-effects model.
Specifically modifying Mollie's example, with R code:
library(TMB)
compile("test.cpp")
dyn.load( dynlib("test") )
n=1000
set.seed(2)
p=.1
x=rbinom(n, size=1, prob=p)
and TMB code
#include <TMB.hpp>
template<class Type>
Type posfun(Type x, Type eps, Type &pen){
pen += CppAD::CondExpLt(x,eps,Type(0.01)*pow(x-eps,2),Type(0));
return CppAD::CondExpGe(x,eps,x,eps/(Type(2)-x/eps));
}
template<class Type>
Type objective_function<Type>::operator() ()
{
DATA_SCALAR(eps);
DATA_VECTOR(x);
PARAMETER(p);
PARAMETER(Dummy);
Type pen;
pen=0;
Type nll;
nll=0;
Type var;
var = p*(1-p);
var = posfun(var, eps, pen);
nll += pen;
nll += -sum( dnorm(x, p, sqrt(var),true) );
nll += -dnorm(Dummy, Type(0.0), Type(1.0), true);
return nll;
}
It works well in a model with only fixed effects:
mod = MakeADFun(data=list(x=x, eps=1e-3), parameters=list(p=0,Dummy=0), random=NULL)
o <- nlminb(mod$par, mod$fn, mod$gr)
but doesn't work when "p" (the parameter governing the variable that enters posfun) is random:
mod = MakeADFun(data=list(x=x, eps=1e-3), parameters=list(p=0,Dummy=0), random="p")
o <- nlminb(mod$par, mod$fn, mod$gr)
Any thoughts on how to define posfun where it can work for variables derived from random effects? I also have a non-trivial issue with this in a different model (which is less suitable as an example).
Can we please re-open this? It still seems to be a bit of a problem with TMB version 1.1.1 ...
@James-Thorson : can you comment on what "doesn't work" means in your context? Was it a segfault, or was the answer wrong (we haven't spent enough time staring at your code to know what answer it should be giving ...)?
James Thorson's code got a bit mangled so I'm re-posting it here with better formatting ...
#include <TMB.hpp>
template<class Type>
Type posfun(Type x, Type eps, Type &pen){
pen += CppAD::CondExpLt(x,eps,Type(0.01)*pow(x-eps,2),Type(0));
return CppAD::CondExpGe(x,eps,x,eps/(Type(2)-x/eps));
}
template<class Type>
Type objective_function<Type>::operator() ()
{
DATA_SCALAR(eps);
DATA_VECTOR(x);
PARAMETER(p);
PARAMETER(Dummy);
Type pen;
pen=0;
Type nll;
nll=0;
Type var;
var = p*(1-p);
var = posfun(var, eps, pen);
nll += pen;
nll += -sum( dnorm(x, p, sqrt(var),true) );
nll += -dnorm(Dummy, Type(0.0), Type(1.0), true);
return nll;
}
@bbolker I have trouble reproducing the segfault on 32 bit ubuntu with R-3.0.1. Could you confirm that you get a crash by 1. checking out the branch 'posfun'. 2. install TMB from that branch. 3. Run posfun example from tmb_examples ?
hmm, can't reproduce now ... under
R Under development (unstable) (2015-07-29 r68752)
Platform: i686-pc-linux-gnu (32-bit)
Running under: Ubuntu precise (12.04.5 LTS)
[snip]
other attached packages:
[1] TMB_1.1-1
loaded via a namespace (and not attached):
[1] compiler_3.3.0 Matrix_1.2-2 tools_3.3.0 grid_3.3.0
[5] lattice_0.20-33
Was this ever resolved? I'm having trouble with this myself in state-space Schafer model context similar to Jim Thorson's problem (my working code is not suitable as an example, if this spurs a discussion I'll create a minimal example).
I don't get a segfault, just NA/NaN gradient evaluations that stop the optimisation during the integration of my process error terms, which posfun()
is supposed to avoid.
Also, I'm not sure we want to use the denominator in the expression eps/(Type(2)-x/eps)
that is returned for the case where x < eps
, as this will return a value lower than the threshold eps
.
Coming back here to post a link to the ADMB posfun documentation, and to explain a little bit more about what posfun()
does/is supposed to do (since I had to figure it out for myself). The point of eps
is not to guarantee that all values are greater than eps
; it's to guarantee that values above eps
are unchanged, and values between eps
and -infinity are smoothly adjusted to be between 0 and eps.
May I suggest that this function be added as built-in utility in TMB? (Fancier versions could include (i) a bound other than zero (ii) a 'sign' (greater than/less than) switch (iii) an adjustable penalty (instead of a hard-coded value of 0.01) ...) I could submit a PR if requested ...
curve(ifelse(x<eps,eps/(2-x/eps),x),-2e-3,2e-3)
abline(v=c(0,eps),lty=2)
abline(a=0,b=1,lty=3)
@bbolker I'm sceptical about posfun. It seems to cause trouble when used to transform random effects (based on user reports) - and I'm not surprised: The second order derivative of posfun is not continuous. If used with the Laplace approximation one would expect that each call to posfun adds new discontinuities to the marginal likelihood. So, no - I don't think posfun should be in the main TMB distribution.
Fair enough. I suppose there might be a way to do this with continuous second derivatives ...
Yes, there should be many ways to construct something similar to posfun with better smoothness properties. A quick shot:
f <- function(x)eps*log(exp(x/eps)+1)
I'm being super-lazy, but it seems there might be a typo in your new version?
source_gist("gist.github.com/bbolker/fea73a45b2f8c8e183663af566337b53")
This shows the function, first and second derivative ...
@bbolker
I should have been more precise with 'something similar to posfun'. I'm assuming the purpose of posfun is to approximate the positive part max(0,x)
by a smooth monotone function with an error controlled by eps
. For this purpose my f
above works just as well as posfun
. However, it does not hold exact that f(x)=x
for all x>eps
(in contrast to posfun).
I think the new suggestion looks interesting and when implementing it, it could be made more robust for large values by using logspace_add (the function already in TMB).
Example in R:
logspace_add <- function(a,b) pmax(a, b) + log1p(exp(pmin(a,b)-pmax(a, b))) f <- function(x, eps=0.001)eps*logspace_add(x/eps,0) library(numDeriv) Df <- function(x)grad(f,x) DDf <- function(x)grad(Df,x) par(mfrow=c(1,3)) plot(f,-0.01,0.01, main=0) plot(Df,-.01,0.01, main=1) plot(DDf,-.01,0.01, main=2)
There are indeed many ways. If your are willing to tolerate a third order polynomial between eps and zero, you can determine the values of the necessary coefficients by assuming the second derivative is zero at eps.
On 6/10/20 10:15 AM, kaskr wrote:
Yes, there should be many ways to construct something similar to posfun with better smoothness properties. A quick shot:
|f <- function(x)eps*log(exp(x/eps)+1) |
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/kaskr/adcomp/issues/7#issuecomment-642145449, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABJQG4WHUWHOEE5CVLGYTKLRV65TLANCNFSM4AM7H5MQ.
Just back to mention that a TMB version of @fishfollower's suggestion above
template<class Type>
Type posfun(Type x, Type eps, Type &pen) {
pen += CppAD::CondExpLt(x,eps,Type(0.01)*pow(x-eps,2),Type(0));
return CppAD::CondExpGe(x,eps,x,eps*logspace_add(x/eps,Type(0)));
}
works fine with @James-Thorson's example above, even with random effects (presumably because the second derivative is now smooth).
I understood @kaskr's suggestion as a way to avoid the conditional expressions, so we could simply use:
template<class Type>
Type posfunx(Type x, Type eps){
return eps*logspace_add(x/eps,Type(0));
}
Does that not work?
OK, I misunderstood. I'm looking for a function with the following properties:
I now realize that @fishfollower was just refining @kaskr's suggestion, which instead of criterion 3 substitutes " f(x) ≈ x for x > eps (or, f(x) → eps as x → infinity)".
A more general question: where is everyone's intuition/knowledge coming from on these topics? I imagine that if I thought about it hard enough I could generalize the original 'posfun' expression (presumably originally from Dave Fournier?) by writing down a quadratic analogue, or I could figure out a cubic polynomial on the log scale that interpolated between f(eps)=x, f'(eps)=1, f''(eps)=0 and f(0) = ?, f(0) = -1, f''(0)=0 ? (The rational-function solution might be preferable since it would have a slower rate of decrease ... ?)
OK, I worked it out. If we use eps / sum( (-1)^i*(x-eps)^i/eps^i)
for x<eps
and x
for x>eps
, if the summation runs from i=0 to i=n, then we have continuous derivatives up to order n. Is there any point to going higher than n=2
... ?
Strictly speaking, the gradient of the Laplace approximation requires n=3
and the Hessian requires n=4
.
This function has continuous fifth derivatives. Could drop the last term if we only want n=4. I implemented it this way because I wanted to be able to use R's D()
function (which wouldn't work on a summation), but there are obviously plenty of more efficient ways to implement this for use (if desired) within TMB ...
f <- function(x,eps=0.001) {
eps*(1/(1-(x-eps)/eps + (x-eps)^2/eps^2 - (x-eps)^3/eps^3
+ (x-eps)^4/eps^4 - (x-eps)^5/eps^5))
}
I might go ahead and make an example implementation of this. Should I do the whole TMB_ATOMIC_VECTOR_FUNCTION
thing, or just code it as a simple function for now?
@bbolker OK, that looks like an improvement of the original posfun.
Reading your pdf document I was reminded of the fact that the original posfun adds a penalty to nll. That makes sense when box-constraining model parameters. However, for random effects it's not that simple... We must make sure that our specified model density (corresponding to nll) without data integrates to one, right? However, it seems to me that adding penalties messes up the normalizing constant.
Before considering implementation details of posfun, perhaps it's worth demonstrating on a simple simulation-reestimation example that you can make box constraints work on random effects using posfun? (I doubt it will work... Am I missing something?)
Getting back to this ... what kind of example do you suggest? If the constraints aren't binding at the optimal solution (i.e. if the main purpose of posfun
is to get us through/around bad patches of parameter space), then it should be OK ... @James-Thorson's example above "works", but I think the main purpose of that example was to show that the old posfun
was grossly broken for random effects (because of the derivative characteristics), not that it would give a good/correct answers when the constraints were binding ...
@bbolker That is exactly my problem: I cannot imagine any random effect examples for which posfun is useful!
If the constraints aren't binding at the optimal solution (i.e. if the main purpose of posfun is to get us through/around bad patches of parameter space), then it should be OK ...
TMB:::newton
.Example: Consider integrating dnorm(x) * (x > -1)
using the Laplace approximation corresponding to a 'bad patch' x < -1
coded up using posfun
. The constraint is non-binding so the Laplace approximation is the same as that of dnorm(x)
- i.e. obviously wrong. I could have picked any other 'bad patch' on negative reals and the result would be the same. How do you decide whether the patch is insignificant in general? And what did I gain in my example above by using posfun to handle the constraint?
Did any solve this issue for the state-space theta-logistic model using posfun with lognormal process error? I am using it and having convergence issues. The reason posfun is needed is because it is catch conditioned (catch is removed as known). Is using posfun with random effects just a bad idea and you have to estimate F as a random effect and fit to catch.
@MarkMaunder
Is using posfun with random effects just a bad idea
Yes!
The reason posfun is needed is because it is catch conditioned (catch is removed as known)
If I understand the problem correctly, there's a constraint on the state variable log(B)
in that current catch must be smaller than current biomass C < B
. Couldn't that be handled by transforming the state variable?
U := log(B-C)
(BTW: This discussion is more appropriate for tmb-users list...)
Hi all,
Just a quick note that many state-space implementations require a function similar to "posfun" in ADMB, where a scalar of class Type is returned unchanged if above a certain threshold, but is returned at the threshold if below the threshold, and where the objective function is modified to penalize instances where it is below the threshold.
For example, an implementation of a state-space theta-logistic model requires posfun if using a lognormal process error (rather than the normal process error here: https://groups.nceas.ucsb.edu/non-linear-modeling/projects/theta/ADMB/theta.tpl)
Cheers, Jim