gslab-econ / gslab_r

5 stars 1 forks source link

Development NumericalDerivatives #7

Closed yuchuan2016 closed 7 years ago

yuchuan2016 commented 7 years ago

@arosenbe , could you kindly take a look at this? This package is simply a translation of this MATLAB library. It should be straightforward. Thanks!

yuchuan2016 commented 7 years ago

@arosenbe , thanks for the comments! I have implemented all sugegstions. Could you take a look? About the check of empty function argument, I'm not sure what's the parallel test in R. I don't know how to create an empty function. The closest I can think of is function() NULL. Should I test something like is.null(func())?

arosenbe commented 7 years ago

Thanks @yuchuan2016. I like the new sapply syntax a lot! On the empty function check, I think you were initially correct in ignoring it (don't know why I brought it up). Nobody can take the Jacobian of a nonexistent function.

I have one more comment: All arguments should be passed explicitly to f. We shouldn't rely on the parent scope to fill them in for us.

yuchuan2016 commented 7 years ago

@arosenbe , thanks! I don't quite understand your last comment. Could you explain it a little? Thanks a lot!

arosenbe commented 7 years ago

Yeah, definitely @yuchuan2016! The inner f function takes j as an argument but (at least in numJacob) also depends on x0, xTol, paramdim, and func. The current implementation passes j to f explicitly in the sapply statement, but the other dependencies are filled implicitly by f checking its parent scope (the numJacob function). If the variables weren't found there, then f would check numJacob's parent scope. This can be problematic, because we might end up filling these implicit dependencies with values different from the ones we intended.

A solution is to enumerate these dependencies explicitly as arguments in the f function. These arguments can be filled in the sapply call as arguments after f. A simple example of this is below. Let me know if you need a different description or want to talk in person.

# Good practice (pass y explicitly)
f_good <- function(x, y){
  out <- c(x, y)
  return(out)
}

out_good <- sapply(1:10, f_good, 0)

# Bad practice (check parent scope for implicit dependency on `y`)
f_bad <- function(x){
  out <- c(x, y)
  return(out)
}

y <- 0
out_bad <- sapply(1:10, f_bad)

# Check for equality
all.equal(out_good, out_bad) # True
yuchuan2016 commented 7 years ago

@arosenbe , thanks! Your explanation is very clear. I have updated the relevant part.

arosenbe commented 7 years ago

Looking better @yuchuan2016! I especially love that the way that the argument positions line up. I do have another comment (sorry!) about your implementation of my suggestion above https://github.com/gslab-econ/gslab_r/pull/7#issuecomment-314880992.

It looks like you moved some steps that only need to be done once (e.g., function evaluation at base argument values) into the inner f function. This results in them being evaluated length(x0) times, which could result in a considerable loss of speed if the function's called a lot, which I expect them to be. It'd be great if you could move these steps outside the f function and just pass their output into the f function.

yuchuan2016 commented 7 years ago

@arosenbe , very good point! Thanks. I have modified the function.

arosenbe commented 7 years ago

Thanks @yuchuan2016! I'm happy with the format of the code now. I wonder if we don't want to have some error checking though. I was playing around with the code and was surprised by the result below.

# Assumes you've already loaded numJacob
f <- function(x) x[1]^2 + x[2]^3 + x[3]^4
out <- NumericalDerivatives::numJacob(f, c(1, 2), .5)
print(out)
# [1] NA NA

I think this occurs because f(x0) evaluates to NA when too few arguments are passed. Oppositely, when there are too many, we get weird stuff like this

f <- function(x) x[1]^2 + x[2]^3 + x[3]^4
out <- NumericalDerivatives::numJacob(f, c(1, 2, 3, 4), .5)
print(out)
# [1]   2.500  19.000 219.375   0.000

If the Matlab implementation was fine with these edge cases, then I think we can ignore them. Could you check and report back?

yuchuan2016 commented 7 years ago

@arosenbe , good point! I think the second case is fine. It just means the function does not depend on the fourth argument, so the derivative is 0. The MATLAB implementation also return a 1*4 matrix.

For the first case, MATLAB function will raise an error "Index exceeds matrix dimensions". The difference comes from that in MARLAB, you have

x=[1;2]; x(3) Index exceeds matrix dimensions.

While in R, you have

x <- c(1, 2) x[3] [1] NA

Can we raise an error if f(x0) is evaluated to be "NA"?

arosenbe commented 7 years ago

I think that's a great solution @yuchuan2016.

arosenbe commented 7 years ago

@yuchuan2016 I was taking a closer look at numHess, and I have three comments

f <- function(x) x[1]^2 + x[2]^3 + x[3]^4

x0 <- c(1, 2, 3)
ind_colvar <- length(x0) + 1 # same for row

out <- NumericalDerivatives::numHess(f, x0, .5, ind_colvar = ind_colvar)

print(out)
warnings()
f <- function(x) x[1]^2 + x[2]^3 + x[3]^4

x0 <- c(1, 2, 3)

out_row <- NumericalDerivatives::numHess(f, x0, .5, ind_rowvar = 1)
is.matrix(out) # False

out_col <- NumericalDerivatives::numHess(f, x0, .5, ind_colvar = 1)
is.matrix(out) # True
yuchuan2016 commented 7 years ago

@arosenbe , thanks! I incorporate your comments. Could you take a look?

arosenbe commented 7 years ago

Sorry for my delay in getting back to you @yuchuan2016! Your commit above does address my comments. However, I think it introduces a new bug in numHess. The sapply function seems to concatenate entire matrices when the target function returns a vector.

fun_vctr <- function(x){
  out <- c(x[1]^2, x[2]^2, x[3]^2)
}

fun_sclr <- function(x){
  out <- c(x[1]^2 + x[2]^2 + x[3]^2)
}

dim(numHess(fun_sclr, c(2, 3, 4), .5)) # Square
dim(numHess(fun_vctr, c(2, 3, 4), .5)) # Not square
yuchuan2016 commented 7 years ago

@arosenbe , thanks! The MATLAB version does not allow a vector func. If you pass a vector function, it will throw an error

Assignment has more non-singleton rhs dimensions than non-singleton subscripts

since we assign hess(i,j) a vector value.

Do you think we want to allow numHess to handle a vector-value function? If so, should the output be a three-dimension array, and the length of the first dimension is the length of function output? Or we can also raise an error if length(f0)>0, as the original function?

arosenbe commented 7 years ago

Good thought going back to the MATLAB @yuchuan2016! I think we want to be as faithful to that implementation as possible. So let's raise an error if length(f0)>1 for numHess. I do think we should add this to the documentation. Maybe something like

The function numerically calculates the Hessian matrix of a given ~vector~scalar-valued function.

Do you know if this same requirement is put on numJacob?

yuchuan2016 commented 7 years ago

@arosenbe , actually numJacob allows vector-valued functions. It returns a row vector for a scalar-valued function, and a matrix for a vector-valued function with the ith row corresponding to the ith element of the function output. The current R function does the same thing.

Do you mean "a given scalar-valued function" in your previous comment?

arosenbe commented 7 years ago

Yes I did @yuchuan2016.

yuchuan2016 commented 7 years ago

@arosenbe , thanks! I'm going to merge to master if no other issues arise.