Error: Invalid grouping factor specification

jeffreyhanson commented 7 years ago

Hi,

I'm receiving this error when I run MS_optim.

Do you have any suggestions as to why this might be happening?

ResistanceGA::MS_optim(gdist.inputs = gdist_settings[[i]],
+                          GA.inputs = ga_settings[[i]])

GA | Iter = 1  | Mean = 1652.224  | Best = 1770.838 
GA | Iter = 2  | Mean = 1668.548  | Best = 1770.838 
GA | Iter = 3  | Mean = 1674.399  | Best = 1770.838 
GA | Iter = 4  | Mean = 1678.925  | Best = 1770.838 
GA | Iter = 5  | Mean = 1682.273  | Best = 1770.838 
GA | Iter = 6  | Mean = 1690.189  | Best = 1770.838 
GA | Iter = 7  | Mean = 1692.892  | Best = 1779.38 
GA | Iter = 8  | Mean = 1697.815  | Best = 1779.38 
GA | Iter = 9  | Mean = 1699.563  | Best = 1781.473 
Error in { : 
  task 16 failed - "Invalid grouping factor specification, pop1"

Here's the traceback:

5: stop(simpleError(msg, call = expr))
4: e$fun(obj, substitute(ex), parent.frame(), e$data)
3: foreach(i. = seq_len(popSize), .combine = "c") %DO% {
       if (is.na(Fitness[i.])) 
           fitness(Pop[i., ], ...)
       else Fitness[i.]
   }
2: ga(type = "real-valued", fitness = Resistance.Opt_multi, population = GA.inputs$population, 
       selection = GA.inputs$selection, mutation = GA.inputs$mutation, 
       pcrossover = GA.inputs$pcrossover, crossover = GA.inputs$crossover, 
       pmutation = GA.inputs$pmutation, Min.Max = GA.inputs$Min.Max, 
       GA.inputs = GA.inputs, gdist.inputs = gdist.inputs, min = GA.inputs$ga.min, 
       max = GA.inputs$ga.max, popSize = GA.inputs$pop.size, maxiter = GA.inputs$maxiter, 
       parallel = GA.inputs$parallel, run = GA.inputs$run, keepBest = GA.inputs$keepBest, 
       seed = GA.inputs$seed, suggestions = GA.inputs$SUGGESTS, 
       quiet = GA.inputs$quiet)
1: ResistanceGA::MS_optim(gdist.inputs = gdist_settings[[i]], GA.inputs = ga_settings[[i]])

Here's my session information:

R version 3.3.3 (2017-03-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C     

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] magrittr_1.5 gurobi_7.0-2 slam_0.1-40 

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.10       GA_3.0.2           compiler_3.3.3     nloptr_1.0.4      
 [5] plyr_1.8.4         iterators_1.0.8    tools_3.3.3        lme4_1.1-12       
 [9] packrat_0.4.8-1    tibble_1.2         gtable_0.2.0       gdistance_1.2-1   
[13] nlme_3.1-131       lattice_0.20-34    Matrix_1.2-8       foreach_1.4.3     
[17] igraph_1.0.1       DBI_0.6            parallel_3.3.3     rgdal_1.2-5       
[21] akima_0.6-2        dplyr_0.5.0        raster_2.5-8       stats4_3.3.3      
[25] grid_3.3.3         smoothie_1.0-1     R6_2.2.0           session_1.0.3     
[29] sp_1.2-4           minqa_1.2.4        ResistanceGA_3.0-2 ggplot2_2.2.1     
[33] scales_0.4.1       codetools_0.2-15   splines_3.3.3      MASS_7.3-45       
[37] assertthat_0.1     colorspace_1.3-2   doParallel_1.0.10  lazyeval_0.2.0    
[41] munsell_0.4.3      RcppTOML_0.1.1     MuMIn_1.15.6

jeffreyhanson commented 7 years ago

I'm running it again without parallel processing to see if I can get a more helpful traceback.

jeffreyhanson commented 7 years ago

Here's the traceback without parallization

12: stop("Invalid grouping factor specification, ", deparse(x[[3]]), 
        call. = FALSE)
11: FUN(X[[i]], ...)
10: lapply(bars, mkBlist, fr, drop.unused.levels)
9: mkReTrms(findbars(RHSForm(formula)), fr)
8: lFormula(response ~ scale(resistance) + (1 | pop1), data = dat, 
       REML = REML)
7: MLPE.lmm2(resistance = cd, response = gdist.inputs$response, 
       ID = gdist.inputs$ID, ZZ = gdist.inputs$ZZ, REML = FALSE)
6: logLik(MLPE.lmm2(resistance = cd, response = gdist.inputs$response, 
       ID = gdist.inputs$ID, ZZ = gdist.inputs$ZZ, REML = FALSE))
5: withCallingHandlers(expr, warning = function(w) invokeRestart("muffleWarning"))
4: suppressWarnings(logLik(MLPE.lmm2(resistance = cd, response = gdist.inputs$response, 
       ID = gdist.inputs$ID, ZZ = gdist.inputs$ZZ, REML = FALSE)))
3: fitness(Pop[i, ], ...)
2: ga(type = "real-valued", fitness = Resistance.Opt_multi, population = GA.inputs$population, 
       selection = GA.inputs$selection, mutation = GA.inputs$mutation, 
       pcrossover = GA.inputs$pcrossover, crossover = GA.inputs$crossover, 
       pmutation = GA.inputs$pmutation, Min.Max = GA.inputs$Min.Max, 
       GA.inputs = GA.inputs, gdist.inputs = gdist.inputs, min = GA.inputs$ga.min, 
       max = GA.inputs$ga.max, popSize = GA.inputs$pop.size, maxiter = GA.inputs$maxiter, 
       parallel = GA.inputs$parallel, run = GA.inputs$run, keepBest = GA.inputs$keepBest, 
       seed = GA.inputs$seed, suggestions = GA.inputs$SUGGESTS, 
       quiet = GA.inputs$quiet)
1: ResistanceGA::MS_optim(gdist.inputs = gdist_settings[[i]], GA.inputs = ga_settings[[i]])

wpeterman commented 7 years ago

Hi Jeff--

Not sure what the issues are. Are these issues occurring with the example data, or your own data? I cannot recreate the error on my Windows machine with the example data (in parallel or serial). If you're using your own data, and are willing to pass it along, I'd be happy to troubleshoot on my end.

Bill Peterman Assistant Professor School of Environment and Natural Resources The Ohio State University 2021 Coffey Road Columbus, OH 43210-1085 Phone: 614.292.9795 Web Page https://goo.gl/4Gc15W Terrestrial Wildlife Ecology Lab http://www.twel.osu.edu

Bill Peterman

On Fri, Apr 21, 2017 at 1:17 AM, Jeff Hanson notifications@github.com wrote:

Here's the traceback without parallization

12: stop("Invalid grouping factor specification, ", deparse(x[[3]]), call. = FALSE) 11: FUN(X[[i]], ...) 10: lapply(bars, mkBlist, fr, drop.unused.levels) 9: mkReTrms(findbars(RHSForm(formula)), fr) 8: lFormula(response ~ scale(resistance) + (1 | pop1), data = dat, REML = REML) 7: MLPE.lmm2(resistance = cd, response = gdist.inputs$response, ID = gdist.inputs$ID, ZZ = gdist.inputs$ZZ, REML = FALSE) 6: logLik(MLPE.lmm2(resistance = cd, response = gdist.inputs$response, ID = gdist.inputs$ID, ZZ = gdist.inputs$ZZ, REML = FALSE)) 5: withCallingHandlers(expr, warning = function(w) invokeRestart("muffleWarning")) 4: suppressWarnings(logLik(MLPE.lmm2(resistance = cd, response = gdist.inputs$response, ID = gdist.inputs$ID, ZZ = gdist.inputs$ZZ, REML = FALSE))) 3: fitness(Pop[i, ], ...) 2: ga(type = "real-valued", fitness = Resistance.Opt_multi, population = GA.inputs$population, selection = GA.inputs$selection, mutation = GA.inputs$mutation, pcrossover = GA.inputs$pcrossover, crossover = GA.inputs$crossover, pmutation = GA.inputs$pmutation, Min.Max = GA.inputs$Min.Max, GA.inputs = GA.inputs, gdist.inputs = gdist.inputs, min = GA.inputs$ga.min, max = GA.inputs$ga.max, popSize = GA.inputs$pop.size, maxiter = GA.inputs$maxiter, parallel = GA.inputs$parallel, run = GA.inputs$run, keepBest = GA.inputs$keepBest, seed = GA.inputs$seed, suggestions = GA.inputs$SUGGESTS, quiet = GA.inputs$quiet) 1: ResistanceGA::MS_optim(gdist.inputs = gdist_settings[[i]], GA.inputs = ga_settings[[i]])

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/wpeterman/ResistanceGA/issues/3#issuecomment-296070179, or mute the thread https://github.com/notifications/unsubscribe-auth/AEa4UaKgnuAvO95PWRovAdzA0kTITUqnks5ryDvqgaJpZM4ND1XX .

jeffreyhanson commented 7 years ago

Sure, here's the data and the code that should reproduce the error. However, I haven't found a good seed that triggers the error early on in the optimization, so it takes a long time before it triggers the error.

I've been exploring the traceback using the debugger, and it seems the error is raised when:

The resistance argument in MLPE.lmm2 is a vector containing only -Inf values.
This is caused by the combined surfaces layer (line 57 in OptimFunction_Multi.R) containing only pixels with Inf resistance.
This is caused by the Inv.Rev.Ricker function transforming a RasterLayer object to contain Inf values.
This is caused by the Inv.Ricker function transforming a RasterLayer object to contain Inf values.
Inside Inv.Ricker the argument to parm (2.931952420, 0.006754924 and 691.755466494) is used to transform the raster with a minimum value of 0 and a maximum value of 10 (line 50 in Transformation_functions.R). This line returns a RasterLayer with a minimum value of -1 and a maximum value of -1.
This RasterLayer with the same minima and maxima are then rescaled using SCALE (line 51 in Transformation_functions.R) to return a Rasterlayer with Inf values.

This behaviour in Inv.Rev.Ricker can be reproduced using the code:

r <- raster(matrix(0:8, ncol=3))
parm <- c(2.931952420, 0.006754924, 691.755466494)
r2 <- Inv.Rev.Ricker(r, parm)
cellStats(r2, "max")
# - Inf

From what I understand, the four ways to avoid this issue are:

prevent the genetic algorithm from generating parameter values that result in Inf/NA values being output from Inv.Rev.Ricker. This seems like it would be tricky? Is this one of the input arguments for GA.prep?
remove the Inv.Rev.Ricker function - this is clearly undesirable.
modify the Inv.Rev.Ricker function to handle certain parameter combinations. However, this doesn't help us if other transformation functions also suffer from the same issue.
modify the SCALE function to deal with data that has the same minima and maxima.

May I suggest explicitly modifying the SCALE function to return data containing zeros when the same arguments are supplied for MIN and MAX (or close enough to being the same given a level of precision)? For example, something like this?

# Rescale function
SCALE.vector <- function(data, MIN, MAX, threshold = 1e-5) {
  if (abs(MIN - MAX) < threshold) {
    data[is.finite(data)] <- 0
    data
  } else {
    Mn = min(data)
    Mx = max(data)
    (MAX - MIN) / (Mx - Mn) * (data - Mx) + MAX
  }
}

# Define scaling function
# This will rescale from 1 to specified MAX
SCALE <- function(data, MIN, MAX, threshold = 1e-5) {
  if (abs(MIN - MAX) < threshold) {
    data[is.finite(raster::values(data))] <- 0
    data
  } else {
    Mn = cellStats(data, stat = 'min')
    Mx = cellStats(data, stat = 'max')
    (MAX - MIN) / (Mx - Mn) * (data - Mx) + MAX
  }
}

When using this definition for SCALE, the problematic code I included earlier, returns a value of zero for the maximum value in the r2.

What do you think? Can I provide any other information?

If you're happy with these suggestions, I can submit a pull request with these changes.

wpeterman commented 7 years ago

Hi Jeff--

Thanks for taking so much time to track this down. I agree that the most straight-forward solution is to modify the SCALE function, and your solution is a great fix.

Best--Bill

Bill Peterman Assistant Professor School of Environment and Natural Resources The Ohio State University 2021 Coffey Road Columbus, OH 43210-1085 Phone: 614.292.9795 Web Page https://goo.gl/4Gc15W Terrestrial Wildlife Ecology Lab http://www.twel.osu.edu

Bill Peterman

On Sat, Apr 22, 2017 at 11:13 PM, Jeff Hanson notifications@github.com wrote:

Sure, here's the data and the code that should reproduce the error https://github.com/wpeterman/ResistanceGA/files/949442/debug.zip. However, I haven't found a good seed that triggers the error early on in the optimization, so it takes a long time before it triggers the error.

I've been exploring the traceback using the debugger, and it seems the error is raised when:

The resistance argument in MLPE.lmm2 is a vector containing only -Inf values.

This is caused by the combined surfaces layer (line 57 in OptimFunction_Multi.R) containing only pixels with Inf resistance.

This is caused by the Inv.Rev.Ricker function transforming a RasterLayer object to contain Inf values.

This is caused by the Inv.Ricker function transforming a RasterLayer object to contain Inf values.

Inside Inv.Ricker the argument to parm (2.931952420, 0.006754924 and 691.755466494) is used to transform the raster with a minimum value of 0 and a maximum value of 10 (line 50 in Transformation_functions.R). This line returns a RasterLayer with a minimum value of -1 and a maximum value of -1.

This RasterLayer with the same minima and maxima are then rescaled using SCALE (line 51 in Transformation_functions.R) to return a Rasterlayer with Inf values.

This behaviour in Inv.Rev.Ricker can be reproduced using the code:

r <- raster(matrix(0:8, ncol=3)) parm <- c(2.931952420, 0.006754924, 691.755466494) r2 <- Inv.Rev.Ricker(r, parm) cellStats(r2, "max")

- Inf

From what I understand, the four ways to avoid this issue are:

prevent the genetic algorithm from generating parameter values that result in Inf/NA values being output from Inv.Rev.Ricker. This seems like it would be tricky? Is this one of the input arguments for GA.prep ?

remove the Inv.Rev.Ricker function - this is clearly undesirable.

modify the Inv.Rev.Ricker function to handle certain parameter combinations. However, this doesn't help us if other transformation functions also suffer from the same issue.

modify the SCALE function to deal with data that has the same minima and maxima.

May I suggest explicitly modifying the SCALE function to return data containing zeros when the same arguments are supplied for MIN and MAX (or close enough to being the same given a level of precision)? For example, something like this?

Rescale function

SCALE.vector <- function(data, MIN, MAX, threshold = 1e-5) { if (abs(MIN - MAX) < threshold) { data[is.finite(data)] <- 0 data } else { Mn = min(data) Mx = max(data) (MAX - MIN) / (Mx - Mn) * (data - Mx) + MAX } }

Define scaling function

This will rescale from 1 to specified MAX

SCALE <- function(data, MIN, MAX, threshold = 1e-5) { if (abs(MIN - MAX) < threshold) { data[is.finite(raster::values(data))] <- 0 data } else { Mn = cellStats(data, stat = 'min') Mx = cellStats(data, stat = 'max') (MAX - MIN) / (Mx - Mn) * (data - Mx) + MAX } }

When using this definition for SCALE, the problematic code I included earlier, returns a value of zero for the maximum value in the r2.

What do you think? Can I provide any other information?

If you're happy with these suggestions, I can submit a pull request with these changes.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/wpeterman/ResistanceGA/issues/3#issuecomment-296416536, or mute the thread https://github.com/notifications/unsubscribe-auth/AEa4Uftl2OmSZZOG8OKhl82bDjPQHugDks5rysHbgaJpZM4ND1XX .

wpeterman / ResistanceGA

Error: Invalid grouping factor specification #3

- Inf

Rescale function

Define scaling function

This will rescale from 1 to specified MAX