luca-scr / GA

An R package for optimization using genetic algorithms
http://luca-scr.github.io/GA/
GA crashes #15

Closed rbecerril closed 6 years ago

rbecerril commented 6 years ago

Hi Luca,

First of all, thank you for sharing and maintaining this very useful package. Second, I recently started experiencing problems with it. It used to work fine, but then I changed my data. Now, after a number of iterations (see the file ga_monitor__M130_m18q1_2.pdf), I get this message:

Error in { : task 31 failed - "replacement has length zero"
Calls: ga -> %DO% -> <Anonymous>
Execution halted

I am using GA version 3.0.2 on R version 3.4.1 (2017-06-30) -- "Single Candle", running on 15 cores of a computing cluster (platform: x86_64-pc-linux-gnu, 64-bit).

The call in my program is

pdf(file = sprintf("ga_monitor_%s.pdf", filesuff))
assortmentOpt = ga(type="binary", fitness=expected_profit,
                   assortmentmap=assortmentmap, N0l=N, M0l=M, Nl=Nfocal, Ml=Mfocal, Kl=K, Dl=D,
                   maxnKl=maxnK, cindl=cindfocal, tauVl=tauVfocal, thetaVl=thetaVfocal,
                   Xl=Xfocal, marginsVl=focaldata$margins, suggestions=initialConds,
                   nBits = nrow(assortmentbase), names=assortmentbase$key, pmutation = 0.25,
                   pcrossover = 0.75, seed = 1234,
                   maxiter = maxOptiter, parallel=runparallel, monitor=plot)
dev.off()

the fitness function is

# compute profit for the entire chain
expected_profit = function(assortment, assortmentmap,
                           N0l, M0l, Nl, Ml, Kl, Dl, maxnKl, cindl, 
                           tauVl, thetaVl, Xl, marginsVl ) {

  # compute profit specific to a draw in the chain of estimates
  draw_profit = function(Nl, Ml, Kl, nKl, cindl, aindl, taul, thetal, Xl, marginsl ) {

    expu = matrix(NA, nrow = Nl, ncol = Kl)
    expuMargin = matrix(NA, nrow = Nl, ncol = Kl)
    # compute utilities for available alternatives
    for (n in 1:Nl){
      for (k in 1:nKl[n]){
        expu[n,k] <- exp( taul[cindl[n],aindl[n,k]] + 
                            t(Xl[n,aindl[n,k],]) %*% thetal[cindl[n],] + logiterrors[n,k])
        expuMargin[n,k] = expu[n,k] * marginsl[n,aindl[n,k]]
        # if price coeff is negative, omit observation
        if (thetal[cindl[n],1]<0) expuMargin[n,k]=0 
      }
    }
    expectedProfit = sum(apply(expuMargin,1,sum, na.rm=TRUE) / apply(expu,1,sum, na.rm=TRUE))
    return(expectedProfit)
  }

  ######################################
  ## RECONSTRUCT DATA STRUCTURES
  ######################################

  # expand gene to span all transactions, rows are observations, columns are alternatives
  tmp = matrix(assortment[assortmentmap], ncol=Kl, nrow=Nl, byrow=TRUE)
  maxnKl = max(apply(tmp,1,function(x) sum(!is.na(x))))
  aindl = matrix(0, ncol=maxnKl, nrow=Nl)
  nKl = numeric(Nl)
  if (sum(tmp==0)>0) {
    # construct list of available alternatives for each observation
    for (n in 1:Nl) {
      tmp2 = which(tmp[n,]==1)
      aindl[n, 1:length(tmp2)] = tmp2
      nKl[n] = length(tmp2)
    }
  } else {
    aindl = matrix(rep(1:Kl,Nl), ncol = Kl, nrow=Nl, byrow=TRUE)
    nKl = rep(K,Nl)
  }  

  marginsl = array(NA,c(Nl,Kl))
  for (n in 1:Nl){
    indices = ((n-1)*Kl+1):(n*Kl)
    marginsl[n,] = marginsVl[indices]
  }

  ######################################
  ## COMPUTE PROFITS
  ######################################
  profit = numeric(usedraws)
  for (draw in 1:usedraws){
    thetal = matrix(as.numeric(thetaVl[draw,]), nrow = Ml, ncol=Dl, byrow=FALSE)
    taul= matrix(as.numeric(tauVl[draw,]), nrow=Ml, ncol=Kl,  byrow=FALSE)
    profit[draw] = draw_profit(Nl, Ml, Kl, nKl, cindl, aindl, taul, thetal, Xl, marginsl )
  }                           

  tmp = sum(profit, na.rm=TRUE)
  if (length(tmp)==0 | is.na(tmp) | is.nan(tmp) | is.infinite(tmp))
    return(0)
  else
    return(tmp)  
}

So I am avoiding invalid values for the fitness function.

If you could offer any insight on what may be happening, I would greatly appreciate it.

Thanks in advance

Rafael

luca-scr commented 6 years ago

Hi,

it is difficult to say what's happening. Your case is too complicated to follow and I don't have time to do it. I may eventually look at a minimal reproducible example (data+code https://en.wikipedia.org/wiki/Minimal_Working_Example).

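First I would look at the following:

  • "recently I started experiencing problems with it." Do you mean that previously the same exact code + data was working? Can you prove that?
  • then you wrote "I changed my data and […]". What if you change the data again? And again?
  • try to call the fitness function with random solutions and see if the function returns what you expect.
  • run the code not in parallel to see if the problem is in the parallelization.

The main idea of the above steps is to isolate each part in turn and make sure the problem is not there.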

All the best,

Luca

--

Luca Scrucca, PhD Associate Professor of Statistics Department of Economics University of Perugia Via A. Pascoli, 20 06123 Perugia (Italy) Tel +39-075-5855229 Fax +39-075-5855950 E-mail luca.scrucca@unipg.it Web page http://www.stat.unipg.it/luca

rbecerril commented 6 years ago

Luca,

Thanks for the prompt reply and for the recommendations. I was hoping you may have seen something similar before, but the suggestions are very useful anyway. Just to follow up on your questions:

First I would look at the following:

  • "recently I started experiencing problems with it." Do you mean that previously the same exact code + data was working? Can you prove that?

Great suggestion. Will test with the old dataset again.

  • then you wrote "I changed my data and […]". What if you change again the data? And again?

I am using two different datasets, and the algorithm breaks down for both. Maybe these datasets are particularly different from previous datasets.

  • try to call the fitness function with random solutions and see if the function returns what you expect.

Will try this after the other tests.

  • run the code not in parallel to see if the problem is in the parallelization.

Good idea. I am already working on that.
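
A minimal base-R sketch of these last two checks, assuming the fitness function can be called on its own outside ga(). The names check_fitness and toy_fitness are hypothetical and used only for illustration; toy_fitness stands in for expected_profit and its extra arguments. Everything runs serially, so an error surfaces with its own message rather than as a failed parallel task.

```r
# Hypothetical stand-in for the real fitness function: it fails on purpose
# when no bit is set, to show what the harness reports.
toy_fitness <- function(bits) {
  chosen <- which(bits == 1)
  if (length(chosen) == 0) stop("no alternative selected")
  sum(chosen)
}

# Call `fitness` serially on edge cases plus random binary strings and
# collect every candidate that errors or returns an unusable value.
check_fitness <- function(fitness, nBits, ntrials = 100, seed = 1234, ...) {
  set.seed(seed)
  candidates <- c(
    list(rep(0, nBits), rep(1, nBits)),  # all-off and all-on edge cases
    lapply(seq_len(ntrials), function(i) rbinom(nBits, size = 1, prob = runif(1)))
  )
  failures <- list()
  for (bits in candidates) {
    res <- tryCatch(fitness(bits, ...), error = function(e) e)
    if (inherits(res, "error")) {
      msg <- conditionMessage(res)
    } else if (length(res) != 1 || !is.finite(res)) {
      msg <- "non-scalar or non-finite fitness value"
    } else {
      next  # this candidate evaluated cleanly
    }
    failures[[length(failures) + 1]] <- list(bits = bits, message = msg)
  }
  failures
}

failures <- check_fitness(toy_fitness, nBits = 6)
```

Each entry in failures keeps the offending bit string, so the failing call can be repeated directly and inspected with traceback() or debug().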

Thanks again. I'll post back once I figure out what the issue is.

Rafael

rbecerril commented 6 years ago

Luca,

I debugged the code serially on a Windows machine, and that way I was able to get more detailed debugging information. For some reason the error information on the Linux server was not very informative. It turned out to be a problem with the fitness function; nothing is wrong with GA.

Thanks again for the recommendations.

Rafael
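
The thread does not record which line of the fitness function failed. As an illustration only, one common base-R pattern that produces the message "replacement has length zero" is indexing with 1:length(x) when x is empty, because 1:0 evaluates to c(1, 0) rather than to an empty sequence. The helper names below are hypothetical.

```r
# Fragile: when `values` is empty, 1:length(values) is c(1, 0), so one
# slot is selected but there is nothing to put in it.
fill_row <- function(row, values) {
  row[1:length(values)] <- values
  row
}

# Safer: seq_along() of an empty vector is empty, so nothing is assigned.
fill_row_safe <- function(row, values) {
  row[seq_along(values)] <- values
  row
}

empty <- which(c(0, 0, 0) == 1)  # integer(0): no bit is set

# The fragile version raises an error; the safe version leaves the row as is.
err <- tryCatch(fill_row(numeric(3), empty), error = function(e) e)
ok  <- fill_row_safe(numeric(3), empty)
```

Here err holds the error condition and ok is the unchanged row. Whether an empty selection should be allowed at all, or mapped to a fitness of 0, depends on the model.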