traitecoevo / modeladequacy

The purpose of this project is to develop an approach to evaluate the fit of continuous trait evolution models and to apply this approach to angiosperm functional trait data.

bayesian analysis getting stuck #65

Closed mwpennell closed 10 years ago

mwpennell commented 10 years ago

so i ran all of the analyses -- see analysis/R/model-adequacy-angio-analysis.R (but with the number of cores increased) -- over the winter break on a standalone machine. Tested this out on a couple of datasets and everything seemed to be hunky dory. Moderately sized datasets ran in a few minutes and larger ones took somewhat longer.

However, I have a problem. It appears that 5 datasets have stalled out (still running after 500+ hrs). And because the write.csv command follows a do.call(rbind) after the lapply, I can't recover the results from the other datasets. I am thinking I will just have to kill the processes and start over. This is not a big deal, as most of the datasets ran fine and were done within a day or so.
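The failure mode above can be guarded against in base R with `tryCatch` (a minimal sketch; `slow_task` and `datasets` are hypothetical stand-ins for the real per-dataset analysis), so that one bad dataset no longer takes the combined results down with it:

```r
## Hypothetical stand-in for the per-dataset analysis; dataset 3 "fails".
slow_task <- function(x) {
  if (x == 3) stop("stalled dataset")
  data.frame(id = x, value = x * 2)
}

datasets <- c(1, 2, 3, 4, 5)

## tryCatch turns a failure into NULL instead of aborting the whole run.
safe_run <- function(x) tryCatch(slow_task(x), error = function(e) NULL)

results <- lapply(datasets, safe_run)
ok <- !vapply(results, is.null, logical(1))
combined <- do.call(rbind, results[ok])
## combined holds rows for datasets 1, 2, 4 and 5; only dataset 3 is lost.
```

Note this protects against errors, not hangs -- a stalled process still has to be killed by hand.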

My questions to you are:

  1. Have you had any experience with this sort of thing in diversitree (some runs taking an unusually long time)? Any idea what is going on here?
  2. What is the best way to structure the script so that I can use multiple processors but still save the results as I go along? It would have been nice to think of this beforehand so I could have just killed the stuck processes and kept my results for the other 342 datasets...

thanks -- m

richfitz commented 10 years ago

Hmm, no. But then I've not run many MCMC chains of the OU type models.

There are a few options for keeping track of things as you go. The diversitree::mcmc function has options to save output to file as you run. I use that for really long-running stuff that I might want to restart. However, I suspect that is overkill here.

I guess this (from here) is where the problem comes from:

td.sla.fam <- treedata.taxon(phy=sla$phy, data=sla$states,
                             rank="family", min.size = 20)
sla.fam <- mclapply(td.sla.fam, function(x) modelad.bayes.clade(x, se=sla$SE, rank="family", trait="sla"),
                    mc.preschedule = FALSE, mc.cores = 3)

My usual solution to this is something like the following (unchecked):

## Simple wrapper to run an expression and save the output in
## filename.  If it is called again it will just reload the saved data,
## unless regenerate=TRUE.
run.cached <- function(expr, filename, regenerate=FALSE) {
  if (file.exists(filename) && !regenerate) {
    res <- readRDS(filename)
  } else {
    res <- eval.parent(substitute(expr))
    saveRDS(res, filename)
  }
  res
}
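To see the caching behaviour quickly, here is a throwaway check (the `expensive` function and temp file are purely illustrative; the definition is repeated so the snippet stands alone):

```r
## Same run.cached as above: evaluates expr only when no cached file
## exists, otherwise reloads the saved result.
run.cached <- function(expr, filename, regenerate=FALSE) {
  if (file.exists(filename) && !regenerate) {
    res <- readRDS(filename)
  } else {
    res <- eval.parent(substitute(expr))
    saveRDS(res, filename)
  }
  res
}

f <- tempfile(fileext = ".rds")
n_calls <- 0
expensive <- function() {
  n_calls <<- n_calls + 1  # count how often the "real" work runs
  42
}

a <- run.cached(expensive(), f)  # evaluates expensive() and saves to f
b <- run.cached(expensive(), f)  # reloads from f without re-running
## a and b are both 42, and expensive() ran exactly once.
```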

## Generate filenames consistently.  This sort of thing is needed if
## you plan on both loading and saving data, but it helps the logic
## anyway.
file.output <- function(family, analysis)
  sprintf("output/%s-%s.rds", analysis, family)
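As a quick check of the naming convention this produces (definition repeated so the snippet runs standalone):

```r
file.output <- function(family, analysis)
  sprintf("output/%s-%s.rds", analysis, family)

path <- file.output("Juncaceae", "sla")
## path is "output/sla-Juncaceae.rds" -- analysis first, then family.
```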

With these pieces, write a little wrapper function:

run.analysis <- function(family, analysis, td.list)
  run.cached(modelad.bayes.clade(td.list[[family]], se=sla$SE,
                                 rank="family", trait="sla"),
             file.output(family, analysis))

## Create the output directory (no warning if it already exists):
dir.create("output", FALSE)

## To see how this works.  This will take a while:
res <- run.analysis("Juncaceae", "sla", td.sla.fam)
## but this time it's really fast:
res <- run.analysis("Juncaceae", "sla", td.sla.fam)

## So then you can do:
sla.fam <- mclapply(names(td.sla.fam), run.analysis, "sla", td.sla.fam,
                    mc.preschedule=FALSE, mc.cores=3)

Incidentally, I prefer to set the cores directly (in my .Rprofile)

options(mc.cores=3L, # for package:parallel
        cores=3L)    # for package:multicore -- deprecated now?

## And then drop the mc.cores option, which makes the code more
## portable across machines with different core numbers
sla.fam <- mclapply(names(td.sla.fam), run.analysis, "sla", td.sla.fam,
                    mc.preschedule=FALSE)

## Then proceed as before:
sla.fam <- do.call(rbind, sla.fam)
rownames(sla.fam) <- names(td.sla.fam)
sla.fam <- as.data.frame(sla.fam)
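With toy inputs, that combine step produces a data.frame with one row per family (the names and values here are just illustrative):

```r
## Toy stand-ins for per-family results: a list of named numeric vectors.
td.names <- c("Juncaceae", "Poaceae")
res.list <- lapply(1:2, function(i) c(mean = i, sd = i / 10))

m <- do.call(rbind, res.list)   # 2 x 2 matrix, columns mean and sd
rownames(m) <- td.names
df <- as.data.frame(m)
## df has rownames Juncaceae and Poaceae and columns mean and sd.
```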

richfitz commented 10 years ago

Also, it's possible that if you kill the stalled child processes (use top to work out which are still running, then kill to terminate them), the mclapply will continue with errors in place of the aborted runs. Alternatively it may just keep going. I can't remember!

mwpennell commented 10 years ago

thanks. i will give the code a solid restructuring over the next couple of days and then give it another run through. inspired by how awesome the woodiness stuff was, i may as well make it markdown friendly at this point :)

richfitz commented 10 years ago

I've run 263 of the 377 analyses with no hint of stuckness. But I'll have to turn this computer off now as I'm going away. Can you rerun everything fresh and see what happens?

I'm using github diversitree, fwiw.

richfitz commented 10 years ago

Hey, @mwpennell: can I assume that this is no longer a problem?

mwpennell commented 10 years ago

yep this was just a weird problem on one machine...