nimble-dev / nimble

The base NIMBLE package for R
http://R-nimble.org
BSD 3-Clause "New" or "Revised" License

`as.matrix` seg fault #1460

Open paciorek opened 5 months ago

paciorek commented 5 months ago

A user reported a seg fault at the end of MCMC:

""" I am getting a "caught segfault" error during execution of the MCMC. After a few hours of running the first chain, the following error pops up.

caught segfault address 0x7fcfa3bc8240, cause 'memory not mapped'

Traceback:
1: as.matrix.CmodelValues(mcmc$mvSamples)
2: as.matrix(mcmc$mvSamples)
3: runMCMC(cMy.MCMC, niter = 120000, nburnin = 80000, nchains = 3, summary = TRUE)
An irrecoverable exception occurred. R is aborting now ...
/appl/opt/csc-cli-utils/bin/singularity_wrapper: line 42: 1711121 Segmentation fault apptainer --silent exec $SING_FLAGS $SING_IMAGE "${@:2}"
"""

I have reproduced this with the user's code (side note: it doesn't seem to be related to using Singularity). However, it doesn't occur with shorter runs, and the full run (120k iterations, 80k burn-in) takes about 1.5 days, so I am still trying to track it down. When browsing in as.matrix.CmodelValues, the seg fault occurs the first time that fastMatrixInsert inserts a single column, though I don't know if that is related to the problem.

This is of course heavily used code, so this is quite curious. At the moment I'm checking whether there is anything odd about the actual values being inserted in that single column.

paciorek commented 5 months ago

A few other tidbits:

  1. The user subsequently reported that the problem went away when they increased the available memory. That is odd, as it is not how a seg fault typically behaves, though I have a vague feeling I have encountered such behavior before. Perhaps with more memory there is less chance that a write to unallocated memory causes a problem.
  2. However, when I ran on a machine with a ton of memory I could still reproduce the problem.
  3. I stopped the R code just before the seg-faulting call to fastMatrixInsert. All the dimensions looked fine as did the values in the matrices.
  4. I inserted some print statements into fastMatrixInsert. Sure enough, I didn't get the seg fault.

So I suspect that if I invoke the C++ debugger I won't see the error, but that is the next thing to try.
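
A failure that moves or vanishes when print statements are added usually depends on memory layout. One way to make it deterministic would be to bounds-check the write itself. Below is a minimal C++ sketch of that idea, assuming a column-major matrix of doubles (the layout R uses). `checkedColumnInsert` is a hypothetical helper for illustration, not NIMBLE's fastMatrixInsert, whose implementation is not shown in this thread.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (NOT NIMBLE's fastMatrixInsert): copy `src` into
// column `col` of a column-major nrow x ncol matrix stored in `dest`.
// Throws an exception instead of writing outside the allocation.
void checkedColumnInsert(std::vector<double>& dest,
                         std::size_t nrow, std::size_t ncol,
                         std::size_t col,
                         const std::vector<double>& src) {
  if (dest.size() != nrow * ncol)
    throw std::invalid_argument("dest size does not match nrow * ncol");
  if (col >= ncol)
    throw std::out_of_range("column index out of range");
  if (src.size() != nrow)
    throw std::invalid_argument("source length does not match nrow");

  // Offsets use std::size_t rather than int, so on 64-bit platforms
  // large sample matrices do not wrap around at 2^31 - 1.
  const std::size_t offset = col * nrow;
  for (std::size_t i = 0; i < nrow; ++i)
    dest[offset + i] = src[i];
}
```

With checks like these, an inconsistent dimension or index would surface as an exception at the offending call, rather than as a seg fault that depends on what happens to sit next to the buffer.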