ropensci / nlrx

nlrx: Setup, run and analyze NetLogo model simulations from R
https://docs.ropensci.org/nlrx
GNU General Public License v3.0

Exceeded step memory limit #19

Closed dataandcrowd closed 5 years ago

dataandcrowd commented 5 years ago

I've been trying to run nlrx with my GIS extension model, but it seems to fail every time. I thought it was my local machine that had too little memory, but it doesn't seem to work on the HPC either.

R script

Sys.setenv(JAVA_HOME='/usr/local/software/spack/spack-0.11.2/opt/spack/linux-rhel7-x86_64/gcc-5.4.0/jdk-xxxxxx') #Ubuntu cluster

## Load packages
library(nlrx)
library(tidyverse) 
library(rcartocolor)
library(ggthemes) 

# HPC 
netlogopath <- file.path("/usr/local/Cluster-Apps/netlogo/6.0.4")
outpath <- file.path("/home/hs621/github/nlrx")

## Step 1: Create a nl object:
nl <- nl(nlversion = "6.0.4",
         nlpath = netlogopath,
         modelpath = file.path("/home/hs621/github/jasss/Gangnam_v6.nlogo"),
         jvmmem = 1024)

## Step 2: Add experiment:
nl@experiment <- experiment(expname = "seoul",
                            outpath = outpath,
                            repetition = 1,   
                            tickmetrics = "true",
                            idsetup = "setup",  
                            idgo = "go",        
                            runtime = 8763,
                            evalticks=seq(1,8763),
                            constants = list("PM10-parameters" = 100,
                                             "Scenario" = "\"BAU\"",
                                             "scenario-percent" = "\"inc-sce\""),
                            variables = list('AC' = list(values=c(100,150,200))),
                            metrics.turtles =  list("people" = c("pxcor", "pycor", "homename", "destinationName", "age", "health")),
                            metrics.patches = c("pxcor", "pycor", "pcolor"))

# Evaluate if variables and constants are valid:
eval_variables_constants(nl)

## Step 3: Attach a simdesign:
nl@simdesign <- simdesign_distinct(nl = nl, nseeds = 1)

## Step 4: Run simulations:
init <- Sys.time()
results <- run_nl_all(nl = nl)
Sys.time() - init

# Attach results to nl object:
setsim(nl, "simoutput") <- results

Session info:

Changed directory to /home/hs621/github/nlrx.

JobID: 11247244
======
Time: Fri May  3 02:34:12 BST 2019
Running on master node: cpu-e-488
Current directory: /home/hs621/github/nlrx

Nodes allocated:
================
cpu-e-488

numtasks=, numnodes=1, mpi_tasks_per_node=9 (OMP_NUM_THREADS=1)

Executing command:
==================
Rscript /home/hs621/github/nlrx/nlrx_seoul.R

── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.0.0       ✔ purrr   0.3.1  
✔ tibble  2.0.1       ✔ dplyr   0.8.0.1
✔ tidyr   0.8.3       ✔ stringr 1.4.0  
✔ readr   1.3.1       ✔ forcats 0.3.0  
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
All defined variables and constants are valid!
Creating distinct simulation design

Time difference of 4.585698 hours
/var/spool/slurm/slurmd/job11247244/slurm_script: line 134: 42363 Killed Rscript /home/hs621/github/nlrx/nlrx_seoul.R
slurmstepd: error: Exceeded step memory limit at some point.

As you can see in the last line, I get the error slurmstepd: error: Exceeded step memory limit at some point. I presume the error happens when I assign setsim(nl, "simoutput") <- results.

Is there a way to solve this issue, or do I have to split my evalticks? Many thanks.

FYI, all my NetLogo files are stored in my GitHub repository (https://github.com/mrsensible/airpollutionABM).

nldoc commented 5 years ago

Thank you very much for your bug report and the code example. I will look into this in more detail, but my first impression from a quick look at your model is that the amount of measured data may simply be too large. Your model has 96,822 patches (326 * 297), and you measure patch metrics on 8,763 ticks for three different runs. This results in a tibble with 3 columns and 2,545,353,558 rows ;) And that is only the patch data; on top of it you also have thousands of turtle measurements on each tick.
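The arithmetic behind that row count can be checked directly in R, using only the figures quoted above:

```r
## Back-of-the-envelope check of the result size:
## one row per patch, per measured tick, per run.
patches <- 326 * 297   # 96,822 patches in the model world
ticks   <- 8763        # evalticks = seq(1, 8763) measures every tick
runs    <- 3           # three values of the 'AC' variable

patches * ticks * runs # 2,545,353,558 rows of patch data alone
```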

Measuring patch and turtle metrics on each tick unfortunately comes at a price. That is why we also put a warning in our manual that measuring spatial data on each tick can blow up your results tibble very quickly.

One thing you could do is reconsider your experiment design. Do you really need the spatial data on every tick? Maybe it is sufficient to measure every 100th tick? You can do this easily by setting evalticks to seq(1, 8763, 100). Another thing I noticed when I ran the model directly in NetLogo was that the patch colors did not seem to change during the model run. Is that true? If so, I suggest running one experiment with only a single tick to capture the spatial patch information, and then running your complete experiment without measuring patch data. This should save a lot of the current resource requirements. Nevertheless, I will also keep trying to find a solution to get your snippet running.
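The split described above could look roughly like this in the reporter's script. This is only a sketch: it reuses the nl object, paths, constants, and metric names from the original post, the experiment names ("seoul_patches", "seoul_turtles") are made up, and it is untested without the model and a NetLogo installation.

```r
## Experiment A (hypothetical): a single tick, just to capture the
## static patch data once.
nl@experiment <- experiment(expname = "seoul_patches",
                            outpath = outpath,
                            repetition = 1,
                            tickmetrics = "true",
                            idsetup = "setup",
                            idgo = "go",
                            runtime = 1,
                            evalticks = 1,
                            constants = list("PM10-parameters" = 100,
                                             "Scenario" = "\"BAU\"",
                                             "scenario-percent" = "\"inc-sce\""),
                            variables = list('AC' = list(values = c(100, 150, 200))),
                            metrics.patches = c("pxcor", "pycor", "pcolor"))

## Experiment B (hypothetical): the full run, turtle metrics only,
## thinned to every 100th tick (88 measured ticks instead of 8,763).
nl@experiment <- experiment(expname = "seoul_turtles",
                            outpath = outpath,
                            repetition = 1,
                            tickmetrics = "true",
                            idsetup = "setup",
                            idgo = "go",
                            runtime = 8763,
                            evalticks = seq(1, 8763, 100),
                            constants = list("PM10-parameters" = 100,
                                             "Scenario" = "\"BAU\"",
                                             "scenario-percent" = "\"inc-sce\""),
                            variables = list('AC' = list(values = c(100, 150, 200))),
                            metrics.turtles = list("people" = c("pxcor", "pycor",
                                                                "homename", "destinationName",
                                                                "age", "health")))
```

Each experiment would then get its own simdesign and run_nl_all() call, as in the original script.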

dataandcrowd commented 5 years ago

@nldoc Thanks for the prompt reply. I didn't know that my results could build up to 2.5 billion rows! Thanks to your advice, I ran my model successfully with seq(1, 8763, by = 10). If I had used seq(1, 8763, by = 100), I might have missed the abrupt degradation of the agents' health.
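For reference, the thinning settled on here cuts the per-run measurement count by roughly a factor of ten (pure arithmetic, no NetLogo needed):

```r
## Measured ticks before and after thinning evalticks.
n_all  <- length(seq(1, 8763))           # 8763 measured ticks per run
n_by10 <- length(seq(1, 8763, by = 10))  # 877 measured ticks per run

n_all / n_by10                           # roughly 10x fewer measurements
```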

I added patch info just to compare the difference between modelled PM10 and observed PM10.

Thanks!