choisy opened this issue 5 years ago
First insights from one simple simulation of the sir.gaml model: the results for rama and GAMA headless seem quite similar.
In R, I load an experiment:

```r
gaml_file <- system.file("examples", "sir.gaml", package = "rama")
exp1 <- load_experiment("sir", gaml_file, "sir")
```

and evaluate only the time of the experiment run:

```r
system.time(output <- run_experiment(exp1))
```
The times for rama and GAMA headless are quite similar:
Super nice. The results would probably be very similar if we did repetitions.
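A cheap way to turn that impression into numbers would be to repeat the timing and look at the mean and spread. A minimal base-R sketch (here `run_once` is only a stand-in for `run_experiment(exp1)`, since the real call needs a GAMA installation):

```r
# Repeat a timing measurement to get a mean and spread instead of a
# single number. `run_once` is a stand-in workload, NOT the real
# run_experiment(exp1) call, which requires a GAMA installation.
run_once <- function() Sys.sleep(0.01)

time_once <- function(f) system.time(f())[["elapsed"]]

elapsed <- replicate(10, time_once(run_once))
round(c(mean = mean(elapsed), sd = sd(elapsed)), 3)
```

With the real call substituted in, the same mean/sd summary would tell us whether the rama and headless timings stay close over repetitions.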
With repetitions, the results differ much more. I ran the following experiment (see the attached XML file: [sir9.xml.zip](https://github.com/r-and-gama/rama/files/2689590/sir9.xml.zip)):

```r
gaml_file <- system.file("examples", "sir.gaml", package = "rama")
df <- expand.grid(S0 = c(900, 950, 999),
                  I0 = c(100, 50, 1),
                  R0 = 0,
                  beta = 1.5,
                  gamma = .15,
                  S = 1,
                  I = 1,
                  R = 1,
                  tmax = 1000,
                  seed = 1)
df
exp4 <- experiment(df, parameters = c(1:5),
                   obsrates = c(6:8), tmax = "tmax", seed = "seed",
                   experiment = "sir", model = gaml_file)
exp4
system.time(output <- run_experiment(exp4, 8))
```

Results I get:
Notice that when I run the experiment with only 1 core, with:

```r
system.time(output <- run_experiment(exp4))
```

it takes around 60 s.
In `run_experiment(exp)`, we do:
It may not be very surprising that `run_experiment` takes more time. We can try to improve this.
Yes, makes sense and it would be great if we could improve this.
```r
for (i in 1:nrow(exp4)) {
  print(paste(system.time(output <- run_experiment(exp4[1:i, ], 8))))
}
```

```
   user  system elapsed
Running experiment plan ...[1] "1.078" "0.0840000000000001" "11.4579999999999" "19.437" "1.211"
Running experiment plan ...[1] "2.096" "0.0899999999999999" "15.046" "29.973" "1.629"
Running experiment plan ...[1] "3.245" "0.1" "17.5900000000001" "40.556" "1.875"
Running experiment plan ...[1] "4.24" "0.123" "19.7769999999998" "52.0940000000001" "2.228"
Running experiment plan ...[1] "5.529" "0.138" "23.287" "70.469" "2.601"
Running experiment plan ...[1] "6.505" "0.151" "30.473" "103.834" "3.271"
Running experiment plan ...[1] "8.289" "0.199" "30.8009999999999" "107.017" "3.689"
Running experiment plan ...[1] "8.79899999999999" "0.147" "34.106" "127.091" "3.824"
Running experiment plan ...[1] "9.857" "0.179" "39.9360000000001" "141.351" "3.506"
```
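One quick way to read these numbers: the third value of each vector is the elapsed time, so we can see how it scales with the number of simulations by fitting a straight line. The values below are copied (rounded) from the output above; the linear model is only an assumption, used to eyeball the fixed overhead and the per-simulation cost:

```r
# Elapsed times (3rd value of each system.time vector above),
# for experiment plans of 1 to 9 simulations run on 8 cores.
elapsed <- c(11.458, 15.046, 17.590, 19.777, 23.287,
             30.473, 30.801, 34.106, 39.936)
n_sim <- seq_along(elapsed)

# Intercept ~ fixed overhead, slope ~ marginal cost per simulation.
fit <- lm(elapsed ~ n_sim)
round(coef(fit), 2)  # slope is roughly 3.5 s per extra simulation
```

If the relationship stays linear, the intercept is a first estimate of the fixed cost we would want to shave off.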
I am checking whether it is possible to plot some lines. Best, Jean-Daniel
What do you mean by "plot some lines"?
For information, there are a number of R packages that allow good benchmarking and visualization of the results, see here for example. There is also the newly-released `bench` package. I haven't tried it yet, and I'm not even quite sure that's the tool we need here. To be explored...
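In the same spirit, and without adding any dependency, a tiny base-R helper can already give a median over several runs. This is only a simplified sketch of what packages like `microbenchmark` or `bench` do more carefully (they use higher-resolution timers and guard against compiler/GC effects):

```r
# Minimal micro-benchmark helper: run an expression `times` times
# and return the vector of elapsed times.
time_it <- function(expr, times = 5) {
  expr <- substitute(expr)   # capture the unevaluated expression
  env <- parent.frame()      # evaluate it in the caller's environment
  vapply(seq_len(times),
         function(i) system.time(eval(expr, env))[["elapsed"]],
         numeric(1))
}

res <- time_it(sum(runif(1e5)), times = 5)
round(c(median = median(res), min = min(res), max = max(res)), 4)
```

For the real benchmark, `sum(runif(1e5))` would be replaced by the `run_experiment()` call of interest.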
Here you are also running an experiment with an increasing number of simulations, right? I guess what you're aiming at is seeing how the total simulation time scales with the number of simulations (and also estimating the rama overhead)? If so, I would recommend using exactly the same simulation each time. Since all the simulations of the `exp4` object are different, it's currently impossible to tell whether the observed time differences are due only to the number of simulations or also to the nature of these simulations. See what I mean? Such an experiment, with the same simulation repeated a large number of times, could be generated with the `repl()` function, for example:

```r
exp5 <- repl(exp4[1, ], 10)
```
Here you are also running the simulations on 8 CPUs in parallel. It would be interesting to assess the overhead of the parallelization too.
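That parallelization overhead can be estimated independently of GAMA by timing the same deliberately cheap task sequentially and through the parallel machinery; with work this small, almost all the extra time on the parallel path is the cost of the machinery itself. A base-R sketch using the `parallel` package (note that `mclapply` forks, which is not available on Windows, hence the fallback to 1 core there):

```r
library(parallel)

# A deliberately cheap task: any extra time on the parallel path is
# essentially the overhead of the parallel machinery itself.
task <- function(i) sqrt(i)

cores <- if (.Platform$OS.type == "windows") 1L else 2L

t_seq <- system.time(r1 <- lapply(1:100, task))[["elapsed"]]
t_par <- system.time(r2 <- mclapply(1:100, task, mc.cores = cores))[["elapsed"]]

# Same results either way; the time difference estimates the overhead.
identical(unlist(r1), unlist(r2))
c(sequential = t_seq, parallel = t_par)
```

The same pattern applied to a 1-step rama experiment would separate the parallelization cost from the simulation cost.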
As a more general comment, I see that we are doing bits of tests here and there. Maybe a better approach would be to design a formal benchmarking test that we all agree on, specifying each time what we are interested in timing (rama overhead, parallelization overhead, scaling with the number of experiments (linear vs non-linear), etc.). Finally, such a benchmark should ideally be run on an "isolated" machine (i.e. not too many services running at the same time; the minimum would be to cut wifi and bluetooth, I guess).
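Such a formal design could start as a simple grid of scenarios, one row per timing to collect. The factor names and levels below are only a suggestion, to be agreed on:

```r
# A possible benchmark design: one row per timing to collect.
# Factor names and levels are suggestions only.
design <- expand.grid(
  backend = c("rama", "gama_headless", "gama_gui"),
  n_sim   = c(1, 2, 4, 8, 16),
  n_cores = c(1, 8),
  stringsAsFactors = FALSE
)
design$replicate <- 1L  # could be expanded to several replicates per cell
nrow(design)            # 3 backends x 5 plan sizes x 2 core counts = 30 scenarios
```

Running every row on the same isolated machine, and storing the elapsed times alongside the grid, would give one tidy data frame to analyse and plot.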
An Rmd vignette / website article on this benchmarking issue would be really, really nice, and absolutely key in the perspective of a publication. A benchmark comparing rama with RNetLogo and rrepast on the same model would be great too.
Would be interesting to compare the speeds of GAMA 1.7 and 1.8 too.
I hear many of you complaining about the fact that R/rama is incredibly slow compared to the GAMA GUI. Can somebody do some benchmarking here so that we have some numbers to compare: GAMA GUI, GAMA headless, and R/rama? Is it possible to measure the time it takes just to launch GAMA in headless mode? I guess it should roughly be the headless time of a simple model with few agents and just 1 time step, right? Anyway, having numbers to compare here would be useful to see where the problem might be.
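Measuring just the launch cost from R could be as simple as timing an external command. The sketch below times a trivial command as a stand-in, because the path to the GAMA headless launcher is machine-specific (something like a `gama-headless.sh` script; the name here is hypothetical):

```r
# Sketch: estimate process-launch overhead by timing an external
# command that does (almost) nothing. Replacing `cmd` with the actual
# GAMA headless launcher (machine-specific path, e.g. a
# "gama-headless.sh" script -- hypothetical here) run on a 1-step,
# few-agent model would approximate GAMA's startup cost.
cmd <- if (.Platform$OS.type == "windows") "cmd /c exit 0" else "true"
launch_time <- system.time(system(cmd))[["elapsed"]]
launch_time
```

Subtracting that launch time from a full headless run would then isolate the pure simulation time for the GUI vs headless vs rama comparison.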