isislab-unisa / ABM_Comparison

Code used for the performance comparison for the article "Experimenting with Agent-based Model Simulation Tools"
Apache License 2.0

Performance of Agents.jl seems wrong #1

Closed · Tortar closed this issue 1 year ago

Tortar commented 1 year ago

I was reading the paper related to this benchmark repository and noticed that the performance results about Agents.jl seem wrong, in particular in this figure

I saw that the Mesa framework sometimes beats Agents.jl, for example in the Forest Fire model. I have experience with Mesa and I know for sure that this can't happen in a fair comparison, since Python is slow compared to Julia; you can also look at https://juliadynamics.github.io/Agents.jl/stable/comparison/, where Agents.jl is compared with other frameworks and runs Forest Fire 120 times faster than Mesa. I think that following the benchmark methodology the Agents.jl devs used to compare the two frameworks would produce fairer benchmarks; you can find it at https://github.com/JuliaDynamics/ABM_Framework_Comparisons. Looking at the benchmark code in this repo, and considering that in the figure the time results for Agents.jl always start from approximately 10 seconds, I think the problem is that you counted the so-called "time to first plot" as part of the timing. Removing that time could make Agents.jl one of the fastest frameworks, if not the fastest.
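
To illustrate where such a fixed ~10-second offset can come from (a rough sketch with a toy function, not code from the paper; exact numbers depend on machine and package versions): Julia pays the package-load and JIT-compilation cost once per process, and only the first call of a function includes compilation.

```julia
@time using Agents            # package-load time, paid once per Julia process

# Toy stand-in for a model step (not code from the paper):
f(n) = sum(sqrt(i) for i in 1:n)

@time f(10^7)   # first call: dominated by JIT compilation of `f`
@time f(10^7)   # second call: only the actual computation is measured
```

Timing the whole `julia script.jl` invocation from the shell folds all of these one-off costs into every reported result.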

Tortar commented 1 year ago

For example, compare the output of the `@time` macro vs the `@btime` macro (which is a better way to measure the time) for the Forest Fire model:

benchmark code:

```julia
using Agents, Random, BenchmarkTools

@agent Automata GridAgent{2} begin end

function forest_fire(; density = 0.6, griddims = (400, 400))
    space = GridSpace(griddims; periodic = false, metric = :euclidean)
    # The `trees` field is coded such that
    # Empty = 0, Green = 1, Burning = 2, Burnt = 3
    forest = ABM(Automata, space; properties = (trees = zeros(Int, griddims),))
    for I in CartesianIndices(forest.trees)
        if rand(forest.rng) < density
            # Set the trees at the left edge on fire
            forest.trees[I] = I[1] == 1 ? 2 : 1
        end
    end
    return forest
end

function tree_step!(forest)
    # Find trees that are burning (coded as 2)
    for I in findall(isequal(2), forest.trees)
        for idx in nearby_positions(I.I, forest)
            # If a neighbor is Green (1), set it on fire (2)
            if forest.trees[idx...] == 1
                forest.trees[idx...] = 2
            end
        end
        # Finally, any burning tree is burnt out (3)
        forest.trees[I] = 3
    end
end

forest = forest_fire()
@time Agents.step!(forest, dummystep, tree_step!, 200)

forest = forest_fire()
@btime Agents.step!(forest, dummystep, tree_step!, 200) samples=1 evals=1
```

Running `julia forestfire.jl` gives:

  0.515094 seconds (2.22 M allocations: 119.700 MiB, 5.10% gc time, 98.11% compilation time)
  10.085 ms (1001 allocations: 4.94 MiB)

So the program itself is actually about 50 times faster. Moreover, you also counted the import of the Agents.jl library, so it is actually closer to 500 times faster (see below).

Datseris commented 1 year ago

Timing things with @time the first time includes compile time. It does not make sense to compare performance and include compile time in Julia. If the authors decided to compare performance and include the compile time for Julia, then they should also include the download and install time for Python, as this is when compilation happens for python......................

Datseris commented 1 year ago
  0.515094 seconds (2.22 M allocations: 119.700 MiB, 5.10% gc time, 98.11% compilation time)

The output literally says that 98.11% of the time was compilation.

Tortar commented 1 year ago

I also calculated that Schelling in Agents.jl is actually 350 times faster than what's claimed for a 100x100 grid with 1000 agents (real time: 20 ms vs 7 s), since you counted not only the compilation time of the code but also the import of the Agents library when running `bash test_agents.sh`, which shouldn't be part of the benchmark. The code used to reproduce the correct timing is:

benchmark code:

```julia
using Agents, BenchmarkTools, Random

@agent SchellingAgent GridAgent{2} begin
    mood::Bool # whether the agent is happy in its position (true = happy)
    group::Int # the group of the agent, determines mood as it interacts with neighbors
end

function initialize(; numagents = 1000, griddims = (100, 100), min_to_be_happy = 3, seed = 125)
    space = GridSpaceSingle(griddims, periodic = false)
    properties = Dict(:min_to_be_happy => min_to_be_happy)
    rng = Random.MersenneTwister(seed)
    model = ABM(
        SchellingAgent, space;
        properties, rng, scheduler = Schedulers.Randomly()
    )
    ## populate the model with agents, adding equal amounts of the two types
    ## of agents at random positions in the model
    for n in 1:numagents
        agent = SchellingAgent(n, (1, 1), false, n < numagents / 2 ? 1 : 2)
        add_agent_single!(agent, model)
    end
    return model
end

function agent_step!(agent, model)
    minhappy = model.min_to_be_happy
    count_neighbors_same_group = 0
    ## For each neighbor, get its group and compare it to the current agent's group,
    ## incrementing `count_neighbors_same_group` as appropriate.
    ## Here `nearby_agents` (with default arguments) provides an iterator
    ## over the nearby agents one grid point away, which are at most 8.
    for neighbor in nearby_agents(agent, model)
        if agent.group == neighbor.group
            count_neighbors_same_group += 1
        end
    end
    ## After counting the neighbors, decide whether or not to move the agent.
    ## If count_neighbors_same_group is at least min_to_be_happy, set the
    ## mood to true. Otherwise, move the agent to a random position and set
    ## mood to false.
    if count_neighbors_same_group ≥ minhappy
        agent.mood = true
    else
        agent.mood = false
        move_agent_single!(agent, model)
    end
    return
end

@btime (model = initialize(); step!(model, agent_step!, 200)) samples=1 evals=1
```

Tortar commented 1 year ago

Also, I can't understand how you could measure ~1000 seconds to run the model with 128000 agents on a 1131x1131 grid, since on my laptop, which has comparable performance to the computer used for the benchmark, it takes just 7 seconds. Even accounting for the slowdown due to the factors I explained above (which shouldn't be included anyway), this wouldn't be possible.

edit: my mistake here, I calculated this for Schelling, which takes ~50 seconds in your benchmark, so it is not that far off the 7 seconds I measured once you consider that ~10 seconds come from compilation (which should be removed). But there is still a noticeable difference: your CPU is even a bit better than mine, and after removing the compilation time from your benchmark it is still ~40 s vs ~7 s.

Datseris commented 1 year ago

This is very weird. What is even more weird is that the authors cite our paper

Datseris, G.; Vahdati, A.R.; DuBois, T.C. Agents.jl: A performant and feature-full agent-based modeling software of minimal code complexity. Simulation 2022, 003754972110688.

In that paper we do a performance comparison of Agents.jl with other frameworks and we find Agents.jl to be faster. We took massive care to make all performance comparisons valid and fair. What did the authors here think? Didn't they consider that "something must be wrong" given that their benchmarks are so far off from what is reported in our paper?

@giusdam @ddevin96 ?

spagnuolocarmine commented 1 year ago

Dear @Tortar @Datseris ,

Thank you for your awesome contributions. Anyway, I cannot fail to note that many of your comments, suggestions, or questions violate basic community guidelines and codes of conduct.

For instance,

Timing things with @time the first time includes compile time. It does not make sense to compare performance and include compile time in Julia. If the authors decided to compare performance and include the compile time for Julia, then they should also include the download and install time for Python, as this is when compilation happens for python......................

In my opinion, this is not the place for controversy but for contributions or, at most, discussion; many of your comments read like playful social-network banter.

In any case, we reported exactly how to reproduce the benchmark in the paper (using the Unix `time` command: `time julia x.jl`). Yes, we know that this includes the compilation time.
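
(For reference, one way to keep the `time julia x.jl` workflow while also getting a number that excludes start-up, package import, and compilation would be to let the script time the stepping itself after a warm-up run. This is only a sketch, reusing the `initialize` and `agent_step!` definitions from the Schelling snippet above:)

```julia
# If appended to the end of the Schelling script above, `time julia x.jl`
# still reports the end-to-end wall time, while the value printed below
# covers only the already-compiled simulation.
warmup = initialize()
step!(warmup, agent_step!, 1)          # triggers compilation of the stepping code

model = initialize()                   # fresh model for the actual measurement
elapsed = @elapsed step!(model, agent_step!, 200)
println("simulation-only time: $elapsed s")
```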

But we will look into your comments, and we will reply.

Thank you

Tortar commented 1 year ago

Yes, I have to say I probably fired up too much, sorry for that, and thanks for the response. I even found one possible problem with the comparison for the flockers model: there is some sort of memory bug in Agents.jl which makes all 16 GB of memory get allocated in large models, so this could invalidate the largest benchmarks. But this is clearly a problem in Agents.jl and not in the benchmarking process used here, so the comparison actually helped identify it; thanks for that.

Datseris commented 1 year ago

If there is a memory bug in Agents.jl this is an Agents.jl problem, not an incorrect benchmark.

Tortar commented 1 year ago

I mentioned that as something the discussion helped shed some light on; clearly it's something Agents.jl has to deal with, not the other way around (just to clarify).

Tortar commented 1 year ago

@spagnuolocarmine any news on this matter?

Tortar commented 1 year ago

Restating it since for some reason it seems to have gone unnoticed: @spagnuolocarmine @alessant @giusdam @ddevin96

giusdam commented 1 year ago

Hi @Tortar, sorry for the late reply, but this is a tough and very busy period for us. As we stated in the paper

The reasons for these results must be further investigated and may also be influenced by our inexperience with the platform and the need for more documentation and examples.

Therefore, we are open to discussing a more rigorous way to run the benchmark, as @Datseris mentioned; so, if you could kindly tell us which models and which exact configurations you want us to test, and of course which procedure you think is most appropriate, we can update this repository.

Datseris commented 1 year ago

Thanks for the reply @giusdam. As I mentioned in my email, the only way to get a 100% objective and transparent comparison is to run the computations on continuous integration. At least, this is what we believe.

Therefore, we are currently working on the repository https://github.com/JuliaDynamics/ABM_Framework_Comparisons, which was the original repo where the first comparisons were done when Agents.jl was published. We will try to make it so that every model is run on CI.

The way we will most likely make it work is as follows: the repo will have a folder for each model. Each folder will have a markdown file that declares the model dynamics as accurately as possible, plus an implementation of the model in each ABM software that can implement it. Continuous integration then goes through each folder and runs all files that can be run. It collects the output, in terms of memory allocation and run time, into a hierarchical dataframe organized by model name and then by ABM software name. This CI output is picked up by the Agents.jl documentation build and turned into a plot. So the numbers on this page, https://juliadynamics.github.io/Agents.jl/stable/comparison/, will be replaced by completely computer-generated numbers that are not touched by humans.
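
As a very rough sketch of what such a CI harness could look like (the `models/` folder layout, the file naming, and the convention that each script prints its simulation-only time as its last output line are assumptions made up for illustration, not the actual setup of that repository):

```julia
using DataFrames

# Assumed layout: one folder per model under "models/", each containing one
# runnable script per framework, e.g. "models/WolfSheep/Agents.jl" and
# "models/WolfSheep/Mesa.py". Each script is assumed to print its
# simulation-only time in seconds as its last line of output.
runners = Dict(".jl" => `julia`, ".py" => `python3`)

results = DataFrame(model = String[], framework = String[], seconds = Float64[])

for modeldir in filter(isdir, readdir("models"; join = true))
    for file in readdir(modeldir; join = true)
        ext = splitext(file)[2]
        haskey(runners, ext) || continue
        out = read(`$(runners[ext]) $file`, String)           # run the implementation
        secs = parse(Float64, last(split(strip(out), '\n')))  # last line = timing
        push!(results, (basename(modeldir), splitext(basename(file))[1], secs))
    end
end

# One row per (model, framework) pair; a documentation build could turn this
# dataframe into the comparison plot.
println(results)
```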

The other reason we will continue with the aforementioned repository instead of this one is that we already know all development teams agree with the way we benchmark things there. The Mesa and MASON teams have seen the benchmarks and there is no disagreement about how we do it. The MASON team does have disagreements about how we set up the MASON models, but this can easily be solved via a simple Pull Request once the Continuous Integration is in order.

If you want to participate in this endeavor, feel free to make PRs that allow running and timing Java models on CI and/or that improve the code of the MASON models.

giusdam commented 1 year ago

Yes, I agree with your proposal @Datseris. Our research group is currently busy with other projects, but we will join you in your work on the repository you mentioned as soon as possible. We will share any updates via email.

Datseris commented 1 year ago

Thanks for the reply @giusdam! If possible, we would prefer to get updates on GitHub by, e.g., opening an issue on the repo https://github.com/JuliaDynamics/ABM_Framework_Comparisons, so that the whole Agents.jl core development team and frequent contributors can see them.