ARGoS slow to exit after experiment finished

jharwell commented 5 years ago

I've noticed with large swarms (> 10,000 robots) that after the experiment finishes and all threads exit, my CPU usage drops to near 0 and stays there for quite a while, during which time ARGoS is tearing down the simulation, releasing memory, etc. Profiling and looking at the resulting hotspots, the culprit is CSpace::Destroy(), which calls CSpace::RemoveEntity() on each entity. That function performs a linear search of two maps and two vectors, each with over 10,000 members, to remove the pointers connecting the object to ARGoS before actually deleting the object (it then repeats this process for the 9,999 remaining objects, and so on), which is very slow to finish even on a fast machine. Because every removal does a linear search, the total teardown work grows faster than linearly with the number of robots (roughly quadratically in the worst case). That is not an issue for smaller swarms, but with the larger ones I've been starting to test, it is observable: about 30-40 seconds with 10,000 robots, and up to 5-6 minutes with 50,000.
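To illustrate why one-at-a-time removal gets expensive, here is a minimal sketch. It is not ARGoS code: `Entity`, `ToySpace`, and `TeardownComparisons` are hypothetical names, a single vector stands in for the maps and vectors mentioned above, and the removal order is chosen to hit the worst case of the linear search.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a simulated entity; not an ARGoS type.
struct Entity {
    int id;
};

// Hypothetical container that mimics the pattern described above:
// every removal searches the index linearly before deleting the object.
struct ToySpace {
    std::vector<Entity*> entities;
    std::size_t comparisons = 0;  // pointer comparisons performed while searching

    void Add(int id) { entities.push_back(new Entity{id}); }

    void RemoveEntity(Entity* e) {
        auto it = entities.begin();
        for (; it != entities.end(); ++it) {
            ++comparisons;
            if (*it == e) break;
        }
        assert(it != entities.end());  // the entity must be registered
        entities.erase(it);
        delete e;
    }

    // Teardown that removes entities one at a time, always taking the last
    // element so that each linear search scans the whole vector (worst case).
    void Destroy() {
        while (!entities.empty()) RemoveEntity(entities.back());
    }
};

// Builds a space with n entities, tears it down, and reports the search cost.
std::size_t TeardownComparisons(std::size_t n) {
    ToySpace space;
    for (std::size_t i = 0; i < n; ++i) space.Add(static_cast<int>(i));
    space.Destroy();
    return space.comparisons;
}
```

In this toy model, tearing down n entities costs n(n+1)/2 comparisons, so doubling the swarm size roughly quadruples the search work. The real cost in ARGoS depends on its actual containers and removal order, which this sketch does not model.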

After the experiment has finished, it seems like you could get away with just deleting the actual objects (the maps and vectors will be destroyed automatically when the CSpace object is destroyed) and skip all the searching and other checks. Granted, this optimization is only possible at the end of the simulation, after the computationally intensive part of the experiment has finished, but given how quickly the cost grows with swarm size, I think it could be worthwhile.
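A rough sketch of the idea, again with hypothetical names (`Entity`, `ToySpace`, `DestroyFast`, `LiveAfterFastTeardown`) rather than the real ARGoS classes: at teardown, delete each object exactly once and then drop the index containers wholesale, with no per-entity searching. It assumes one container owns the pointers and the other only indexes them.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical entity that counts live instances so leaks are detectable.
struct Entity {
    static int live;
    std::string id;
    explicit Entity(std::string i) : id(std::move(i)) { ++live; }
    ~Entity() { --live; }
};
int Entity::live = 0;

// Hypothetical container with two indexes over the same objects.
struct ToySpace {
    std::vector<Entity*> entities;          // treated as the owning list
    std::map<std::string, Entity*> byId;    // secondary lookup, non-owning

    void Add(const std::string& id) {
        Entity* e = new Entity(id);
        entities.push_back(e);
        byId[id] = e;
    }

    // Bulk teardown: free every object once via the owning list, then clear
    // both indexes. No searching, so the cost is linear in the entity count.
    void DestroyFast() {
        for (Entity* e : entities) delete e;
        entities.clear();
        byId.clear();
    }
};

// Creates n entities, tears them down in bulk, and returns how many survive.
int LiveAfterFastTeardown(int n) {
    ToySpace space;
    for (int i = 0; i < n; ++i) space.Add("robot_" + std::to_string(i));
    assert(Entity::live == n);  // everything is allocated before teardown
    space.DestroyFast();
    return Entity::live;
}
```

The sketch only shows that memory can be released without the searches. Whether the per-entity removal path in ARGoS does other work that teardown still needs is not established here, and would have to be checked before applying this.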

@ilpincy @allsey87 Thoughts?

jharwell commented 5 years ago

@ilpincy, @allsey87 gentlest of bumps on this. Do you think this idea has merit? Thanks!

allsey87 commented 5 years ago

What you are talking about here is really at the core of ARGoS, and changes have the potential to break things badly. If you think that significant improvements can be made, you can try to implement them alongside some automated tests (which ARGoS badly needs, in my opinion). After that, and assuming you can prove that everything still works well, you could open a pull request that can be looked at by @ilpincy once he has time.

jharwell commented 5 years ago

@allsey87 I'm not sure I follow how it could break things. I'm only talking about removing the checks after the simulation has finished, when all that remains is teardown and freeing memory. My thought is that as long as nothing crashes and all memory is still freed, making this optimization would be OK. During the simulation I'm pretty sure the checks are necessary, and the code that performs them should remain as is.

ilpincy / argos3

ARGoS slow to exit after experiment finished #107