vojtamolda / autodrome

Framework and OpenAI Gym environment for autonomous vehicle development.
http://tiny.cc/autodrome
MIT License

OpenAI Gym Environment Slows Down After ~10 Episodes #2

Open vojtamolda opened 5 years ago

vojtamolda commented 5 years ago

It seems that the September 2018 1.32.x update of ATS/ETS2 broke the long-term stability of the OpenAI Gym environment runs.

Over the course of about 10 episodes, the game slows down to the point where one of the ugly time.sleep(...) calls in the Simulator class stops being sufficient. These waits are meant to let the game redraw its UI before pressing ~ on the virtual keyboard and issuing console commands.
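One way to make such waits less fragile is to poll for a condition with a timeout instead of sleeping for a fixed time. The sketch below is only an illustration: `wait_until` is a hypothetical helper, not part of autodrome, and it assumes there is some cheap check (the `predicate`) that tells whether the game is ready for console input, which may not exist in practice.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.05):
    """Poll predicate() until it returns True or `timeout` seconds elapse.

    Returns True if the predicate succeeded, False on timeout.
    Hypothetical helper; the readiness check itself is an assumption.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        # Sleep briefly between checks so we don't spin the CPU.
        time.sleep(interval)
```

Compared to a fixed sleep, this returns as soon as the condition holds when the game is fast and keeps waiting (up to the timeout) when the game is slow.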

The frequently repeated cycle of opening and closing the map is a very unusual use case and most likely isn't covered by any SCS in-house unit test. The usual gaming workload is only a single load of the usa or europe map followed by a long stretch of driving. This also means that the bug is very unlikely to get fixed by a subsequent patch.

Here are the update 1.32.3 release notes links for ATS and ETS2. Neither provides a clue as to why the slowdown might be happening.

gordicaleksa commented 3 years ago

Did you solve this one?

With gym==0.17.3 I'm getting periodic slowdowns manifested in drops in my FPS (frames/env steps per second) metric.

Here is an example: image
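For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of how a steps-per-second metric could be collected and how slow episodes could be flagged. Both function names, the 3-episode baseline and the 0.5 threshold are assumptions made for illustration; `step` stands in for whatever callable advances the environment by one step.

```python
import time


def steps_per_second(step, num_steps):
    """Call step() num_steps times and return the achieved steps per second."""
    start = time.perf_counter()
    for _ in range(num_steps):
        step()
    elapsed = time.perf_counter() - start
    return num_steps / elapsed if elapsed > 0 else float("inf")


def find_slow_episodes(rates, baseline_episodes=3, threshold=0.5):
    """Return indices of episodes whose rate fell below threshold * baseline.

    The baseline is the mean rate of the first `baseline_episodes` episodes.
    """
    if not rates:
        return []
    head = rates[:baseline_episodes]
    baseline = sum(head) / len(head)
    return [i for i, rate in enumerate(rates) if rate < threshold * baseline]
```

Logging one rate per episode and passing the list to `find_slow_episodes` would show at which episode the slowdown starts.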

vojtamolda commented 3 years ago

Hello @gordicaleksa,

Thanks for your question. To be honest, I haven't had time (and won't have any time soon) to work on this, but here are a few (probably bad) ideas on how to solve it. They may come in handy in case you would like to try to figure it out yourself ;)

A) Run the game without the telemetry plugin. It's the code I wrote so it's the most likely place to be hiding some kind of a nasty bug. It uses ZMQ sockets to transmit the data to the Python client. Since it runs as a dynamic library loaded into the address space of the application, chances are some buffer somewhere is overflowing and slowing everything down.

B) Run the game in a profiler and compare a sample from an early episode with one from a later episode. Since it slows down so significantly, the functions responsible for it should be fairly obvious. Profiling will most likely yield only binary stack traces since there's no debug information. Function addresses shouldn't change during a single run, so a comparison should still be possible.
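For idea B), the comparison could be sketched roughly as follows. This assumes the profiler output has already been reduced to a mapping from function address to sample count for each of the two captures; the `compare_profiles` name, the input format and the addresses in the usage example are all made up for illustration.

```python
def compare_profiles(early, late, top=5):
    """Rank addresses by how much their share of samples grew.

    `early` and `late` map a function address (any hashable, e.g. a hex
    string) to the number of profiler samples attributed to it.
    Returns up to `top` (address, delta) pairs, largest increase first.
    """
    early_total = sum(early.values()) or 1
    late_total = sum(late.values()) or 1
    addresses = set(early) | set(late)
    # Compare fractions of total samples so captures of different
    # lengths remain comparable.
    deltas = {
        addr: late.get(addr, 0) / late_total - early.get(addr, 0) / early_total
        for addr in addresses
    }
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)[:top]
```

A usage example with invented numbers:

```python
early = {"0x1000": 50, "0x2000": 50}
late = {"0x1000": 20, "0x2000": 30, "0x3000": 50}
print(compare_profiles(early, late, top=1))  # the address whose share grew the most
```

Addresses near the top of the result are the ones that take up a noticeably larger share of time in the later episode.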

Let me know what you think about this and whether it makes at least some sense.

Cheers,

Vojta