microsoft / AirSim

Open source simulator for autonomous vehicles built on Unreal Engine / Unity, from Microsoft AI & Research
https://microsoft.github.io/AirSim/

Unreal window crashes #631

Closed Kjell-K closed 6 years ago

Kjell-K commented 6 years ago

@sytelus @all Since the new AirSim updates, my test environment crashes after a while when I run my reinforcement learning. Here is my project.

I get this debug error message:

Unhandled exception at 0x00007FF62B37465E in Blocks.exe: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF. If there is a handler for this exception, the program may be safely continued.

If I continue debugging, I end up in an endless loop of this error.

The debugger points me to list, in thread [9020] ucrtbase.dll of the process [10512] Blocks.exe, at line 1288:

        void pop_front()
        {   // erase element at beginning
        erase(begin());
        }

Any suggestions? The new updates of reset etc. are amazing but this is really an issue for progressing in RL.

Note that the debug message only occurs with the update from early November, while the current version still crashes the same way but does not show a debug window (Unreal just closes).

Assumption: The error is produced by client.reset(). It always occurs either directly during the reset call or directly afterwards, when trying to move after the reset. Also, client.reset is the major change compared to the early October version.

cangokalp commented 6 years ago

Yeah, I'm having a similar problem. I can't perform any training because the simulation often crashes randomly. Sometimes it crashes after a day, sometimes after 5 hours... it seems to be random. When it crashes I usually get an access violation error too, but the error differs every time. I'm not sure what causes it, but one thing I notice is that the debugger opens the xthread and list files (locked) when there is a crash.

cangokalp commented 6 years ago

Here are some examples: [screenshots: capture, capture2, capture3, capture4]

Kjell-K commented 6 years ago

@cangokalp It looks similar to my issues. I get pointed to list.

For now I am using the AirSim version from early October (where no reset is implemented) as a workaround... Not ideal.

sytelus commented 6 years ago

I'm looking into this issue. First I tried to reproduce it by writing this simple script. Even after running a few hundred iterations, I haven't seen the crash yet. Could you please suggest modifications to this script that would trigger the crash more often?

Also, if you can get a screenshot where the "Call Stack" window is visible, that would be much more useful.

Kjell-K commented 6 years ago

@sytelus First, I suggest letting the drone collide and then resetting as a result. Second, a few hundred iterations will not be enough, since the crash occurs randomly. As I wrote, I had the crash anywhere between 2,000 and 120,000 steps. So it is random, with a very small probability of occurring on any given step, but for RL with a large number of iterations it is almost certain to occur.

Here is a test script. It is stripped down from my complete project. Place something a few feet in front of the copter's home position, let the copter collide with it, and then reset.

Note: Again, with the most recent version (with the dialog "Do you want to use car?"), the window just closes and there is no debug pop-up.
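To make the "small probability per step, near certainty over a long run" point concrete, here is a minimal sketch. It assumes each step crashes independently with the same probability, which the thread does not establish, and the per-step rate used is purely hypothetical.

```python
def crash_probability(p_per_step: float, steps: int) -> float:
    """Probability of at least one crash in `steps` steps.

    Assumes every step fails independently with probability `p_per_step`.
    """
    return 1.0 - (1.0 - p_per_step) ** steps


# Hypothetical per-step crash rate, chosen only for illustration.
p = 1e-5
for steps in (200, 2_000, 120_000, 500_000):
    print(f"{steps:>7} steps: {crash_probability(p, steps):.3f}")
```

Under this assumption, a test of a few hundred iterations would almost never hit the crash, while a training run of several hundred thousand steps would hit it almost every time.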

sytelus commented 6 years ago

Thanks for the script. I'll try this out. Meanwhile, if you are running from the source code (in Visual Studio), you might be able to see the call stack.

In any case, would it be possible to share the "Saved" folder in your project? It often contains crash logs and may help to debug this. You can probably zip up that folder and share it on Dropbox or something...

cangokalp commented 6 years ago

@sytelus, maybe this can help. I saved it in Notepad last time:

call_stack.txt

I'll attach another one when it crashes again.

Kjell-K commented 6 years ago

@sytelus When I run from source code, I have other instability issues that cause a crash before our issue even occurs.

Here are my "Saved" folders, which contain logs and "Crashes": one from the current version of AirSim and one from the version from end of October / end of November (with the fixed client.reset).

sytelus commented 6 years ago

@Kjell-K Thanks for sending the call stack before. I have a guess as to why this might be happening and have checked in a potential fix. Please try it out and let me know how it goes.

I looked at the Saved folder as well, and it looks like the earlier and current errors are the same. It's possible that it's the same one that the above fix applies to.

Kjell-K commented 6 years ago

@sytelus I have now run my RL task for 210k steps without a crash using the potential fix. So I think you got to the root of the issue. Well done!

I will report again after 500k.

UPDATE: No more crashes even after 350k. For me this issue is closed. Thanks a lot and keep up the good work.

cangokalp commented 6 years ago

For me the issue changed, and now I randomly get this after some time. It is similar to the previous issue, but instead of crashing, this happens:

        self.client.call('reset')
        File "...\Anaconda3\lib\site-packages\msgpackrpc\session.py", line 41, in call
            return self.send_request(method, args).get()
        File "..\Anaconda3\lib\site-packages\msgpackrpc\future.py", line 43, in get
            raise self._error
        msgpackrpc.error.TimeoutError: Request timed out

Kjell-K commented 6 years ago

@cangokalp @sytelus +1, unfortunately. The Unreal window is still open, with the quadcopter in the terminal state (collision) awaiting reset. This happened after 325k steps and 9k successful resets, so it is very random as well.

For now, I will wrap the reset in a while loop that calls reset again if the home position is not reached. A possible workaround.
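A minimal sketch of that workaround could look like the following. It is not code from the thread: `reset_until_home`, its parameters, and the tolerance value are hypothetical, and the client calls are passed in as plain callables so the sketch does not assume any particular AirSim API version.

```python
import math
from typing import Callable, Sequence, Tuple, Type


def reset_until_home(
    reset: Callable[[], None],
    get_position: Callable[[], Sequence[float]],
    home: Sequence[float],
    tolerance: float = 0.5,
    max_attempts: int = 5,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
) -> int:
    """Call reset() until the vehicle is within `tolerance` of `home`.

    Returns the number of attempts used and raises RuntimeError if the
    home position is never reached within `max_attempts`.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            reset()
        except retry_on:
            # The built-in TimeoutError is only a stand-in. With msgpack-rpc
            # one would pass msgpackrpc.error.TimeoutError, the exception
            # shown in the traceback above.
            continue
        # Euclidean distance between the current and the home position.
        if math.dist(get_position(), home) <= tolerance:
            return attempt
    raise RuntimeError(f"home position not reached after {max_attempts} resets")
```

Here `reset` would presumably wrap `client.reset()` and `get_position` whatever position query the installed client offers. Bounding the number of attempts avoids an endless loop if the simulator has hung rather than merely missed one reset.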

M-Kasem commented 6 years ago

Is there a solution to this problem? I am using multiple vehicles with multi-threaded client connections, and I am getting random crashes, sometimes after ~20k or ~40k steps. Even the logger is not consistent. Sometimes the log is:

        [2018.03.11-19.20.22:563][841]LogAirSim: Error: Exception occurred while updating world: reset() must be called first before update()

Some crashes log this:

        UE4Editor: ../nptl/pthread_mutex_lock.c:117: pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.
        Signal 6 caught.
        Malloc Size=131076 LargeMemoryPoolOffset=131092
        CommonLinuxCrashHandler: Signal=6
        Malloc Size=65535 LargeMemoryPoolOffset=196655

paulfauthmayer commented 6 years ago

For me, the issue still persists. AirSim crashes with msgpackrpc.error.TimeoutError: Request timed out upon calling responses = self.simGetImages([ImageRequest(0, AirSimImageType.Scene, False, False), ImageRequest(0, AirSimImageType.DepthPerspective, True, False)]). The time between start and crash seems random.