Closed Kjell-K closed 6 years ago
Yeah, I'm having a similar problem. I can't complete any training because the simulation crashes randomly. Sometimes it crashes after a day, sometimes after 5 hours; it seems to be random. When it crashes I usually get an access violation error too, but the error differs every time. I'm not sure what causes it, but one thing I notice is that the debugger opens the (locked) xthread and list files when there is a crash.
Here are some examples:
@cangokalp It looks similar to my issues. I get pointed to list.
For now I am using the AirSim version from early October (where no reset is implemented) as a workaround. Not ideal.
I'm looking into this issue. First I tried to reproduce it by writing this simple script. Even after running a few hundred iterations, I haven't seen the crash yet. Could you please suggest modifications to this script that would trigger the crash more often?
Also, if you can get a screenshot where the "Call Stack" window is visible, that would be much more useful.
@sytelus First, I suggest letting the drone collide and then reset as a result. Second, a few hundred iterations will not be enough, since the crash occurs randomly. As I wrote, I have had the crash anywhere between 2,000 and as many as 120,000 steps. So it is random, with a very small probability of occurring on any given step, but for RL it is almost certain to occur given the large number of iterations.
Here is a test script. It is stripped down from my complete project. Place an obstacle a few feet in front of the copter's home position, let it collide, and then reset.
Note: Again, with the most recent version (the one with the dialog "Do you want to use car?"), the window just closes and there is no debug pop-up.
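To make the point about rare per-step failures concrete (this is separate from the test script mentioned above): a minimal sketch, assuming each step crashes independently with the same small probability. The value of `p` below is purely hypothetical, chosen for illustration and not measured from AirSim.

```python
def crash_probability(p_step: float, n_steps: int) -> float:
    """Probability of at least one crash in n_steps independent steps."""
    return 1.0 - (1.0 - p_step) ** n_steps


# Hypothetical per-step crash probability, for illustration only.
p = 1e-5

# A few hundred steps almost never reproduce the crash,
# while a long RL run almost always hits it.
for n in (100, 2_000, 120_000, 500_000):
    print(f"{n:>7} steps: {crash_probability(p, n):.4f}")
```

Under that assumption, a reproduction script needs to run for a number of steps comparable to 1/p before a missing crash says anything about whether the bug is gone.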
Thanks for the script. I'll try this out. Meanwhile, if you are running from the source code (in Visual Studio), then you might be able to see the call stack.
In any case, would it be possible to share the "Saved" folder in your project? It often contains crash logs and may help to debug this. You can probably zip up that folder and share it on Dropbox or something...
@sytelus, maybe this can help, I saved it in a notepad last time
I'll attach another one when it crashes again.
@sytelus When I run from source code, I have other instability issues that cause a crash before our issue even occurs.
Here are my "Saved" folders, which contain logs and "Crashes": one from the current version of AirSim and one from the version from end of October / end of November (with the fixed client.reset).
@Kjell-K Thanks for sending the call stack before. I have a guess why this might be happening and have checked in a potential fix. Please try it out and let me know how it goes.
I looked at the Saved folder as well, and it looks like the earlier and current errors are the same. It's possible that it's the same one that the above fix applies to.
@sytelus I have now run my RL task for 210k steps without a crash on the potential fix. So I think you got to the root of the issue. Well done!
I will report again after 500k.
UPDATE: No more crashes even after 350k. For me this issue is closed. Thanks a lot and keep up the good work.
For me the issue changed: now, after a random amount of time, I get the error below. It is similar to the previous issue, but instead of crashing, this happens:
    self.client.call('reset')
  File "...\Anaconda3\lib\site-packages\msgpackrpc\session.py", line 41, in call
    return self.send_request(method, args).get()
  File "..\Anaconda3\lib\site-packages\msgpackrpc\future.py", line 43, in get
    raise self._error
msgpackrpc.error.TimeoutError: Request timed out
@cangokalp @sytelus +1, unfortunately. The Unreal window is still open, with the quadcopter in the terminal state (collision) waiting for the reset. This happened after 325k steps and 9k successful resets, so it is very random as well.
For now, I will wrap the reset in a while loop that calls reset again if the home position is not reached. A possible workaround.
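The retry-on-reset workaround described above could look roughly like the following. This is a minimal sketch, not AirSim code: `reset_until_home` and its parameters are hypothetical names, and the reset and position lookups are passed in as plain callables so that no particular AirSim client API is assumed. The attempt count is bounded so a hung simulator does not turn into an endless loop.

```python
import math
import time
from typing import Callable, Sequence


def reset_until_home(
    reset: Callable[[], None],
    get_position: Callable[[], Sequence[float]],
    home: Sequence[float],
    tolerance: float = 0.5,
    max_attempts: int = 5,
    pause_s: float = 1.0,
) -> int:
    """Call reset() until the vehicle is within `tolerance` of `home`.

    Returns the number of attempts used; raises RuntimeError if all fail.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            reset()
            # Treat the reset as successful only if the vehicle is back home.
            if math.dist(get_position(), home) <= tolerance:
                return attempt
        except Exception as exc:  # e.g. an RPC timeout; narrow this in real code
            last_error = exc
        time.sleep(pause_s)  # give the simulator a moment before retrying
    raise RuntimeError(f"reset failed after {max_attempts} attempts") from last_error
```

In an RL environment, `reset` would wrap whatever call the client already uses (for example the `client.reset()` mentioned in this thread), and `get_position` would return the vehicle's current coordinates in the same frame as `home`. Whether retrying actually recovers a timed-out simulator is not confirmed here.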
Is there a solution to this problem? I am using multiple vehicles with multi-threaded client connections, and I am getting random crashes, sometimes after ~20k or ~40k steps. Even the log output is not consistent. Sometimes the log shows:
[2018.03.11-19.20.22:563][841]LogAirSim: Error: Exception occurred while updating world: reset() must be called first before update()
Some crashes log this:
UE4Editor: ../nptl/pthread_mutex_lock.c:117: pthread_mutex_lock: Assertion `mutex->data.__owner == 0' failed.
Signal 6 caught.
Malloc Size=131076 LargeMemoryPoolOffset=131092
CommonLinuxCrashHandler: Signal=6
Malloc Size=65535 LargeMemoryPoolOffset=196655
For me, the issue still persists. AirSim crashes with msgpackrpc.error.TimeoutError: Request timed out upon calling
responses = self.simGetImages([ImageRequest(0, AirSimImageType.Scene, False, False), ImageRequest(0, AirSimImageType.DepthPerspective, True, False)])
The amount of time between start and crash seems random.
@sytelus @all Since the new AirSim updates, my test environment crashes after a while when I run my reinforcement learning. Here is my project
I get this error debug message:
If I continue debugging, I end up in an endless loop of this error.
The debugger points me to list, with thread [9020] ucrtbase.dll of the process [10512] Blocks.exe, at line 1288.
Any suggestions? The new updates of reset etc. are amazing, but this is really an issue for progressing in RL.
Note that the debug message only occurs with the update from early November, while the current version still crashes the same way but does not show a debug window (Unreal just closes).
Assumption: the error is produced by client.reset(). It always occurs either directly during the reset call or directly afterwards, when trying to move after the reset. Also, client.reset is the major change compared with the early October version.