mrindar opened this issue 7 years ago
It's hard to say which changes since 0.7 could cause this. The first thing that comes to mind would be the timeout stuff, but that shouldn't be affected by changing configuration files. Actually, I can't think of anything that would cause some kind of memory effect wrt. the configuration files.
Some things you could do which could possibly narrow this down:
So it appears that changing the configuration files has nothing to do with it. I've run about 10 simulations and 2 crashed, with no changes to the configuration files. If I try to start a new simulation after a crash (without running `--clean-cache` on the slave provider), I get this error message:
`[warning] GetSlaveTypes request to slave provider 61926fa9-6de0-456c-afa5-69034f0c37c9 failed (timed out Error: Slave type not found: SomeFmu`
It always seems to be the same FMU, but not necessarily the one that crashes. The coralslave.exe that crashes isn't always for the same FMU either.
EDIT: I'm sorry to say this isn't consistent either :P
When I tried to press 'Debug' in the coralslave.exe crash window, Visual Studio gave me this message:
`Unhandled exception at 0x743B170C (mswsock.dll) in coralslave.exe: 0xC0000005: Access violation executing location 0x743B170C.`
I don't know if it means anything.
I'm curious whether this is similar to the ZeroMQ crash we've struggled with in the past, except that one took a proper chokehold on the CPU and required a hard restart. However, that crash also seemed rather random, much like this one, and mswsock.dll sounds like socket stuff, if I put on my Sherlock Holmes hat.
Oh! This is interesting. So,
`Unhandled exception at 0x743B170C (mswsock.dll) in coralslave.exe: 0xC0000005: Access violation executing location 0x743B170C.`
Simulation | coralslave.exe crashes | Pressing 'Continue' after crash worked |
---|---|---|
1 | 1 | yes |
2 | 1 | yes |
3 | 0 | N/A |
4 | 0 | N/A |
5 | 1 | yes |
6 | 1 | yes |
7 | 0 | N/A |
8 | 1 | yes |
9 | 0 | N/A |
10 | 0 | N/A |
11 | 0 | N/A |
12 | 3 | yes |
13 | 0 | N/A |
14 | 0 | N/A |
15 | 0 | N/A |
16 | 0 | N/A |
17 | 0 | N/A |
18 | 0 | N/A |
19 | 0 | N/A |
20 | 0 | N/A |
So, not very scientific, but approximately 6 in 20 simulations crashed. One interesting case is simulation 12, where 3 coralslaves crashed: the first one crashed, I pressed 'Continue' in the debugger, then another one crashed, and so on. Also, pressing 'Continue' in the debugger when a crash occurs seems to work every time.
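The tally in the table can be double-checked with a few lines of Python. This is only a sketch: the `crashes` list is the table's second column transcribed by hand. The last step is an aside that rests on a strong assumption (every run crashes independently at the same rate on any machine). Under that assumption, a streak of 50 crash-free runs on another machine would be very unlikely, which hints, without proving, that something specific to this environment matters.

```python
# Crashed coralslave.exe processes per simulation, transcribed from the table (runs 1-20).
crashes = [1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0]

runs_with_crash = sum(1 for c in crashes if c > 0)  # simulations with at least one crash
total_crashed = sum(crashes)                        # crashed processes across all runs
rate = runs_with_crash / len(crashes)               # observed per-simulation crash rate

print(f"{runs_with_crash}/{len(crashes)} simulations crashed ({rate:.0%})")
print(f"{total_crashed} coralslave.exe processes crashed in total")

# Assumption: runs are independent and crash at the same rate everywhere.
# Probability of 50 consecutive crash-free runs under that assumption:
p_clean_streak = (1 - rate) ** 50
print(f"P(50 clean runs) ~ {p_clean_streak:.1e}")
```

Running it reports 6 of 20 simulations (30%) with at least one crash, 8 crashed processes in total, and a probability of about 1.8e-08 for 50 clean runs in a row.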
I've run your setup 50–60-ish times myself now, using coral 0.8.0-beta1, and it didn't crash once. So this is going to be hard to track down, I guess. You're still on Windows 7?
Awesome :P Still on Windows 7, yes
I don't have enough information about this issue yet, but since I started working with coral_0.8.0-beta1, coralslave randomly crashes at startup after I have made changes to the .execonf or .sysconf file.
This is a typical scenario:

1. `coralslaveprovider /path/to/fmus -o out/`
2. `coralmaster run .execonf .sysconf`
3. stop
4. `coralmaster run .execonf .sysconf` again
5. `coralslaveprovider`
6. `coralslaveprovider --clean-cache`
7. Repeat 1 and 2
It should be noted that running `coralslaveprovider --clean-cache` does not always work; sometimes I have to repeat the process a couple of times until the coralslaves stop crashing. Also, this doesn't always happen when a change is made to the .execonf or .sysconf file, but every time it has happened, it has been after making a change to one of them. Some more investigation is definitely needed on this issue.