eyip002 closed this issue 1 year ago
Note: Fixing one deadlock may resolve other deadlocks.
Writes to the variables in_use in dyn_containers.forec might not be committed properly or transferred to the outputs.
Server response: Released grabbed train
External/internal state:
train_engine_instances_io[0]
input_grab: 0, 0
input_release: 0, 0
input_train_engine_type: 0, 0
input_requested_speed: 0, 0
input_requested_forwards: 1, 1
output_in_use: 0, 0
output_train_engine_type: -1, -1
output_nominal_speed: 0, 0
output_nominal_forwards: 1, 1
Server response: No response!
Dynamic train engine container response:
External/internal state:
train_engine_instances_io[0]
input_grab: 1, 1
input_release: 0, 0
input_train_engine_type: 0, 0
input_requested_speed: 0, 0
input_requested_forwards: 1, 1
output_in_use: 0, 1 <-- Internal state could not be written to external state!
output_train_engine_type: -1, 0
output_nominal_speed: 0, 0
output_nominal_forwards: 1, 1
Server response: DCC train speed set to 13
Jun 29 16:36:50 raspberrypi3 swtbahn[18450]: server: Request received: Set train speed
Jun 29 16:36:50 raspberrypi3 swtbahn[18450]: server: engineInstance0: libtrain_engine_default (unremovable).tick() 13 1 -> 13 1
Jun 29 16:36:50 raspberrypi3 swtbahn[18450]: libbidib: Set train: cargo_db to speed: 13 via board: master (0x00 0x00 0x00 0x00) with action id: 637
Jun 29 16:36:50 raspberrypi3 swtbahn[18450]: server: Request: Set train speed - train: cargo_db speed: 13
External/internal state:
* dyn_containers_reaction_counter: 147021
* dyn_containers_actuate_reaction_counter: 146588
* dyn_containers_set_train_engine_instance_reaction_counter: 0
train_engine_instances_io[0]
input_grab: 0, 0
input_release: 0, 0
input_train_engine_type: 0, 0
input_requested_speed: 13, 13
input_requested_forwards: 1, 1
output_in_use: 1, 1
output_train_engine_type: 0, 0
output_nominal_speed: 13, 13
output_nominal_forwards: 1, 1
Server response: DCC train speed set to 0
Server did not log any messages!
External/internal state:
* dyn_containers_reaction_counter: 147083
* dyn_containers_actuate_reaction_counter: 146723
* dyn_containers_set_train_engine_instance_reaction_counter: 0
train_engine_instances_io[0]
input_grab: 0, 0
input_release: 0, 0
input_train_engine_type: 0, 0
input_requested_speed: 0, 13 <-- External state could not be written to internal state!
input_requested_forwards: 1, 1
output_in_use: 1, 1
output_train_engine_type: 0, 0
output_nominal_speed: 13, 13
output_nominal_forwards: 1, 1
Counter values some time later:
* dyn_containers_reaction_counter: 147083 <-- ForeC program has stopped reacting...
* dyn_containers_actuate_reaction_counter: 211279
* dyn_containers_set_train_engine_instance_reaction_counter: 0
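The stall is visible in the counters alone: dyn_containers_reaction_counter stays at 147083 while dyn_containers_actuate_reaction_counter keeps climbing. A minimal sketch of that check follows. The struct and function names are hypothetical and are not part of swtbahn-cli or ForeC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of two of the counters printed by the debug info. */
typedef struct {
    uint64_t reaction;  /* dyn_containers_reaction_counter */
    uint64_t actuate;   /* dyn_containers_actuate_reaction_counter */
} counter_sample;

/* True when the ForeC reaction counter did not advance between two
 * snapshots although the actuate counter did, i.e. the server thread is
 * still running but the ForeC program has stopped reacting. */
static bool reaction_stalled(counter_sample earlier, counter_sample later)
{
    return later.reaction == earlier.reaction &&
           later.actuate > earlier.actuate;
}
```

With the values above, comparing (147083, 146723) against (147083, 211279) reports a stall, whereas comparing (147021, 146588) against (147083, 146723) does not.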
Seems like the problem is with the ForeC compiler generating erroneous code for getting the current wall time. The elapsed reaction time can be a negative number, which can happen when clock_gettime() fails and 0 is returned for the current time. This can lead to a very large delay value being passed into usleep(). The clue was in this backtrace:
#2 0xb6a5f8e0 in usleep (useconds=<optimized out>) at ../sysdeps/posix/usleep.c:32
ts = {tv_sec = 4294, tv_nsec = 956530000}
#3 0xb6b0eb88 in forecMain (args=<optimized out>) at /home/pi/SWTbahn/swtbahn-cli/server/bin/dyn_containers.c:1154
Here, the delay passed into usleep is over 4294 seconds (about 71 minutes). So, if we waited for 71 minutes, the dynamic containers would become responsive again.
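The value in the backtrace sits just below 2^32 microseconds, which is the pattern a negative microsecond count produces when it is converted to a 32-bit unsigned argument. The sketch below works through that arithmetic. It assumes useconds_t is a 32-bit unsigned type (as it is in glibc), and the helper names are hypothetical; this is not the code generated by the ForeC compiler.

```c
#include <stdint.h>

/* Hypothetical helper: total microseconds of the timespec shown in the
 * backtrace (tv_sec = 4294, tv_nsec = 956530000). */
static uint32_t timespec_to_us(uint32_t tv_sec, uint32_t tv_nsec)
{
    return tv_sec * 1000000u + tv_nsec / 1000u;
}

/* What a 32-bit unsigned parameter receives when handed a signed delay. */
static uint32_t delay_as_unsigned_us(int32_t delay_us)
{
    return (uint32_t)delay_us;  /* conversion is modulo 2^32 */
}

/* Undo the wrap: values above INT32_MAX map back to negative delays. */
static int64_t wrapped_to_signed_us(uint32_t us)
{
    return us > (uint32_t)INT32_MAX
               ? (int64_t)us - ((int64_t)1 << 32)
               : (int64_t)us;
}
```

For the backtrace values, timespec_to_us(4294, 956530000) gives 4294956530 microseconds, and wrapped_to_signed_us maps that back to -10766 microseconds. A delay of roughly -10.8 ms therefore turns into a sleep of about 71 minutes.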
This ForeC compiler defect has been fixed: https://github.com/PRETgroup/ForeC/commit/cb1867e70082e5cea3eb08ef075071f1bb6db546
Tested for over 3 hours (over 1 million reactions) and still works 👍
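For illustration only, here is one defensive way to compute the remaining sleep time of a periodic reaction so that a failed clock read or a negative elapsed time cannot become a huge delay. This sketch is an assumption about a reasonable pattern; it is not the change made in the linked ForeC commit, and now_us, remaining_us and sleep_remaining are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Read the monotonic clock in microseconds.
 * Returns 0 on success, -1 if clock_gettime() fails. */
static int now_us(int64_t *out)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
        return -1;
    }
    *out = (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
    return 0;
}

/* Time left in the current period, clamped to [0, period_us].
 * A negative elapsed time (e.g. a bogus 0 timestamp) cannot yield
 * more than one full period of sleep. */
static int64_t remaining_us(int64_t start_us, int64_t end_us, int64_t period_us)
{
    int64_t remaining = period_us - (end_us - start_us);
    if (remaining < 0) {
        return 0;
    }
    if (remaining > period_us) {
        return period_us;
    }
    return remaining;
}

/* Sleep for the rest of the period; skip the sleep if the clock is unreadable. */
static void sleep_remaining(int64_t start_us, int64_t period_us)
{
    int64_t end_us;
    if (now_us(&end_us) != 0) {
        return;
    }
    int64_t d = remaining_us(start_us, end_us, period_us);
    if (d > 0) {
        struct timespec req = { .tv_sec = d / 1000000,
                                .tv_nsec = (d % 1000000) * 1000 };
        nanosleep(&req, NULL);
    }
}
```

The clamp works on signed 64-bit values before anything is handed to a sleep call, so the unsigned conversion never sees a negative number.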
Debugging an unresponsive swtbahn-server:
gdb ./swtbahn-server and then run /dev/ttyUSB0 ../../configurations/swtbahn-standard/ localhost 8080, or lldb ./swtbahn-server /dev/ttyUSB0 ../../configurations/swtbahn-standard/ localhost 8080
./swtbahn config localhost 8080 master
./swtbahn monitor get-debug-info
ctrl-c
bt for a backtrace
info threads to see all threads
thread ID to switch to the backtrace of thread ID
Backtraces generated when swtbahn-server becomes unresponsive:
Probably fixed: