lf-lang / lingua-franca

Intuitive concurrent programming in any language
https://www.lf-lang.org
Other
239 stars 63 forks source link

Non-deterministic error in decentralized federated program #2048

Closed erlingrj closed 11 months ago

erlingrj commented 1 year ago

DecentralizedP2PUnbalancedTimeoutPhysical.lf

target C {
  timeout: 1 msec,
  coordination: decentralized
}

reactor Clock(offset: time = 0, period: time = 1 sec) {
  output y: int
  timer t(offset, period)
  state count: int = 0

  reaction(t) -> y {=
    (self->count)++;
    //printf("Reacting at time %ld.\n", lf_time_logical_elapsed());
    lf_set(y, self->count);
  =}

  reaction(shutdown) {=
    lf_print("SUCCESS: the source exited successfully.");
  =}
}

reactor Destination {
  input x: int
  state s: int = 1

  reaction(x) {=
    // printf("%d\n", x->value);
    if (x->value != self->s) {
      lf_print_error_and_exit("Expected %d and got %d.", self->s, x->value);
    }
    self->s++;
  =}

  reaction(shutdown) {=
    lf_print("**** shutdown reaction invoked.");
    lf_print("Approx. time per reaction: %lldns", lf_time_physical_elapsed()/(self->s+1));
  =}
}

federated reactor(period: time = 10 usec) {
  c = new Clock(period=period)
  d = new Destination()
  c.y ~> d.x
}

Is failing sometimes:

Federate DecentralizedP2PUnbalancedTimeoutPhysical in Federation ID 'f316e441b2522e7bd76d635f7ed5f6f1b3eb34a63f7d542d'
RTI: Federation ID: f316e441b2522e7bd76d635f7ed5f6f1b3eb34a63f7d542d
RTI: Number of federates: 2
RTI: Clock sync: init
RTI: Clock sync exchanges per interval: 10
Starting RTI for 2 federates in federation ID f316e441b2522e7bd76d635f7ed5f6f1b3eb34a63f7d542d.
RTI using TCP port 15045 for federation f316e441b2522e7bd76d635f7ed5f6f1b3eb34a63f7d542d.
RTI: Listening for federates.
#### Launching the federate federate__c.
#### Launching the federate federate__d.
#### Bringing the RTI back to foreground so it can receive Control-C.
RTI -i ${FEDERATION_ID} -n 2 -c init exchanges-per-interval 10
Federation ID for executable /home/erling/dev/lf2/test/C/fed-gen/DecentralizedP2PUnbalancedTimeoutPhysical/bin/federate__c: f316e441b2522e7bd76d635f7ed5f6f1b3eb34a63f7d542d
---- System clock resolution: 1 nsec
---- Start execution at time Sat Oct  7 17:09:45 2023
---- plus 163174047 nanoseconds
Federation ID for executable /home/erling/dev/lf2/test/C/fed-gen/DecentralizedP2PUnbalancedTimeoutPhysical/bin/federate__d: f316e441b2522e7bd76d635f7ed5f6f1b3eb34a63f7d542d
---- System clock resolution: 1 nsec
---- Start execution at time Sat Oct  7 17:09:45 2023
---- plus 163531607 nanoseconds
Federate 0: Successfully connected to RTI.
Federate 1: Successfully connected to RTI.
Federate 0: Connected to RTI at localhost:15045.
Federate 1: Connected to RTI at localhost:15045.
RTI: All expected federates have connected. Starting execution.
RTI: Waiting for thread handling federate 0.
Federate 0: Connected to federate 1, port 15046.
Federate 0: Starting timestamp is: 1696691386265135210.
Federate 1: Starting timestamp is: 1696691386265135210.
Federate 0: Environment 0: ---- Spawning 2 workers.
Federate 1: Environment 0: ---- Spawning 1 workers.
Federate 1: FATAL ERROR: Expected 15 and got 14.
Federate 1 has resigned.
Federate 1: Connection to the RTI closed with an EOF.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 1: Received EOF from peer federate 0. Closing the socket.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 1: ---- Terminating environment 0
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 1: WARNING: ---- There are 12 unprocessed future events on the event queue.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 1: WARNING: ---- The first future event has timestamp 832950 after start time.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
---- Elapsed logical time (in nsec): 824,200
Federate 0: WARNING: Socket is no longer connected. Dropping message.
---- Elapsed physical time (in nsec): 2,183,174
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 1: WARNING: Memory allocated for messages has not been freed.
Federate 1: WARNING: Number of unfreed messages: 14.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: SUCCESS: the source exited successfully.
Federate 0 has resigned.
Federate 0: Connection to the RTI closed with an EOF.
RTI: Federate 0 thread exited.
RTI: Waiting for thread handling federate 1.
RTI: Federate 1 thread exited.
Federate 0: ---- Terminating environment 0
RTI is exiting.
RTI is exiting.
---- Elapsed logical time (in nsec): 1,000,000
---- Elapsed physical time (in nsec): 2,786,218
RTI has exited. Wait for federates to exit.
Killing federate 1586412.
Killing federate 1586413.
test/C/bin/DecentralizedP2PUnbalancedTimeoutPhysical: line 18: kill: (1586412) - No such process
test/C/bin/DecentralizedP2PUnbalancedTimeoutPhysical: line 18: kill: (1586413) - No such process
#### Killing RTI 1586406.
test/C/bin/DecentralizedP2PUnbalancedTimeoutPhysical: line 20: kill: (1586406) - No such process

A successful (?) output looks like this:

Federate DecentralizedP2PUnbalancedTimeoutPhysical in Federation ID 'e48542ab1bdcaaf80e354f36a66d35b5250489e2e0609259'
RTI: Federation ID: e48542ab1bdcaaf80e354f36a66d35b5250489e2e0609259
RTI: Number of federates: 2
RTI: Clock sync: init
RTI: Clock sync exchanges per interval: 10
Starting RTI for 2 federates in federation ID e48542ab1bdcaaf80e354f36a66d35b5250489e2e0609259.
RTI using TCP port 15045 for federation e48542ab1bdcaaf80e354f36a66d35b5250489e2e0609259.
RTI: Listening for federates.
#### Launching the federate federate__c.
#### Launching the federate federate__d.
#### Bringing the RTI back to foreground so it can receive Control-C.
RTI -i ${FEDERATION_ID} -n 2 -c init exchanges-per-interval 10
Federation ID for executable /home/erling/dev/lf2/test/C/fed-gen/DecentralizedP2PUnbalancedTimeoutPhysical/bin/federate__c: e48542ab1bdcaaf80e354f36a66d35b5250489e2e0609259
---- System clock resolution: 1 nsec
---- Start execution at time Sat Oct  7 17:10:46 2023
---- plus 131195790 nanoseconds
Federation ID for executable /home/erling/dev/lf2/test/C/fed-gen/DecentralizedP2PUnbalancedTimeoutPhysical/bin/federate__d: e48542ab1bdcaaf80e354f36a66d35b5250489e2e0609259
---- System clock resolution: 1 nsec
---- Start execution at time Sat Oct  7 17:10:46 2023
---- plus 131556928 nanoseconds
Federate 0: Successfully connected to RTI.
Federate 0: Connected to RTI at localhost:15045.
Federate 1: Successfully connected to RTI.
Federate 1: Connected to RTI at localhost:15045.
RTI: All expected federates have connected. Starting execution.
RTI: Waiting for thread handling federate 0.
Federate 0: Connected to federate 1, port 15046.
Federate 0: Starting timestamp is: 1696691447233091950.
Federate 1: Starting timestamp is: 1696691447233091950.
Federate 0: Environment 0: ---- Spawning 2 workers.
Federate 1: Environment 0: ---- Spawning 1 workers.
Federate 1: **** shutdown reaction invoked.
Federate 1: Approx. time per reaction: 130311ns
Federate 1 has resigned.
Federate 1: Connection to the RTI closed with an EOF.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 1: Received EOF from peer federate 0. Closing the socket.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 1: ---- Terminating environment 0
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
---- Elapsed logical time (in nsec): 1,000,000
Federate 0: WARNING: Socket is no longer connected. Dropping message.
---- Elapsed physical time (in nsec): 1,509,192
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 1: WARNING: Memory allocated for messages has not been freed.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 1: WARNING: Number of unfreed messages: 1.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: WARNING: Socket is no longer connected. Dropping message.
Federate 0: SUCCESS: the source exited successfully.
Federate 0 has resigned.
Federate 0: Connection to the RTI closed with an EOF.
RTI: Federate 0 thread exited.
RTI: Waiting for thread handling federate 1.
RTI: Federate 1 thread exited.
RTI is exiting.
Federate 0: ---- Terminating environment 0
RTI is exiting.
---- Elapsed logical time (in nsec): 1,000,000
---- Elapsed physical time (in nsec): 2,434,748
RTI has exited. Wait for federates to exit.
All done.
edwardalee commented 1 year ago

The error exhibited here:

Federate 1: FATAL ERROR: Expected 15 and got 14.

means that the Destination reaction was invoked a second time with the same input value from a previous invocation. The test case (incorrectly, I believe) has no STP offsets (neither STA nor STAA), despite using decentralized coordination. Somehow, it advances time (possibly to the stop time), believes the input to be present, but either a new input has not arrived or an input is in the process of arriving but the value has not been updated.

I believe this test can be made to pass reliably by introducing STP offsets, as done in the draft PR #2049, but I'm concerned that that may mask a real bug, which could be a race condition involving when inputs are deemed to be present and when their value is updated.

edwardalee commented 11 months ago

I believe this is addressed by #2118.