osrf / uctf

Unmanned Capture the Flag (U-CTF) project.
Apache License 2.0

arducopter instability under high load in SITL #46

Closed tfoote closed 8 years ago

tfoote commented 8 years ago

The instability of the arducopters when run in parallel is caused by dropped packets in the ardupilot/arducopter lockstepping with gazebo.

As a test I inserted the following code to drain the incoming socket after each read.

After this line: https://bitbucket.org/osrf/gazebo/src/ce43f7d724b2793f4590c89ba1c3717d59147715/plugins/ArduCopterPlugin.cc?at=ardupilot&fileviewer=file-view-default#ArduCopterPlugin.cc-635

  // Drain the socket in case we're backed up.
  // Test code only: drained packets are counted and then discarded.
  int counter = 0;
  ServoPacket last_pkt;
  ssize_t recvSize_last = 1;
  while (true)
  {
    // last_pkt = pkt;
    recvSize_last =
      this->dataPtr->socket_in.Recv(&last_pkt, sizeof(ServoPacket), 0ul);
    if (recvSize_last == -1)
    {
      break;
    }
    counter++;
  }
  // pkt = last_pkt;
  if (counter > 0)
  {
    gzerr << "Drained n packets: " << counter << std::endl;
  }

The copters hovered much more steadily with the test code inserted.
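For anyone who wants to try the same idea outside the plugin, here is a minimal standalone sketch of draining a datagram socket without blocking. It is not the plugin code: `ServoPacketSketch` and `DrainToLatest` are hypothetical names, the packet layout is made up for illustration, and it uses a raw POSIX socket with `MSG_DONTWAIT` rather than the plugin's `socket_in.Recv` wrapper. Unlike the test code above, where the `pkt = last_pkt` assignment is commented out, this variant keeps the newest packet it reads.

```cpp
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical stand-in for the plugin's ServoPacket. The real layout lives
// in ArduCopterPlugin.cc and is not reproduced here.
struct ServoPacketSketch
{
  float motorSpeed[4];
};

// Reads every datagram currently queued on `fd` without blocking.
// Each full-size datagram overwrites `latest`, so on return it holds the
// newest one. Returns the number of full-size datagrams read; when that is
// 0, `latest` is left untouched.
int DrainToLatest(int fd, ServoPacketSketch &latest)
{
  int count = 0;
  ServoPacketSketch tmp;
  while (true)
  {
    ssize_t n = recv(fd, &tmp, sizeof(tmp), MSG_DONTWAIT);
    if (n <= 0)
    {
      // -1 means the queue is empty (EAGAIN/EWOULDBLOCK) or an error
      // occurred; 0 is treated the same way so the loop always terminates.
      break;
    }
    if (n != static_cast<ssize_t>(sizeof(tmp)))
    {
      continue;  // Skip datagrams that are too short to be a packet.
    }
    latest = tmp;
    ++count;
  }
  return count;
}
```

A return value above 1 means the reader was behind by that many packets minus one, which is the same signal the "Drained n packets" debug print gives.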

The debug prints reveal the occasional lost packet under heavy load (~8 quads + 2 planes on my laptop):

(1479183298 216651978) [Err] [ArduPilotPlugin.cc:794] Drained n packets: 1
(1479183298 216790450) [Err] [ArduPilotPlugin.cc:794] Drained n packets: 1
(1479183298 216876385) [Err] [ArduCopterPlugin.cc:655] Drained n packets: 1
(1479183298 216954805) [Err] [ArduCopterPlugin.cc:655] Drained n packets: 1
(1479183298 217011632) [Err] [ArduCopterPlugin.cc:655] Drained n packets: 1
(1479183298 217069452) [Err] [ArduCopterPlugin.cc:655] Drained n packets: 1
(1479183298 217136871) [Err] [ArduCopterPlugin.cc:655] Drained n packets: 1
(1479183298 217199720) [Err] [ArduCopterPlugin.cc:655] Drained n packets: 1
(1479183298 217254651) [Err] [ArduCopterPlugin.cc:655] Drained n packets: 1
(1479183298 219997253) [Err] [ArduCopterPlugin.cc:655] Drained n packets: 1

Here's a larger sample: https://gist.github.com/tfoote/48655af49d45dbbe16ad37ba6867888e

There was even some draining going on at startup, so I think this was affecting even a single drone, just not enough to make the control system noticeably unstable.

Here's an example of starting 2 copters and 2 planes. Since the backup never cleared, it was basically initializing the control loop with lag.

(1479181120 213888499) [Msg] Waiting for master.
(1479181120 222805313) [Msg] Connected to gazebo master @ http://127.0.0.1:11345
(1479181120 223032149) [Msg] Publicized address: 172.23.3.99
(1479181121 54762761) Init world[default]
(1479181134 616642643) ArduCopter ready to fly. The force will be with you
(1479181134 922151273) ArduCopter ready to fly. The force will be with you
(1479181135 204566755) applying Gaussian noise model with mean 0, stddev 1, bias -2.22179
(1479181135 204629758) applying Gaussian noise model with mean 0, stddev 1, bias -3.56046
(1479181135 204657866) applying Gaussian noise model with mean 0, stddev 1, bias -2.47102
(1479181135 204680447) applying Gaussian noise model with mean 0, stddev 0.1, bias 0.0721734
(1479181135 204703947) applying Gaussian noise model with mean 0, stddev 0.1, bias -0.0967225
(1479181135 204726990) applying Gaussian noise model with mean 0, stddev 0.2, bias 0.15346
(1479181135 407365291) ArduPilot ready to fly. The force will be with you
(1479181135 708236548) applying Gaussian noise model with mean 0, stddev 1, bias -4.05091
(1479181135 708291516) applying Gaussian noise model with mean 0, stddev 1, bias -3.04168
(1479181135 708324748) applying Gaussian noise model with mean 0, stddev 1, bias 2.97222
(1479181135 708361503) applying Gaussian noise model with mean 0, stddev 0.1, bias -0.0247391
(1479181135 708394917) applying Gaussian noise model with mean 0, stddev 0.1, bias -0.247172
(1479181135 708427464) applying Gaussian noise model with mean 0, stddev 0.2, bias 0.000792821
(1479181138 23442351) ArduPilot ready to fly. The force will be with you
(1479181138 23586404) [Err] [ArduCopterPlugin.cc:655] Drained n packets: 14
(1479181138 23631767) [Dbg] [ArduCopterPlugin.cc:691] ArduCopter controller online detected.
(1479181138 23809216) [Err] [ArduCopterPlugin.cc:655] Drained n packets: 15
(1479181138 23843144) [Dbg] [ArduCopterPlugin.cc:691] ArduCopter controller online detected.
(1479181138 23923387) [Err] [ArduPilotPlugin.cc:794] Drained n packets: 14
(1479181138 23947203) [Dbg] [ArduPilotPlugin.cc:835] ArduPilot controller online detected.
(1479181138 185618927) [Dbg] [ArduPilotPlugin.cc:835] ArduPilot controller online detected.
(1479181143 435622269) [Err] [ArduPilotPlugin.cc:794] Drained n packets: 2
(1479181143 436565994) [Err] [ArduCopterPlugin.cc:655] Drained n packets: 2
(1479181143 436692039) [Err] [ArduCopterPlugin.cc:655] Drained n packets: 2
(1479181148 555028213) [Err] [ArduPilotPlugin.cc:794] Drained n packets: 4
(1479181148 556121556) [Err] [ArduCopterPlugin.cc:655] Drained n packets: 4
(1479181148 556236643) [Err] [ArduCopterPlugin.cc:655] Drained n packets: 4
(1479181154 810127396) [Err] [ArduCopterPlugin.cc:655] Drained n packets: 2
(1479181154 810230919) [Err] [ArduPilotPlugin.cc:794] Drained n packets: 2
(1479181154 810296193) [Err] [ArduPilotPlugin.cc:794] Drained n packets: 2
(1479181174 137816374) [Err] [ArduPilotPlugin.cc:794] Drained n packets: 2
(1479181174 138844251) [Err] [ArduCopterPlugin.cc:655] Drained n packets: 2
(1479181174 139069105) [Err] [ArduCopterPlugin.cc:655] Drained n packets: 2
(1479181196 483793217) [Err] [ArduPilotPlugin.cc:794] Drained n packets: 1
(1479181196 483887895) [Err] [ArduPilotPlugin.cc:794] Drained n packets: 1
(1479181196 484827374) [Err] [ArduCopterPlugin.cc:655] Drained n packets: 1
(1479181217 391511949) [Err] [ArduPilotPlugin.cc:794] Drained n packets: 3
(1479181217 391600902) [Err] [ArduPilotPlugin.cc:794] Drained n packets: 3
(1479181217 392386184) [Err] [ArduCopterPlugin.cc:655] Drained n packets: 3
(1479181241 28813740) [Err] [ArduCopterPlugin.cc:655] Drained n packets: 1
(1479181241 29062132) [Err] [ArduCopterPlugin.cc:655] Drained n packets: 1
(1479181241 29145791) [Err] [ArduPilotPlugin.cc:794] Drained n packets: 1
(1479181455 55563714) [Err] [ArduCopterPlugin.cc:655] Drained n packets: 1
(1479181455 55676231) [Err] [ArduPilotPlugin.cc:794] Drained n packets: 1
(1479181455 55774281) [Err] [ArduPilotPlugin.cc:794] Drained n packets: 1
(1479181461 706403371) [Err] [ArduPilotPlugin.cc:794] Drained n packets: 1
(1479181461 707358518) [Err] [ArduCopterPlugin.cc:655] Drained n packets: 1
(1479181461 707455626) [Err] [ArduCopterPlugin.cc:655] Drained n packets: 1
(1479181468 518175099) [Err] [ArduCopterPlugin.cc:655] Drained n packets: 1
(1479181468 518272927) [Err] [ArduPilotPlugin.cc:794] Drained n packets: 1

Each of these sockets, when backed up, contributes phase lag to the feedback loop. Over time the number of backed-up packets accumulates, increasing the lag with each drop.

With this basic workaround the vehicles flew much more steadily. At higher loads more packets are dropped, so the lag accumulates faster. The accumulation is also why the effect gets worse over time.
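To make the accumulation concrete, here is a toy model rather than a measurement. It assumes the receiver consumes exactly one command per update step and that the sender occasionally queues one extra command. How the extra packets actually get queued in the real system is an assumption here, and `CommandAges` is a hypothetical helper. Each command is stamped with the step it was sent at, and the function reports how old the consumed command is at every step.

```cpp
#include <deque>
#include <vector>

// Toy model of a one-read-per-step receiver fed through a FIFO.
//   steps          : number of update steps to simulate
//   extraSendSteps : steps at which the sender queues one extra command
//   drain          : if true, the receiver discards the backlog each step
//                    and uses the newest queued command instead
// Returns the age, in steps, of the command used at each step.
std::vector<int> CommandAges(int steps,
                             const std::vector<int> &extraSendSteps,
                             bool drain)
{
  std::deque<int> queue;  // Each entry is the step a command was sent at.
  std::vector<int> ages;
  for (int step = 0; step < steps; ++step)
  {
    queue.push_back(step);
    for (int s : extraSendSteps)
    {
      if (s == step)
      {
        queue.push_back(step);  // The extra packet that backs up the socket.
      }
    }

    // The receiver reads one command, as the lockstep loop does.
    int stamp = queue.front();
    queue.pop_front();
    if (drain && !queue.empty())
    {
      stamp = queue.back();  // Jump to the newest command.
      queue.clear();
    }
    ages.push_back(step - stamp);
  }
  return ages;
}
```

With `CommandAges(6, {1, 3}, false)` the ages come out as 0, 0, 1, 1, 1, 2: every extra packet adds one step of lag that never goes away. With `drain` set to `true` the age stays at 0 throughout.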

Conceptually it may make more sense for the flow of data to be triggered from the gazebo side at each update cycle, instead of having ardupilot continuously push updates. If a packet is lost, the simulation can keep running and use the last known command, or we could pass unique IDs or sequence numbers to keep ourselves in lockstep.
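As a rough sketch of the sequence-number idea, assuming a `seq` field were added to the command packet: the receiver could count gaps and reject stale or duplicate commands, while still falling back to the last accepted command when nothing new arrives. `SequencedCommand` and `CommandTracker` are hypothetical names; neither the plugin nor ardupilot defines them.

```cpp
#include <cstdint>

// Hypothetical command packet with a sequence number ahead of the payload.
struct SequencedCommand
{
  uint32_t seq;
  float motorSpeed[4];
};

// Tracks the most recent command and counts lost and stale packets.
struct CommandTracker
{
  bool haveCommand = false;
  SequencedCommand last{};  // Last accepted command, reusable after a loss.
  uint64_t lost = 0;        // Sequence numbers skipped over.
  uint64_t stale = 0;       // Duplicates or out-of-order packets rejected.

  // Returns true if `cmd` is newer than the last accepted command.
  bool Accept(const SequencedCommand &cmd)
  {
    if (haveCommand)
    {
      // Unsigned subtraction wraps, so the difference stays correct across
      // a rollover of `seq`. Interpreting it as signed assumes two's
      // complement, which C++20 guarantees and common compilers provide.
      int32_t delta = static_cast<int32_t>(cmd.seq - last.seq);
      if (delta <= 0)
      {
        ++stale;
        return false;
      }
      lost += static_cast<uint64_t>(delta - 1);
    }
    last = cmd;
    haveCommand = true;
    return true;
  }
};
```

The same counter could also travel back in the state packet, so that each side can tell which command a given state corresponds to.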

The matching code segment is here: https://github.com/tfoote/ardupilot/blob/uctf-dev/libraries/SITL/SIM_Gazebo.cpp#L64-L70

This patch is enough to diagnose the problem; however, I think there are better ways to fix it.

This is also an issue in ArduPilotPlugin.cc: https://bitbucket.org/osrf/gazebo/src/ce43f7d724b2793f4590c89ba1c3717d59147715/plugins/ArduPilotPlugin.cc?at=ardupilot&fileviewer=file-view-default#ArduPilotPlugin.cc-773 However, the control loop is more robust and the performance hit is not as observable.

@khancyr @tridge do you have any opinions on how this might best be resolved?

@gerkey FYI

tfoote commented 8 years ago

Fixed here: https://bitbucket.org/osrf/gazebo/commits/2a83c4071abc3e0f787126b6cacbab707fd6a3f6?at=default