triffid / FiveD_on_Arduino

Rewrite of reprap mendel firmware
http://forums.reprap.org/read.php?147,33082
GNU General Public License v2.0

Clash between stepper timer and serial communication #3

Closed Traumflug closed 13 years ago

Traumflug commented 13 years ago

Recently I started to run larger GCode sections and have run several hundred lines successfully. The unfortunate truth, however, is that in some rare circumstances FiveD on Arduino drops steps. If the drops happen in the middle of a longer movement, you can hear them, as the stepper motors sound a lot rougher than normal. Sometimes one axis even stops entirely for a second or two, then continues the same movement as if nothing had happened. Again, you can hear the difference easily, and the pause in movement isn't a pause in the controller sending something to the stepper drivers.

Once I had found a sequence exposing the problem, I could repeat it reliably.

One part of the diagnosis is that if you send these GCode sequences line by line to the controller, everything runs fine.

To diagnose further, I was even brave enough to add some debug messages into dda_step(), below line 470:

if (dda->step_no % 100 == 0) {
    serwrite_uint32(dda->c);
    serial_writechar(' ');
}

With ACCELERATION_RAMPING turned on (to enable the used variables), this nicely outputs the current speed at a rate the serial channel can follow easily.

However, each time a number is sent, you hear a little knock from the stepper motors. Eventually, the controller (a '168 Arduino) hangs up entirely: its LED flickers and the serial connection is no longer responsive. Pressing the reset button isn't sufficient; I have to power-cycle the thing to get it back to life.

My current conclusion is that steps get lost in a movement if serial communication is busy at the same time. These two timers don't work independently of each other, but get in each other's way.

Undoubtedly, I'd like to reduce the serial communication's interrupt priority, but how would one achieve that?

BTW, most movements run at about 4000 steps/second.

triffid commented 13 years ago

Serial doesn't use a timer. An interrupt fires every time a character is moved from the character buffer to the send buffer, so the interrupt can move the next character into the character buffer. This move should be /very/ fast due to the simplicity of ringbuffers. Only rewriting it in hand-crafted assembler could make it much faster, I think.

The step interrupt takes far longer than I'm happy with. I suspect that when delayed slightly by the serial TX interrupt, it may miss its timeout and have to wait for the timer to loop all the way back again, which at some of the higher prescale settings can easily be a couple of seconds. The changes I'm playing with in the united_timer branch aim to improve or solve this.
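To put rough numbers on the wrap-around described above, here is a small sketch. It assumes a 16 MHz system clock and a 16-bit timer; neither value is stated in this thread, and `timer_wrap_ms` and `cycles_per_step` are names made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions, not taken from the thread: 16 MHz clock, 16-bit timer. */
#define F_CPU_HZ     16000000UL
#define TIMER_COUNTS 65536UL

/* Time for the timer to count once all the way around, in milliseconds.
 * This is how long a step has to wait if its compare match is missed. */
static uint32_t timer_wrap_ms(uint32_t prescaler)
{
    return (uint32_t)(((uint64_t)TIMER_COUNTS * prescaler * 1000u) / F_CPU_HZ);
}

/* CPU cycles available between two steps at a given step rate. */
static uint32_t cycles_per_step(uint32_t steps_per_second)
{
    assert(steps_per_second != 0);
    return F_CPU_HZ / steps_per_second;
}
```

Under these assumptions a prescaler of 256 wraps after roughly one second and a prescaler of 1024 after roughly four, which is the same order as the pauses reported above. At the 4000 steps/second mentioned earlier, the step interrupt and everything else have about 4000 CPU cycles per step to share.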

The LED turns on at the start of the step interrupt and off at the end, so a flickering LED means /something/ is happening. Maybe the timer is re-firing as fast as it can for some reason? That would cause the serial buffer to become full, as the main loop can't empty it. Another possibility is that it's busy-looping in interrupt context instead of dropping new characters. Interrupts need to be working for the TX ringbuffer to empty, so sending chars from interrupt context could cause trouble. There is code in place to avoid this, but it may not be working correctly. I used to send exclamation marks from the step interrupt in some of my debug sessions and the dropping seemed to work then, but maybe it's changed since.
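The "drop instead of busy-loop" behaviour can be sketched as a write that never waits. This is a host-runnable illustration, not the code from serial.c: the buffer size and the names `tx_put_nowait` and `tx_take` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TX_BUF_SIZE 64u                 /* hypothetical size */
#define TX_BUF_MASK (TX_BUF_SIZE - 1u)
static_assert((TX_BUF_SIZE & TX_BUF_MASK) == 0, "size must be a power of two");

static volatile uint8_t tx_buf[TX_BUF_SIZE];
static volatile uint8_t tx_head;        /* next slot to write */
static volatile uint8_t tx_tail;        /* next slot to send  */

/* Non-blocking write, usable from interrupt context: if the buffer is
 * full the character is dropped and false is returned. Waiting here
 * would never end, because the TX interrupt that drains the buffer
 * cannot run while another interrupt handler is still executing. */
static bool tx_put_nowait(uint8_t c)
{
    uint8_t next = (uint8_t)((tx_head + 1u) & TX_BUF_MASK);
    if (next == tx_tail)
        return false;                   /* full: drop */
    tx_buf[tx_head] = c;
    tx_head = next;
    return true;
}

/* What the TX interrupt would do: fetch the next queued character.
 * Returns -1 if nothing is queued. */
static int tx_take(void)
{
    if (tx_head == tx_tail)
        return -1;
    uint8_t c = tx_buf[tx_tail];
    tx_tail = (uint8_t)((tx_tail + 1u) & TX_BUF_MASK);
    return c;
}
```

One slot always stays unused so that "full" and "empty" can be told apart with just the two indices; a 64-byte buffer therefore holds 63 characters.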

Please post all the info necessary to repeat so we can all help debug! I've come across similar stuff from time to time but have never been able to get it repeatable.

Traumflug commented 13 years ago

This is the GCode exposing the problem:

G21
(Absolute Coordinates)
G90
M05
G00 Z50.0000
G00 X0.0000 Y0.0000
M06 T01  ; 0.7000
G00 Z1.0000
M03
G04 P0.000000
G00 X-24.7650 Y13.9700
G01 Z-1.2000 F100.00
G00 Z1.0000
G00 X-31.7500 Y13.9700
G01 Z-1.2000 F100.00
G00 Z1.0000
M05

The G0 in line 8 runs very slowly and drops about 30 mm worth of steps in the middle. The remaining lines are needed to expose the misbehaviour though, likely to keep serial communications busy for a while.

The relevant section from config.h:

#define STEPS_PER_MM_X      320
#define STEPS_PER_MM_Y      320
#define STEPS_PER_MM_Z      320
#define STEPS_PER_MM_E      320

#define MAXIMUM_FEEDRATE_X  800
#define MAXIMUM_FEEDRATE_Y  800
#define MAXIMUM_FEEDRATE_Z  800
#define MAXIMUM_FEEDRATE_E  800

#define SEARCH_FEEDRATE_X   50
#define SEARCH_FEEDRATE_Y   50
#define SEARCH_FEEDRATE_Z   50
#define SEARCH_FEEDRATE_E   50

//#define ACCELERATION_REPRAP
#define ACCELERATION_RAMPING
#define ACCELERATION_STEEPNESS  200000

Traumflug commented 13 years ago

Commenting out G4 (dwell) handling in gcode.c avoids the issue, so I'm a lot closer to the fix. Writing bug reports is sometimes really helpful :-)

Traumflug commented 13 years ago

To be more precise, it's this waiting loop in gcode.c which disturbs the currently running command and drops stuff coming in over the serial line:

// wait for all moves to complete
for (;queue_empty() == 0;)
    wd_reset();

Right now I'm out of ideas why not returning to the main loop causes such havoc. I'd like to understand this before proceeding to implement Dwell as a queued command, similar to waiting for temperature. Especially as wd_reset() is currently a no-op.

Traumflug commented 13 years ago

Sometimes the only way to get forward is to randomly add or delete code ... fortunately, this gave enough hints for doing things properly.

One part of the problem was, if you don't read characters coming in over the serial line, the interrupt logic will swamp you with retry-interrupts. See the fix in http://github.com/triffid/FiveD_on_Arduino/commit/fb53c2c0a83761506f446dd3602e7d8fbad25ec1 .
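The mechanism can be reproduced on a host machine. On AVR USARTs the receive-complete flag is only cleared by reading the data register, so a handler that returns without reading is re-entered immediately. The sketch below simulates that; it is not the code from the linked commit, and all names (`hw_rx_flag`, `rx_isr_buggy`, `rx_isr_fixed`, `deliver`) are made up for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Host-side stand-ins for the UART hardware (hypothetical names). */
static bool    hw_rx_flag;      /* set on arrival, cleared only by reading */
static uint8_t hw_rx_data;

static uint8_t hw_read_data(void) { hw_rx_flag = false; return hw_rx_data; }

#define RX_BUF_SIZE 8u
static uint8_t rx_buf[RX_BUF_SIZE];
static uint8_t rx_count;

/* Buggy handler: leaves the byte in the hardware when the buffer is
 * full, so the flag stays set and the interrupt fires again at once. */
static void rx_isr_buggy(void)
{
    if (rx_count < RX_BUF_SIZE)
        rx_buf[rx_count++] = hw_read_data();
}

/* Fixed handler: always reads, then drops the byte if there is no room. */
static void rx_isr_fixed(void)
{
    uint8_t c = hw_read_data();
    if (rx_count < RX_BUF_SIZE)
        rx_buf[rx_count++] = c;
}

/* Deliver one byte and count how often the handler runs before the
 * flag clears, giving up after `limit` runs. */
static unsigned deliver(void (*isr)(void), uint8_t c, unsigned limit)
{
    assert(isr);
    unsigned runs = 0;
    hw_rx_data = c;
    hw_rx_flag = true;
    while (hw_rx_flag && runs < limit) {
        isr();
        runs++;
    }
    return runs;
}
```

With a full buffer, `deliver(rx_isr_fixed, 'x', 1000)` runs the handler once, while `deliver(rx_isr_buggy, 'x', 1000)` hits the limit. On real hardware that loop would leave almost no CPU time for the step interrupt or the main loop.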

Another part is, switching the timer interrupt on and off all the time while stepping obviously makes the timer vulnerable and, in our case, leads to distorted timings. So I changed the code to turn on the interrupt starting with the first move and switching it off no earlier than the last move waiting in the queue is done. In my tests this runs nicely, and even when getting the interrupt-swamp mentioned above, stepping gets slow, but still without step losses. See http://github.com/triffid/FiveD_on_Arduino/commit/95939ecc229a2a3f71917bb67cfd7ee027ec1e06 .

The third thing is, XON/XOFF flow control still doesn't work for G4 Dwell. I'll have to investigate this further, no fix yet.

Traumflug commented 13 years ago

XON/XOFF flow control now moved into serial.c, making it reliable in all situations. Issue solved :-)
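For readers unfamiliar with XON/XOFF, the idea of doing it inside the serial layer can be sketched as below. The thread does not show the actual serial.c change, so the watermarks, buffer size and function names here are assumptions; only the two control codes (XON = 0x11, XOFF = 0x13) are standard ASCII.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ASCII_XON  0x11         /* DC1: host may resume sending */
#define ASCII_XOFF 0x13         /* DC3: host should pause       */

/* Hypothetical sizes, chosen only for the example. */
#define RX_SIZE       16u
#define RX_HIGH_WATER 12u
#define RX_LOW_WATER   4u
static_assert(RX_LOW_WATER < RX_HIGH_WATER && RX_HIGH_WATER < RX_SIZE,
              "watermarks must fit inside the buffer");

static uint8_t rx_fill;         /* bytes waiting in the receive buffer  */
static bool    host_paused;     /* true after XOFF, until XON is sent   */
static uint8_t last_control;    /* last control byte "sent", 0 = none   */

/* Stand-in for writing a byte to the UART. */
static void send_control(uint8_t c) { last_control = c; }

/* Called by the receive path after storing one byte. */
static void flow_on_receive(void)
{
    if (rx_fill < RX_SIZE)
        rx_fill++;
    if (!host_paused && rx_fill >= RX_HIGH_WATER) {
        send_control(ASCII_XOFF);
        host_paused = true;
    }
}

/* Called whenever the main loop takes one byte out of the buffer. */
static void flow_on_consume(void)
{
    if (rx_fill > 0)
        rx_fill--;
    if (host_paused && rx_fill <= RX_LOW_WATER) {
        send_control(ASCII_XON);
        host_paused = false;
    }
}
```

Because the decision is tied to the buffer fill level rather than to individual G-code handlers, a long-running command such as a dwell gets the same protection as any other. The gap between the two watermarks keeps the firmware from toggling XON and XOFF on every byte.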