repetier / Repetier-Firmware

Firmware for Arduino based RepRap 3D printer.

"Apos x steps" & "Apos y steps" — motion.cpp #903

Open EliW opened 4 years ago

EliW commented 4 years ago

Hey there, I'm sure this is less an issue with the code and more an issue with my current machine. But I'd like to understand this section of code better so I can figure out how the two interact.

So I'm running a delta (Rostock Max V3) with dual extrusion, running Repetier Firmware. Recently, when doing dual extrusion, my machine has been giving one of these two errors from motion.cpp immediately after a tool change/retraction. (Which change triggers it is random, but it's usually within the first 1-4.)

Can someone help me understand exactly what this section of code is doing that is throwing this error and resetting my machine? https://github.com/repetier/Repetier-Firmware/blob/master/src/ArduinoDUE/Repetier/motion.cpp#L1336

Obviously (and from the comments), it appears to be determining the A tower's height (in steps?) as part of translating the Cartesian coordinates in the gcode into delta coordinates.

repetier commented 4 years ago

Quite easy if you understand the math and look into this code:

    temp = RMath::absLong(Printer::deltaAPosXSteps - cartesianPosSteps[X_AXIS]);
    if (temp > LIMIT)
        RETURN_0("Apos x steps ");

It takes the distance in steps between the left column (A tower) and the target x, and compares that value to LIMIT to see whether the square computed later can still be calculated. If not, it sends the error message. So what happens is that you want to go to an x position that can no longer be computed in steps.
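To make the check concrete, here is a minimal sketch of a simplified small-machine A tower computation. It is not the firmware's code: `aTowerHeightSteps` and `absDiff` are hypothetical names, only the A tower is handled, and the square root is a plain rounded `std::sqrt`. It shows why LIMIT is 65534: 65534² still fits in a `uint32_t`, while 65536² would wrap around to 0.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: |a - b| computed in 64 bits so the subtraction itself cannot overflow.
static uint32_t absDiff(int32_t a, int32_t b) {
    int64_t d = static_cast<int64_t>(a) - static_cast<int64_t>(b);
    return static_cast<uint32_t>(d < 0 ? -d : d);
}

// Simplified sketch of the small-machine A tower height (all values in steps).
// Returns false in the cases where the firmware would report an "Apos ..." error.
bool aTowerHeightSteps(int32_t towerX, int32_t towerY, uint32_t diagSq,
                       int32_t x, int32_t y, int32_t z, int32_t &height) {
    const uint32_t LIMIT = 65534;  // 65534 * 65534 still fits in uint32_t
    uint32_t opt = diagSq;         // diagonal rod length squared, in steps^2

    uint32_t t = absDiff(towerY, y);
    if (t > LIMIT) return false;   // "Apos y steps": square would overflow
    t = t * t;
    if (t >= opt) return false;    // rod cannot reach this y offset
    opt -= t;

    t = absDiff(towerX, x);
    if (t > LIMIT) return false;   // "Apos x steps": the case in this issue
    t = t * t;
    if (t >= opt) return false;    // rod cannot reach this x offset

    // Carriage height = vertical leg of the rod triangle plus the target z.
    height = static_cast<int32_t>(std::lround(std::sqrt(static_cast<double>(opt - t)))) + z;
    return true;
}
```

With the values logged later in this thread (tower x at -10004 steps, target x at -83197 steps), the x distance is 73193 steps, which exceeds LIMIT, so the sketch rejects the move just as the firmware did.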

So, assuming this would be a legal position here, you have two choices. You can halve the steps per mm by reducing the microstepping accordingly; then I guess the position would fit.
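The arithmetic behind the first option, as a small sketch: the check only cares about distance times steps per mm, so halving steps per mm doubles the reach that still passes. The helper names are hypothetical, and 80 steps/mm is simply the value assumed later in this thread.

```cpp
#include <cassert>
#include <cmath>

const double LIMIT_STEPS = 65534.0;  // same LIMIT as the small-machine check

// Largest tower-to-target distance (mm) that still passes the check.
double maxDistanceMm(double stepsPerMm) { return LIMIT_STEPS / stepsPerMm; }

// True if a distance in mm still fits once converted to steps.
bool fitsSmallMachine(double distanceMm, double stepsPerMm) {
    return distanceMm * stepsPerMm <= LIMIT_STEPS;
}
```

At 80 steps/mm the reach is about 819 mm; at 40 steps/mm it is about 1638 mm.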

The other solution is to tell the firmware it is a large machine, so it switches to slower math that can still do the computation. This is set automatically in printer.cpp:

    if (deltaDiagonalStepsSquaredA.l > 65534 || 2 * radius0 * axisStepsPerMM[Z_AXIS] > 65534) {
        setLargeMachine(true);
#ifdef SUPPORT_64_BIT_MATH
        deltaDiagonalStepsSquaredA.L = RMath::sqr(static_cast<uint64_t>(deltaDiagonalStepsSquaredA.l));
        deltaDiagonalStepsSquaredB.L = RMath::sqr(static_cast<uint64_t>(deltaDiagonalStepsSquaredB.l));
        deltaDiagonalStepsSquaredC.L = RMath::sqr(static_cast<uint64_t>(deltaDiagonalStepsSquaredC.l));
#else
        deltaDiagonalStepsSquaredA.f = RMath::sqr(static_cast<float>(deltaDiagonalStepsSquaredA.l));
        deltaDiagonalStepsSquaredB.f = RMath::sqr(static_cast<float>(deltaDiagonalStepsSquaredB.l));
        deltaDiagonalStepsSquaredC.f = RMath::sqr(static_cast<float>(deltaDiagonalStepsSquaredC.l));
#endif
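A sketch of what that printer.cpp test decides and why it matters. `needsLargeMachine`, `square32` and `square64` are hypothetical names, and the rod/radius values in the checks below are illustrative, not Rostock Max V3 specifications. The point is that a 32-bit square silently wraps once the operand passes 65535, while the 64-bit (or float) path used in large-machine mode does not.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the shape of the printer.cpp condition quoted above (hypothetical function).
bool needsLargeMachine(double diagonalMm, double radiusMm, double stepsPerMm) {
    double diagonalSteps = diagonalMm * stepsPerMm;
    return diagonalSteps > 65534.0 || 2.0 * radiusMm * stepsPerMm > 65534.0;
}

// The same square in 32 and 64 bits.
uint32_t square32(uint32_t v) { return v * v; }  // wraps modulo 2^32
uint64_t square64(uint64_t v) { return v * v; }  // exact for v < 2^32
```

For example, 70000² wraps to 605032704 in 32 bits instead of 4900000000.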

But somehow you managed to pass this test as a small machine and still go out of bounds. So you might be right on the edge here, or the position is illegal. After all, when you change tools, a new offset is applied, moving by the difference to the old offset. That is when it happens, so maybe check the offsets first, but remember they are stored in steps in the EEPROM. Also make sure they are measured from the delta carriage center!
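A hypothetical sketch of the tool-change shift described above: only the difference between the old and new extruder offsets (stored in steps) is moved. `ExtruderOffset` and `toolChangeShift` are illustrative names, and the sign convention is an assumption; the firmware's selectExtruderById may apply it differently. With both offsets at 0, as in a Y-adapter setup, the shift must be exactly 0.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-extruder offsets, kept in steps like the EEPROM values.
struct ExtruderOffset { int32_t xSteps; int32_t ySteps; };

// Shift (in steps) applied when switching from `from` to `to`:
// only the difference between the two offsets is moved.
ExtruderOffset toolChangeShift(const ExtruderOffset &from, const ExtruderOffset &to) {
    return { to.xSteps - from.xSteps, to.ySteps - from.ySteps };
}
```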

EliW commented 4 years ago

Thanks for the details!

The issue I'm running into is that there shouldn't be any illegal positions, nor should anything have changed (dual printing worked, and now suddenly it doesn't).

So I looked at the numbers for my machine, and deltaAPosXSteps 'should' be calculated as 11496.937. Given that, it shouldn't be possible at all for temp to exceed LIMIT (65534) unless we are getting an overflow/rollover issue and the cartesianPosSteps it's being told to calculate from is larger than that (i.e., telling it to literally leave the bed). So something is going on. I may need to add some debugging statements there so that, when the error case is detected, it outputs all of its calculations at that time to my OctoPi, to see what the heck has happened.
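EliW's reasoning can be checked with a small sketch: the "Apos x steps" test only passes target x positions within LIMIT steps of the tower's x position. `legalXWindow` and `insideWindow` are hypothetical helpers, 11497 is EliW's 11496.937 rounded to whole steps, and 80 steps/mm is an assumption taken from later in the thread. Note this covers only the steps test, not the later square test.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Range of target x positions (in steps) that pass the "Apos x steps" check.
struct StepWindow { int32_t minSteps; int32_t maxSteps; };

StepWindow legalXWindow(int32_t towerXSteps, int32_t limit = 65534) {
    return { towerXSteps - limit, towerXSteps + limit };
}

bool insideWindow(const StepWindow &w, int32_t xSteps) {
    return xSteps >= w.minSteps && xSteps <= w.maxSteps;
}
```

For a tower at 11497 steps, the window is -54037 to 77031 steps, roughly -675 mm to +963 mm at 80 steps/mm, so only a wildly wrong target can trip it.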

"After all when you change tools a new offset is applied moving the difference to old offset. And that is when it happens so maybe check first the offsets - but remember they are in steps in eeprom. Also make sure they are measured from delta carriage center!"

This part is interesting because this error now happens only when the tool change happens. In my case I don't have any offset, because it's not a separate nozzle but a Y-adapter: it retracts the first filament (significantly) and then pushes the new filament down through the Y.

So perhaps something is getting 'mucked up' during the change. It looks like this happens in Extruder::selectExtruderById, yes? But as long as the offset is set to 0, the result should be 0 (unless there's some really weird -0*80 !== 0 kind of chip error going on). Hrmmm.

Or if somehow the EEPROM is just getting corrupted.

repetier commented 4 years ago

With 11496.937, that really does not qualify for an overflow, even if you are at the other end of the bed. And yes, in your case offset x and y in the EEPROM should be 0 for both extruders, so check whether the EEPROM changed somehow. That would at least make sense. Otherwise, debug messages writing the two x positions to the log might help you get an idea of how "big" the problem is and guess what it is. But it really sounds like selectExtruderById is causing this when it should not.

EliW commented 4 years ago

Here's what the data looks like when the error happens. All the step values are negative.

Recv: Info:Relative positioning
Send: N4855 G1 X-9.632 Y2.381 F6000*94
Recv: Info:Absolute positioning
Recv: SpeedMultiply:100
Recv: SpeedMultiply:100
Recv: Info:Relative positioning
Recv: deltaAPosXSteps: -10004
Recv: cartesianPosSteps: -83197
Recv: temp: 73193
Recv: limit: 65534
Recv: Error:Apos x steps
Changing monitoring state from "Printing" to "Error: Apos x steps"

repetier commented 4 years ago

So, assuming z steps per mm = 80, it wants to move to x = -1039.9625 mm. Not likely that this is a regular, correct move. If you have checked the EEPROM (you did not mention whether you did), you can compile with DEBUG_QUEUE_MOVE defined to get more output about added moves. Then add some more output in selectExtruderById before and after potential moves to see where it comes from. At some point you should see a big distance being added; then you can see from the marker which move does that and follow where the coordinate came from.
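The numbers in the log above can be reproduced with a short sketch. `stepsToMm` and `aposXTemp` are hypothetical helpers, and 80 steps/mm is the same assumption made in the reply.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>

// Convert a logged step count back to millimetres (assumes one steps/mm value for all towers).
double stepsToMm(int64_t steps, double stepsPerMm) {
    return static_cast<double>(steps) / stepsPerMm;
}

// Recompute the value the firmware compared against LIMIT.
int64_t aposXTemp(int64_t deltaAPosXSteps, int64_t cartesianXSteps) {
    return std::llabs(deltaAPosXSteps - cartesianXSteps);
}
```

`aposXTemp(-10004, -83197)` gives the logged 73193, and `stepsToMm(-83197, 80.0)` gives -1039.9625 mm, far off any bed.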

EliW commented 4 years ago

So is there something that I should know about sending debug messages out?

I've been using Com::printFLN() … and I'm getting weird effects. It works, but sometimes it seems that by adding a message like that, the machine will lock up after sending it. If I remove that message, suddenly it's working again. And only in some places, some lines. Starting to drive me bonkers really. Hard to debug when putting in debug messages breaks the code.

repetier commented 4 years ago

There are macros, DEBUG_MSG2_FAST("message", variable); and DEBUG_MSG_FAST("message");, that only write when the debug echo flag is set (M111 S7). Too many messages slow everything down, so this is surely not meant to be done during a print; just replay the problem code. Also, adding writes inside interrupt timer functions is a good way to crash the firmware and interrupt outputs that have already started. Do that only very sparingly, when you have no other output, and still expect it to crash every now and then when the buffer fills up while interrupts are disabled, which prevents the data from being sent.

The queue function runs in the main thread, so there output only slows operation down.
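One common way to follow this advice is to let the interrupt handler only record a value and do the slow output from the main loop. The sketch below is a host-side illustration of that pattern, not Repetier's DEBUG macros: `onTimerInterrupt`, `pollDebug` and the variables are hypothetical, and on real hardware you would print with the firmware's own output functions (and on AVR, briefly disable interrupts when reading multi-byte shared values).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Shared between the "interrupt" and the main loop.
volatile bool debugPending = false;
volatile int32_t debugValue = 0;

// Stand-in for a timer interrupt handler: no serial output here, just store the value.
void onTimerInterrupt(int32_t value) {
    if (!debugPending) {  // keep the first value until the main loop has printed it
        debugValue = value;
        debugPending = true;
    }
}

// Called from the main loop, where slow output only delays normal processing.
std::string pollDebug() {
    if (!debugPending) return "";
    std::string line = "temp: " + std::to_string(debugValue);
    debugPending = false;
    return line;
}
```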

EliW commented 4 years ago

Hey @repetier, I'll fully admit I'm in over my head here (I'm a dev, but I've never done C/embedded work before: a bit of desktop, tons of web applications). So thanks for bearing with me. But when you get into 'inside an interrupt timer', my response becomes "Right, so how do I know when I'm in an interrupt timer?" Anyway, some updates:

So, first of all, I enabled DEBUG_QUEUE_MOVE as you suggested. When I do that, I get a ton of output (obviously). The interesting point is that I don't get the Apos error anymore, because the machine locks up/resets itself before that. Here's a sample of the end of the output:

Send: N2288 G1 X4.172 Y-6.020 F6000*80
Recv: ID:5234
Recv: vStart/End:924/924
Recv: accel/decel steps:1/1/1108
Recv: st./end speed:10.0/10.0
Recv: Flags:146
Recv: joinFlags:5
Recv: ID:5234
Recv: Delta 0 0 0 1108
Recv: Dir:128
Recv: Flags:18
Recv: fullSpeed:10.00
Recv: vMax:924
Recv: Acceleration:155887.43
Recv: Acceleration Prim:600600
Recv: Remaining steps:1108
Recv: LimitInterval:17316
Recv: Move distance on the XYZ space:11.99
Recv: Commanded feedrate:10.00
Recv: Constant full speed move time:19186128.00
Recv: Echo:N2285 G1  E-12.0000 F600.00
Recv: ok
Send: N2289 G1 X3.496 Y-6.076*58
Recv: Info:Relative posit���start

After that 'start', the machine literally restarts itself like it was reset. I did this multiple times, same output. If I turn the debug off, I get the Apos error.

The interesting point here is that it's the exact same area in the gcode that is causing this. What I believe I'm seeing there is that G1 E-12.0000 is the "please retract the filament, I'm going to change extruders" move, and then G1 X3.496 Y-6.076 tells the printer to move over to the purge block before it starts pushing the new filament down.

And that's exactly the point where the Apos happens each time. It retracts T0, and before it extrudes from T1, error.

Now what's EXTRA interesting (to me at least?) is that when I try a 'manual' extruder change, either directly on the printer's LCD or by sending T1, the exact same thing happens, except that instead of giving an Apos error, it appears to just 'lock up'. The machine shows 'Extruder 1' and then becomes unresponsive. Sometimes, if I wait long enough, it comes back.

The combination of 'machine resetting' / 'machine locking up' / 'Apos with impossible move' ... Is making me wonder if these are all the exact same error just expressed in different ways.

Maybe the attempt to switch to Tool 1 is somehow causing the machine to lock up/reset itself, and during a live print that ends up expressing itself as an Apos error. It gets the tool change command and resets itself, but the final 'move' after T1 is still in the one-command buffer it keeps. So it then tries to execute that command while thinking it's homed, and the crazy big move happens because it thinks it needs to move waaaaay down to get back to the print.

This might also be confirmed by one test where the print continued past the first change and then made a few more changes. Then, on one change, I suddenly got an 'A hit floor' error, because it did: it did a tool change and then WHAMM, tried to go down into the print.

So, if all this is correct, the question is: why has my printer 'suddenly' decided to start resetting itself/locking up 90% of the time an extruder tool change happens?

repetier commented 4 years ago

That debug option normally does not reset the printer, especially if only a few commands get sent. That makes me think you might be running out of memory, which can also cause strange errors of any kind. When you connect/reset, you see the free RAM. On the Due that should be at least 2000 bytes to avoid a stack overflow. With enough subsegments/moves stored you can fill any memory, so even on the Due that might be the reason.

EliW commented 4 years ago

@repetier I believe you got to the heart of the matter. It turns out that with dual extruder support compiled in, I only had 668 bytes of free RAM. Another owner of this printer had been complaining about different issues with dual extrusion, and his solution was to disable SD card support. I did that and recompiled, and now have 1520 bytes of free RAM. Still not the 2000 you are suggesting, but the prints that were failing now work.

EliW commented 4 years ago

My theory for why it 'worked before' and now stopped is that my previous dual-color prints were larger and less intricate. What I'm printing now are small, quarter-sized tokens with lots of detail, so the gcode is very complicated, with tons of twists and turns.

repetier commented 4 years ago

668 free bytes is not much. Could it be that you are not using the Due-based version? The Due has 96 KB of RAM, while the AVR Mega has only 8192 bytes. As I said, reducing the number of subsegments also gives you more free RAM, so there is no need to disable the SD card function, although that is another way to get more RAM. The same goes for disabling the display.

On 8-bit boards the free RAM needed is somewhere between 900 and 1000 bytes. I doubled it for the Due because it is 32-bit, so some variables are larger.
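The rule of thumb above, written as a tiny sketch. `enoughFreeRam` is a hypothetical helper; it takes the conservative end (1000 bytes) of the 8-bit range and the doubled 2000 bytes for the Due, and is compared against the free RAM the firmware reports on connect/reset.

```cpp
#include <cassert>

// Thresholds from this thread: ~1000 bytes on 8-bit AVR, doubled to 2000 on the 32-bit Due.
bool enoughFreeRam(int freeBytes, bool is32Bit) {
    const int required = is32Bit ? 2000 : 1000;
    return freeBytes >= required;
}
```

By this rule, both 668 and 1520 bytes on a Due are below the suggested headroom, which matches the advice that the fix may only work for most conditions.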

The complexity of the gcode is not an issue, since the firmware never sees the file as a whole; it only sees the next 16 lines or so. But different conditions need different amounts of stack, so the error is somewhat random if you still have enough free RAM for most conditions.
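Why gcode complexity does not matter can be illustrated with a fixed-size ring buffer: memory use is bounded by the buffer's capacity, not by the file. This is a hypothetical sketch (`CommandRing` is not a firmware class), with a capacity of 16 taken from the "16 lines or so" above; the firmware's real buffer size and structure may differ.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical fixed-size command queue holding only a small window of upcoming lines.
template <std::size_t N>
class CommandRing {
public:
    bool push(const std::string &line) {
        if (count_ == N) return false;  // full: the host must wait for "ok"
        buf_[(head_ + count_) % N] = line;
        ++count_;
        return true;
    }
    bool pop(std::string &out) {
        if (count_ == 0) return false;
        out = buf_[head_];
        head_ = (head_ + 1) % N;
        --count_;
        return true;
    }
    std::size_t size() const { return count_; }

private:
    std::array<std::string, N> buf_{};
    std::size_t head_ = 0;
    std::size_t count_ = 0;
};
```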