Setting M425 to non-zero values causes Layer Shift

schmttc commented 2 years ago

When two M425 commands are placed in Cura > Machine > Start G-Code, prints have a consistent layer shift. The layer shift occurs at different points depending on the version and configuration of Marlin 2.x firmware installed.

Example: G28 ;Home G1 Z15.0 F600 ;Move the platform down 15mm ;Prime the extruder M425 X0.39 Y0.39 Z0.2 M425 F1 S0 G92 E0 G1 F200 E3 G92 E0

This required more diagnosis to resolve. Suggest setting backlash offsets outside of printing, and save to EEPROM.

schmttc commented 2 years ago

M425 is enabled at 0,0,0 F0 S0 in firmware

Test 1, Left in image First line of Cura start block: M425 X0.39 Y0.39 Z0.2 F1 S0

Test 2, Right in image Save settings to EEPROM, confirm over reboot Remove M425 from Cura start block shift

Layer shift exists in both, but presents differently. I'm beginning to suspect this is an issue with the backlash compensation module in Marlin. IIRC I didn't have these issues when I compiled the backlash configuration into the firmware. This requires further testing.

schmttc commented 2 years ago

For comparison, here is a print using the same firmware, with M425 set to X0 Y0 Z0 F0 S0 noBacklashComp

schmttc commented 2 years ago

Confirmed that compiling the firmware with backlash offsets does not cause layer shift Test prints using this tower: https://www.thingiverse.com/thing:2371069 Lines mark the 'zero' point, which should align to the middle thick layer when backlash is zero.

Pink: 0 backlash compensation Orange: 0.39mm compensation XYZ Blue: 0.24 compenstaion XYZ backlashTowers I'm assuming the shift in the orange tower is a mechanical coincidence

Cura Profile and (untested) 0.2mm backlash binary attached for those wanting a quick fix, this should be 'close enough'. Remane the files and remove the ".txt" K7.temptower.curaprofile.txt mksLite.backlash0.20.bin.txt

schmttc commented 2 years ago

It turns out the layer shift in the orange tower was no coincidence. Another user is having layer shift issues even when hard coding the backlash values. Longer prints, or more layers, or more segments seems to trigger the layer shift, which leads me to believe it is some sort of overflow error.

0.3 layers, 0.5 0.36 backlash, smoothing 0, correction 1. No layer shift apparent 0 3

0.2 layers, 0.5 0.36 backlash, smoothing 0, correction 1. Layer shift begins at top of tower 0 2

Testing is required with the latest mainline bugfix version before reporting the issue.

schmttc commented 2 years ago

Latest Marlin bugfix-2.0.x (2.0.9.3) Compiled backlash 0.5x0.5,0; correction 1, smoothing 0 Layer shift still present 2903

crazy-hair commented 2 years ago

Interestingly enough, when I try to print with backlash compensation on, it seems to layer shift every layer, not just sporadically. The shifts still aren't consistent, which means I can't use a post-processing script to fix it,though. PXL_20220930_050228934

real-artswan commented 1 year ago

I've managed to compile vanila bugfix-2.1.x* with your fixes for our printer (can do a PR if you want). backlash compensation doesn't hurt anymore and I can print (almost) perfect circles. I've always had a pattern "squared circle" because of a huge backlash on the weak mechanics IMG_20230314_225003

now with a M425 F1.00 S0.00 X0.6 Y0.9 Z0.20 I've got perfect rounds

IMG_20230314_224805 IMG_20230314_224903 IMG_20230314_224729

now I've got another problem, after a several layers of a bigger object it makes sound like "beep" (don't know, it has no buzzer, so pwm may be... it seems it happens on a layer change) and stops and reboot without any messages about a reason in a serial console. Could it be a buffer overflow issue? O rather a PWM issue, can you advice how to fix/tune? IMG_20230315_005633

may be I just should try some more stable branch...

schmttc commented 1 year ago

I wonder if the crash is related back to the original issue; I found I had to print a large enough item (I suspected at the time there was a minimum number of segments or moves or gcode steps) before the layer shifts started. As you noted the 'beep' could be the stepper motor trying to do a large/fast move that it's physically incapable of; pure speculation but maybe it draws too much current and browns out the board? Could try setting lower current limits in the configuration files and retest.

I started a 2.1.2 branch here https://github.com/schmttc/EasyThreeD-K7-STM32/tree/2.1.2 which incorporates the maple compile bugfix https://github.com/MarlinFirmware/Marlin/issues/25346 It compiles and flashes but caveat I haven't tested it yet as the K7 isn't my main printer any more. Maybe see if that fixes the large print issue. Otherwise I'd suggest raising a ticket in mainline.

Sidenotes

I'm very interested to see how Linear Advance works on the little printer
I have a to-do item to roll the updates in this repo back into mainline Marlin

real-artswan commented 1 year ago

Can say that the backlash issue is irrelevant to my problem and it works well, at least me can't reproduce with the latest bugfix-2.1.x branch, I can make a PR, to merge it, should I? It seems my problem is different, it's definitely about the extruder's stepper, it hangs and PWM is audible, at that time the printer still sends status to serial, but doesn't respond neither accepts commands anymore. My thoughts are:

when it gets too LOW current (or too low PWM freq, don't know)
when it's asked for a too high aсceleration from the very rest
Enabling "Linear Advance" just calls for the hanging problem sooner, the bigger K value the sooner it hangs (I think because it shapes speed/acceleration, does it make it slower in the corners?)

so it might be a power source issue, or might be some steppers physical limit, it also could be an issue in the Marlin code (or my config). I tried to set DEFAULT_MINIMUMFEEDRATE to 20, but it didn't affect at all. At the end of the day, I want more or less precise circles but printers mechanic is weak so I need backlash compensation and I need to lower all accelerations and speeds (and expectations;)). And finally, I've also disabled linear advance (btw, on the photo above it was 0.6) and added retraction 1mm and seems got balance between speed and stability and precision and quality. Wondering to try it with more power, but max current on a stepper could also be a driver limit, (need to look at the board, which extruder stepper driver it has)

schmttc commented 1 year ago

Happy for a PR, but it would be best to solve the large print hanging issue first.

Does the issue occur with the 2.9.1 firmware?
Does the issue occur when backlash compensation is disabled in firmware?

real-artswan commented 1 year ago

to me, backlash compensation works well and helps a lot

schmttc commented 1 year ago

It looks like the crash and noise might be due to a bug in linear advance https://github.com/MarlinFirmware/Marlin/issues/25553 @real-artswan please do a PR and we can get a few people using it

schmttc commented 1 year ago

The backlash problem still exists in Marlin 2.1.2, but it seems to be less of an issue.

The backlash tower printed fully 212 tower

But printing the two ducts from https://www.thingiverse.com/thing:4931953 shifted half way 212 duct

thinkyhead commented 1 year ago

Thanks for all the images. I'm convinced the issue is real.

may be I just should try some more stable branch.

Yes, actually there is a straightforward procedure that can help to isolate the exact cause of the issue, if you would like to help us track it down, but it's a little meticulous.

The strategy is to first find a commit from some point in the "distant" past where the feature works perfectly. Then, using the bugfix-2.1.x branch you would test a commit from halfway between that date and today…. And then you keep going to the commit half-way in between your "known to work" commit and your "known to be broken" commit until you find the exact commit that broke the feature.

Even if you started from a point 5000 commits in the past, it would take no more than 12 or 13 tests to find the exact commit that broke it.

This can be done with git bisect but we don't expect the average user to use that method, plus it doesn't account for configuration changes between versions. So, it is best just done manually. It is best for someone who is experiencing the issue to do this procedure, since it may not be straightforward for someone else to reproduce the issue.

The first trick is to find some older version that doesn't exhibit the issue, and then you can start testing from that point in time.

thinkyhead commented 1 year ago

@tombrazier — Can you take a peek at some changes to backlash compensation, starting from 87c4cd20e5 (#23814) and particularly at the Backlash::applied_steps method? I note that there is a bool called reversing (now called reverse) that simply tests Backlash::last_direction_bits to see if an axis was moving "backward" at an earlier point. I haven't looked very closely at the complete backlash logic, but I wonder if that should instead be looking for some change in the direction bits, rather than just checking whether the last direction was set to 1, indicating that the axis was moving in a particular direction. We can go over Backlash::applied_steps together in Discord soon so that I understand it, and backlash compensation in-general, and make sure all changes since last year are sound.

schmttc commented 1 year ago

Thanks for all the images. I'm convinced the issue is real.

may be I just should try some more stable branch.

Yes, actually there is a straightforward procedure that can help to isolate the exact cause of the issue, if you would like to help us track it down, but it's a little meticulous.

The strategy is to first find a commit from some point in the "distant" past where the feature works perfectly. Then, using the bugfix-2.1.x branch you would test a commit from halfway between that date and today…. And then you keep going to the commit half-way in between your "known to work" commit and your "known to be broken" commit until you find the exact commit that broke the feature.

Even if you started from a point 5000 commits in the past, it would take no more than 12 or 13 tests to find the exact commit that broke it.

This can be done with git bisect but we don't expect the average user to use that method, plus it doesn't account for configuration changes between versions. So, it is best just done manually. It is best for someone who is experiencing the issue to do this procedure, since it may not be straightforward for someone else to reproduce the issue.

The first trick is to find some older version that doesn't exhibit the issue, and then you can start testing from that point in time.

You read my mind, I was just planning to do this, ellensp ran me through the process previously for a different issue. Based on my research this firmware (https://www.thingiverse.com/thing:4499386) at 2.0.5 supported backlash compensation and there were no issue reports, so I'll start there when I get some spare time.

tombrazier commented 1 year ago

Can you take a peek at some changes to backlash compensation, starting from https://github.com/schmttc/EasyThreeD-K7-STM32/commit/87c4cd20e514b9fc94450e9503e268cc014490e8 (#23814) and particularly at the Backlash::applied_steps method?

I can but not immediately. It is so long since I worked on that code I can't even remember what I fixed!

You read my mind, I was just planning to do this, ellensp ran me through the process previously for a different issue. Based on my research this firmware (https://www.thingiverse.com/thing:4499386) at 2.0.5 supported backlash compensation and there were no issue reports, so I'll start there when I get some spare time.

Excellent.

tombrazier commented 1 year ago

After some reflection...

I wonder whether the bug is due to the backlash value being wrong or just because backlash compensation itself causes movement that is more likely to result in a layer shift. This could be tested by checking whether layer shifts still occur with lower speed/acceleration or with input shaping enabled.

My guess is that it is more likely to be the second reason because if it was built in to the backlash calculations then I'd expect the same layer shift in the same place every time something was printed with the same parameters.

One place to look will be calculations in planner.cpp that depend on steps_per_mm. This variable can end up with a rather odd value if a backlash is loaded onto a really short segment.

thinkyhead commented 1 year ago

backlash compensation itself causes movement that is more likely to result in a layer shift

My assumption is that it just adds more length to any axis move that is changing direction, but of course when more than one axis is moving, it can't just lengthen the whole move; it has to do the extra correction steps only on the affected axis, and then continue the move with the combined axes. So it could certainly lead to the other (non-compensated) axis having a sudden stop for a small period — enough to induce a layer shift.

Other things to consider in relation to layer shift include the motor current settings (for TMC and some other stepper drivers) and how those may have been adjusted between versions, and general acceleration settings. So it may help to experiment with some of those to see if the problem can be mitigated.

And finally, in the course of testing be sure to disable various optional features including things like Junction Deviation (enable Classic Jerk to disable JD), Linear Advance, Firmware Retraction, S-Curve Acceleration, and anything else in the realm of motion. This can help determine that the problem is specific to Backlash or whether it is caused by weird interactions with other features.

I look forward to finding out where the root cause lies! Backlash Compensation isn't the most popular feature, but it's a pretty important one, especially for machines beyond 3D printers that have more play in their mechanical components.

schmttc commented 1 year ago

Well I havn't got to the bottom of the issue, but here's what I've done so far.

TLDR; The workaround is to try 'unreasonably low' acceleration values when layer shift occurs. But the failure mode looks like a combination of higher acceleration and backlash compensation being enabled, that I don't understand yet.

lashtest

A: 2.0.5 firmware, backlash correction enabled 0.2/0.36/0/1, 2.0 ejerk, 100mm/s acceleration, print speed 20, travel speed 40 - fail B: 2.0.5 firmware, backlash correction enabled 0.2/0.36/0/1, 2.0 ejerk, , 20mm/s acceleration, print speed 20, travel speed 40 - success C: 2.0.5 firmware, backlash correction enabled 0.2/0.36/0/1, 0.1 ejerk, 100mm/s acceleration, print speed 20, travel speed 40 - fail F: 2.0.5 firmware, backlash correction enabled 0.2/0.36/0/1, 2.0 ejerk, 100mm/s acceleration, print speed 20, travel speed 20 - fail

I need to rerun these tests, as 2.0.5 doesn't seem to be re-building reliably when I change configuration parameters, the old ones stick D: 2.0.5 firmware, backlash correction enabled 0.2/0.36/0/1, 0.1 classic jerk, 100mm/s acceleration, print speed 20, travel speed 40 - fail E: 2.0.5 firmware, backlash correction enabled 0/0/0/0, 2.0 jerk, 100mm/s acceleration - fail G: 2.0.5 firmware, backlash correction disabled, 2.0 jerk, 100mm/s acceleration, print speed 20, travel speed 20 - TBA

I don't know exactly how backlash correction and acceleration works, but my first suspicion went something like

Backlash correction adds 'microsegments' to correct for backlash
Due to the small segment size, acceleration for the segment quickly approaches the maximum allowed by the firmware
With backlash disabled, a regular length segment reaches max acceleration almost never, or the change in acceleration is slower
When a printer is allowed 'to much' acceleration, it can overrun, causing layer shift
So acceleration is the key limiter to the layer shift, but it is brought forward by enabling backlash correction.
Different firmware versions have slight variations in the motion engine, which is why the layer shift is not consistent across firmware
Different variables for compensation also give varying layer shift as they change the acceleration curve.
Watching the printer, I also feel like the shift is occurring on the diagonal travel at the end of the layer, which is why the shift is always in the same direction.
- Travel speed is double the wall speed, so more chance of getting out of control. Maybe need to target travel speed/acceleration

Most people are trying to ring speed out of their printers, and most printers ship with bad default values So when my little printer has a max accel of 300, I set 'slow' as 100, and thought I'd covered off the acceleration side of the problem So I would recommend for troubleshooting that people set an 'unreasonably low' acceleration of say 20mms, and see if the problem persists.

But the outlier question is why I could previously get layer shift when backlash comp was enabled, but set to zero. But on the same settings with backlash disabled, no layer shift. Maybe the extra calculation time allows overrun? Need to re-run this test to confirm.
If this is the case, can layer shift be caused on a different printer by enabling compensation and setting zero values?

tombrazier commented 1 year ago

2.0.5 is really old firmware and backlash has been worked on since then. What do you get with the latest bugfix?

schmttc commented 1 year ago

2.0.5 is really old firmware and backlash has been worked on since then. What do you get with the latest bugfix?

I was exploring 2.0.5 as a potential non-buggy version in the 'distant past'. 2.0.9.1-2.1.2 are all impacted. I suggest that everything in between would be too. I'll try the latest bugfix when I do some more testing, but expect it to be impacted as well.

I wonder whether the bug is due to the backlash value being wrong or just because backlash compensation itself causes movement that is more likely to result in a layer shift. This could be tested by checking whether layer shifts still occur with lower speed/acceleration or with input shaping enabled.

My current thinking is that this is correct - lowering acceleration to 20 removed the issue in 2.0.5. My next questions are 'why', and 'can it be remediated'.

schmttc commented 1 year ago

The issue is consistent in 2.1.2 bugfix.

212b

2.1.2 bugfix firmware, backlash correction enabled .2/.36/0/1, classic jerk, disable linear advance, 2.0 jerk, 100mm/s acceleration, print speed 40, travel speed 40 - fail
Same compile as 2, set M425 to 0/0/0/0 - OK
2.1.2 bugfix firmware, backlash correction enabled .2/.36/0/1, ejerk, linear advance, 2.0 jerk, 100mm/s acceleration, print speed 40, travel speed 40 - fail, sounds like the printhead movement is pulsing along a straight line
Same as 4, print and travel acceleration set to 20 - OK

thinkyhead commented 1 year ago

I'm still keen to understand what is going on here, but don't have any idea about a solution yet. There is definitely a problem there, so to get to the bottom of it we'll need to observe the code in situ and see what's going on under the hood. We have a Simulator which should make it easier to debug, so the best next step is to test out backlash compensation in the simulator within LLDB, log all the low level motion events, and look for oddities there.

tombrazier commented 1 year ago

I still wonder whether it is caused by adding backlash to small segments. @schmttc what happens if you define BACKLASH_SMOOTHING_MM?

tombrazier commented 1 year ago

Incidentally, 0.36mm seems like a pretty large backlash to me. After doing what I can to tighten up my belts, etc. I have 0.05mm for both my X and Y axes.

schmttc commented 1 year ago

Thanks for your patence.

I have tested backlash smoothing (I used examples of S3 and S1), which fixes the layer shift; but introduces stuttering and odd speed changes in various segments throughout the whole print
Based on the limited other mentions of backlash layer shift that I could find online plus these results, I concur with @tombrazier in the belief that the layer shift is caused by high acceleration being performed on very small segments.
As the acceleration limits are being respected, the issue is not in the firmware itself, but rather caused by acceleration being set too high relative to the capabilities of the hardware.
Hardware is the reason for the large backlash; 24BYJ48 is an 11.25 degree stepper with a 64:1 reduction to increase resolution. This gear train provides the backlash
I'm thinking it's something to do with the gear train that causes the layer shift - maybe the short sharp acceleration causes an over step, not enough time to decelerate?

My apologies for the accusations against backlash compensation, it has been an odd combination of symptoms.

tombrazier commented 1 year ago

As the acceleration limits are being respected, the issue is not in the firmware itself...

The problem actually is that the acceleration limits are not being respected and the issue is in the firmware. What happens is Marlin adds steps to the axis movement but leaves the distance in mm unchanged. It then limits acceleration using the distance in mm. For a short segment and a comparatively large backlash (which I think does not happen for most printers but obviously does in this case) the steps to mm ratio is effectively much higher for this one segment and so the acceleration limit in steps is exceeded even if the acceleration limit in mm is not.

Backlash smoothing should resolve this without causing the odd speed issues described. So there may be another bug there. I don't really have the capacity at present to look into this but I wonder whether a simple (but slightly expensive) solution would be to recalculate block->millimeters in Backlash::add_correction_steps().

@schmttc do you have the skill to experiment with this?

schmttc commented 1 year ago

Ah I misunderstood, thanks for the explanation. I'm happy to see if I can follow the code and have a tinker, no promises.

schmttc commented 1 year ago

Hi @tombrazier , I've had a few tries at recalculating block->millimeters at various points and in various ways, but all I've been able to do is cause the print head to freeze. This will need to go into the backlog for someone more skilled to look at.

schmttc commented 1 year ago

Hello again @tombrazier and @thinkyhead , This issue has been on my mind so I had another try, and I have created a limited (cartesian only so far) fix that is working for me on first test 1095541aefd0ccaa80b8b09d5606540a4a533017

I don't think I'll have time this week to expand the fix to other types of machine, but it is on my to do list.

Left: fixed code. Right: Original code. Same gcode https://github.com/schmttc/EasyThreeD-K7-STM32/blob/backlashfix/Backlashtower_0_20mm.gcode

thinkyhead commented 1 year ago

Your results are looking good @schmttc. We'll review your solution and see if it makes sense and has no side-effects.

Meanwhile, my initial suggested changes:

if (error_correction) {
  const uint32_t abs_correction = ABS(error_correction);                    // (1)
  block->steps[axis] += abs_correction;                                     // (2)
  backlash_distance_mm[axis] = abs_correction * planner.mm_per_step[axis];  // (3)
  block->millimeters += backlash_distance_mm[axis];                         // (4)

Explicitly pre-calculate the abs value.
As before.
Use pre-calculated reciprocal instead of dividing.
Remove SQRT(sq(...)) because SQRT(sq(x)) == x.

thinkyhead commented 1 year ago

Assuming the logic is correct overall, the only thing that I would change to improve accuracy would be to recalculate block->millimeters by adding the squares of all the axes together and taking the SQRT. I don't see any shortcut. Since this requires re-calculating the move distances of all the axes in the block in mm, it would optimize things slightly to cache those mm distances in the block at some earlier point (or even cache the squares of those distances).

tombrazier commented 1 year ago

I note that backlash_distance_mm is not declared anywhere but also that backlash_distance_mm[axis] could just be a local variable.

However if we are just fixing the cartesian case I think the problem would easily be solved without any extra calculation by moving the line

TERN_(BACKLASH_COMPENSATION, backlash.add_correction_steps(dist.a, dist.b, dist.c, dm, block));

before

block->millimeters = get_move_distance(displacement OPTARG(HAS_ROTATIONAL_AXES, cartesian_move));

in planner.cpp. i.e. move it to the start of the else { block.

tombrazier commented 1 year ago

Actually, I think block->millimeters is always just calculated from cartesian coordinates and kinematics is irrelevant. So all we really need is for Backlash::add_correction_steps() to return a bool indicating whether it changed anything. If it did then hints.millimeters (which is always just supplied as an optimisation) can be ignored. i.e. This will hopefully work:

  else {
    /**
     * At this point at least one of the axes has more steps than
     * MIN_STEPS_PER_SEGMENT, ensuring the segment won't get dropped as
     * zero-length. It's important to not apply corrections
     * to blocks that would get dropped!
     *
     * A correction function is permitted to add steps to an axis, it
     * should *never* remove steps!
     */
    const bool recalc_mm = TERN0(BACKLASH_COMPENSATION, backlash.add_correction_steps(dist.a, dist.b, dist.c, dm, block));

    if (!recalc_mm && hints.millimeters)
      block->millimeters = hints.millimeters;
    else {
      const xyze_pos_t displacement = LOGICAL_AXIS_ARRAY(
        dist_mm.e,
        #if ANY(CORE_IS_XY, MARKFORGED_XY, MARKFORGED_YX)
          dist_mm.head.x, dist_mm.head.y, dist_mm.z,
        #elif CORE_IS_XZ
          dist_mm.head.x, dist_mm.y, dist_mm.head.z,
        #elif CORE_IS_YZ
          dist_mm.x, dist_mm.head.y, dist_mm.head.z,
        #else
          dist_mm.x, dist_mm.y, dist_mm.z,
        #endif
        dist_mm.i, dist_mm.j, dist_mm.k,
        dist_mm.u, dist_mm.v, dist_mm.w
      );

      block->millimeters = get_move_distance(displacement OPTARG(HAS_ROTATIONAL_AXES, cartesian_move));
    }
  }

tombrazier commented 1 year ago

Oh, hang on, my suggested change won't fix the problem because it does not update dist_mm.

I also note that backlash compensation appears to know nothing about kinematics other than cartesian and variations on CORE / MARKFORGED. Also, it works in mm in cartesian space so really should not be applied in Planner::_populate_block() at all. Planner::buffer_segment() before the conversion to steps makes more sense. Or perhaps even Planner::buffer_line() before the kinematics logic.

thinkyhead commented 1 year ago

backlash compensation appears to know nothing about kinematics

True. I've always assumed it was looking at individual steppers and adding steps according to the setting for each stepper, rather than assuming the steppers are one-to-one (cartesian) mapped. This seemed the most logical assumption because the backlash code is looking at DIR changes, which come only after kinematics have been applied.

Since planner blocks are dealing directly with stepper steps (and indirectly in the case of Core kinematics) it seems the best place to apply backlash adjustments would be in the construction of these blocks — modifying them where possible rather than adding more of them. And, since backlash needs to be handled at the stepper level, it seems best to pre-calculate all the backlash distances in terms of steps and then use those in the backlash code.

It's probably not a complete rewrite, but we could get better and more optimized results by shuffling some things around.

tombrazier commented 1 year ago

I haven't checked for sure but I think the backlash config is interpreted as being in cartesian space for core type kinematics and in motor space for everything else (and in the case of cartesian printers the two are the same anyway).

I keep wanting to remove all the core code from all over the planner and stepper modules and put into into a core kinematics module the same as with delta, scara, etc. It would clean the code up a lot. But it's a biggish project and I don't have a core machine to test it on.

Anyway, coming back to the actual topic under discussion I'll have a look and see if I can move the backlash logic into Planner::buffer_segment() which works in motor space.

tombrazier commented 1 year ago

Hang on, I think there's a simple solution using calculus which works for any kinematics. If we assume the inverse kinematics is a linear function (which it is in most cases and is close enough for cases like delta) then a 2nd order Taylor polynomial can give us a good approximation to the change in block->millimeters.

For each axis, and working in mm, the difference added to block->millimeters is:

(dist_mm[axis] + 0.5 * error_correction_mm) * error_correction_mm / block->millimeters

[Edit: I really shouldn't attempt 2nd order differentiation of non-linear functions in my head. I missed out a -sq(dist_mm[axis]) * error_correction_mm / block->millimeters / sq(block->millimeters) term in the above expression. But linear is good enough so I have modified my branch.]

@schmttc I have coded this in this branch. Could you give it a go?

tombrazier commented 1 year ago

On further reflection, the linear approximation is least accurate when error_correction is large with respect to dist[axis]. This will tend to be the kind of segment where the layer shift bug is being triggered anyway. So I have also added some logic to restrict the error correction so that it does not get larger than dist[axis].

I don't think I will have a chance to test this latest change today.

schmttc commented 1 year ago

On further reflection, the linear approximation is least accurate when error_correction is large with respect to dist[axis]. This will tend to be the kind of segment where the layer shift bug is being triggered anyway. So I have also added some logic to restrict the error correction so that it does not get larger than dist[axis].

I don't think I will have a chance to test this latest change today.

For the latest https://github.com/tombrazier/Marlin/commit/80d20f0114a2f5e38ff3775799e461c4253a5c1a, the print head stops moving after lifting the Z axis as part of the homing routine at the start of the gcode. For reference, this is the same behaviour I meant to describe earlier when I said "all I've been able to do is cause the print head to freeze".

schmttc commented 1 year ago

Your results are looking good @schmttc. We'll review your solution and see if it makes sense and has no side-effects.

Meanwhile, my initial suggested changes:
if (error_correction) {
  const uint32_t abs_correction = ABS(error_correction);                    // (1)
  block->steps[axis] += abs_correction;                                     // (2)
  backlash_distance_mm[axis] = abs_correction * planner.mm_per_step[axis];  // (3)
  block->millimeters += backlash_distance_mm[axis];                         // (4)
Explicitly pre-calculate the abs value.

As before.

Use pre-calculated reciprocal instead of dividing.

Remove SQRT(sq(...)) because SQRT(sq(x)) == x.

I can confirm that this change still works

schmttc commented 1 year ago

Hang on, I think there's a simple solution using calculus which works for any kinematics. If we assume the inverse kinematics is a linear function (which it is in most cases and is close enough for cases like delta) then a 2nd order Taylor polynomial can give us a good approximation to the change in block->millimeters.

For each axis, and working in mm, the difference added to block->millimeters is:
(dist_mm[axis] + 0.5 * error_correction_mm) * error_correction_mm / block->millimeters
[Edit: I really shouldn't attempt 2nd order differentiation of non-linear functions in my head. I missed out a -sq(dist_mm[axis]) * error_correction_mm / block->millimeters / sq(block->millimeters) term in the above expression. But linear is good enough so I have modified my branch.]

@schmttc I have coded this in this branch. Could you give it a go?

Commit https://github.com/tombrazier/Marlin/commit/9ebf8d681fb50e8065044a90fa82af1463dd1850 gets past homing, but the print head stops in the first layer after the first two outer rings of the brim (repeated test twice with a reboot between)

tombrazier commented 1 year ago

For reference, this is the same behaviour I meant to describe earlier when I said "all I've been able to do is cause the print head to freeze".

I vaguely recall quite some time back that you also reported freezes when attempting to use BACKLASH_SMOOTHING_MM. So this might be the same thing again.

I can't think why these changes could cause freezes. Could you upload your current config and gcode and I'll try to replicate the freeze.

schmttc commented 1 year ago

Config and gcode attached. freeze.zip

Yes it didn't make sense to me either why it was behaving that way.

tombrazier commented 1 year ago

I could not replicate it on my printer. I will try on a Creality 4.2.2 board which has an STM32F processor. @schmttc to make it easier, could you let me know some more info:

Which marlin source code are image you using? I am guessing it is from your EasyThreeD-K7-STM32 fork, but which commit?
Does the bug happen if you do not home? (Setting up the Creality board with endstops on my printer would be a massive pain so I am hoping I can avoid it.)
Can you replicate the bug by removing all hotend heating gcode and disabling PREVENT_COLD_EXTRUSION? (Same reason: I ideally just want to use a bare board with no printer hardware attached.)

tombrazier commented 1 year ago

Oh, and can you replicate it with no display?

schmttc commented 1 year ago

@tombrazier I used your code branch as the source, which now that I think about it will have introduced a bunch of variables. For thinkyhead's tweak I used the "backlashfix" branch of the EasyThreeD-K7-STM32 fork.

I'll repeat the test modifying only the backlash code from my fork. And failing that, try the isolation step's you've suggested.

Interestingly, the K7 has no endstops either, it just runs home as the max bed value plus a bit, and lets everything crash to zero.

tombrazier commented 1 year ago

One thing I notice is you have #define BACKLASH_SMOOTHING_MM 0 in Configuration_adv.h and, equivalently, M425 S0 in your gcode. These will cause a divide by zero in the following code from Backlash::add_correction_steps():

        if (error_correction && smoothing_mm != 0) {
          // Take up a portion of the residual_error in this segment
          if (segment_proportion == 0) segment_proportion = _MIN(1.0f, block->millimeters / smoothing_mm);
          error_correction = CEIL(segment_proportion * error_correction);
        }
      #endif

AVR and ARM will likely respond differently to division by zero so that could be why I have not replicated the problem on AVR.

If not using BACKLASH_SMOOTHING_MM it should be commented out.

And maybe we need a sanity check for BACKLASH_SMOOTHING_MM == 0.

schmttc commented 1 year ago

The printer also has no display.

(with your code which I assume is based on the latest bugfix)

Comment out backlash smoothing in firmware - still freezes
Remove homing: G28 at start - still freezes
Remove G28, M104, M109 - fixed
Remove M104 only - freezes

So I guess it's to do with maintaining the hotend temperature? I note the M109 code hasn't been touched in 6 months, and I'm using the same config over both copies of code.
I was more patient and let one of the runs sit for a few minutes, and the print resumed. There were a few freezes over the print which caused melt holes,and the steppers sounded more buzzy or stuttery and not quite right. But the layer shift was not present.

I can also update my 2.1.2.1 code with your fixes to backlash.cp to avoid the M109 issue and do some more testing, are you looking to make any other changes?

Regarding BACKLASH_SMOOTHING_MM == 0, I think a catch at runtime is a good idea, because for most people it's easier to change variables in gcode than recompile, and here for example where I provide a binary it's nice to give people options without needing a different firmware for each one.

schmttc / EasyThreeD-K7-STM32

Setting M425 to non-zero values causes Layer Shift #2