prusa3d / Prusa-Firmware

Firmware for Original Prusa i3 3D printer by PrusaResearch
GNU General Public License v3.0
2.02k stars 1.05k forks source link

[BUG]MK3S+ 3.11.0, mesh bed leveling is not done on G80 #3541

Closed arekm closed 1 year ago

arekm commented 2 years ago

MK3S+ (bought as assembled product) Prusa-Firmware 3.11.0 based on Marlin Octoprint 1.8.1

Describe the bug

I'm usually printing by using "upload and print" prusaslicer functionality which feeds octoprint over a network and then a printer over USB.

Prusaslicer has startup g-code that contains

G28 W ; home all without mesh bed level
G80 ; mesh bed leveling

but I noticed that sometimes it starts printing without doing mesh bed leveling and then nozzle ends up too high breaking the print.

Asked on discord server and it seems that mesh bed leveling should never be skipped if the G80 exists in g-code.

On failed print the communication with mk3s+ looks like this (from serial.log in octoprint):

2022-07-11 08:08:10,981 - Send: N22 M109 S240*94
2022-07-11 08:08:11,010 - Recv: LCD status changed
2022-07-11 08:08:11,723 - Recv: T:240.8 /240.0 B:84.8 /85.0 T0:240.8 /240.0 @:22 B@:66 P:0.0 A:33.5
2022-07-11 08:08:12,014 - Recv: T:240.5 E:0 W:1
2022-07-11 08:08:13,013 - Recv: T:239.3 E:0 W:0
2022-07-11 08:08:13,730 - Recv: T:240.0 /240.0 B:84.9 /85.0 T0:240.0 /240.0 @:35 B@:56 P:0.0 A:33.3
2022-07-11 08:08:14,013 - Recv: LCD status changed
2022-07-11 08:08:14,015 - Recv: ok
2022-07-11 08:08:14,031 - Send: N23 G28 W*85
2022-07-11 08:08:14,440 - Recv: tmc2130_home_enter(axes_mask=0x01)
2022-07-11 08:08:15,140 - Recv:   0 step=62 mscnt= 994
2022-07-11 08:08:15,143 - Recv: tmc2130_goto_step 0 13 2 1000
2022-07-11 08:08:15,258 - Recv: tmc2130_home_exit tmc2130_sg_homing_axes_mask=0x01
2022-07-11 08:08:15,372 - Recv: tmc2130_home_enter(axes_mask=0x02)
2022-07-11 08:08:15,733 - Recv: T:239.8 /240.0 B:85.0 /85.0 T0:239.8 /240.0 @:40 B@:64 P:0.0 A:33.4
2022-07-11 08:08:16,057 - Recv: echo:busy: processing
2022-07-11 08:08:17,740 - Recv: T:239.8 /240.0 B:85.0 /85.0 T0:239.8 /240.0 @:39 B@:82 P:0.0 A:33.6
2022-07-11 08:08:18,059 - Recv: echo:busy: processing
2022-07-11 08:08:19,747 - Recv: T:239.8 /240.0 B:85.1 /85.0 T0:239.8 /240.0 @:40 B@:87 P:0.0 A:33.1
2022-07-11 08:08:20,059 - Recv: echo:busy: processing
2022-07-11 08:08:20,202 - Recv:   0 step= 2 mscnt=  37
2022-07-11 08:08:20,206 - Recv: tmc2130_goto_step 1 1 2 1000
2022-07-11 08:08:20,264 - Recv: tmc2130_home_exit tmc2130_sg_homing_axes_mask=0x02
2022-07-11 08:08:21,754 - Recv: T:239.5 /240.0 B:85.2 /85.0 T0:239.5 /240.0 @:45 B@:72 P:0.0 A:33.5
2022-07-11 08:08:22,058 - Recv: echo:busy: processing
2022-07-11 08:08:23,758 - Recv: T:239.9 /240.0 B:85.1 /85.0 T0:239.9 /240.0 @:37 B@:94 P:0.0 A:33.3
2022-07-11 08:08:24,057 - Recv: echo:busy: processing
2022-07-11 08:08:25,335 - Recv: ok
2022-07-11 08:08:25,350 - Send: N24 G80*39
2022-07-11 08:08:25,355 - Recv: ok
2022-07-11 08:08:25,366 - Send: N25 G1 Z0.2 F720*26
2022-07-11 08:08:25,393 - Recv: ok
2022-07-11 08:08:25,420 - Send: N26 G1 Y-3 F1000*28

notice that G80 is sent but it ends immediately, 0.005s after starting. No mesh leveling is done (confirmed visually, too).

On good prints, when mesh bed leveling happens, it takes longer to finish G80:

2022-07-11 05:51:22,152 - Recv: T:241.6 /240.0 B:85.1 /85.0 T0:241.6 /240.0 @:45 B@:87 P:0.0 A:33.0
2022-07-11 05:51:22,622 - Recv: LCD status changed
2022-07-11 05:51:22,623 - Recv: ok
2022-07-11 05:51:22,633 - Send: N22 G28 W*84
2022-07-11 05:51:22,667 - Recv: tmc2130_home_enter(axes_mask=0x01)
2022-07-11 05:51:23,360 - Recv:   0 step=62 mscnt= 993
2022-07-11 05:51:23,363 - Recv: tmc2130_goto_step 0 13 2 1000
2022-07-11 05:51:23,478 - Recv: tmc2130_home_exit tmc2130_sg_homing_axes_mask=0x01
2022-07-11 05:51:23,592 - Recv: tmc2130_home_enter(axes_mask=0x02)
2022-07-11 05:51:24,154 - Recv: T:242.9 /240.0 B:85.2 /85.0 T0:242.9 /240.0 @:20 B@:66 P:0.0 A:33.3
2022-07-11 05:51:24,630 - Recv: echo:busy: processing
2022-07-11 05:51:26,161 - Recv: T:242.4 /240.0 B:85.2 /85.0 T0:242.4 /240.0 @:29 B@:81 P:0.0 A:33.1
2022-07-11 05:51:26,628 - Recv: echo:busy: processing
2022-07-11 05:51:28,170 - Recv: T:242.7 /240.0 B:85.4 /85.0 T0:242.7 /240.0 @:23 B@:55 P:0.0 A:33.0
2022-07-11 05:51:28,426 - Recv:   0 step= 1 mscnt=  30
2022-07-11 05:51:28,430 - Recv: tmc2130_goto_step 1 1 2 1000
2022-07-11 05:51:28,488 - Recv: tmc2130_home_exit tmc2130_sg_homing_axes_mask=0x02
2022-07-11 05:51:28,634 - Recv: echo:busy: processing
2022-07-11 05:51:30,175 - Recv: T:241.7 /240.0 B:85.5 /85.0 T0:241.7 /240.0 @:39 B@:40 P:0.0 A:33.4
2022-07-11 05:51:30,630 - Recv: echo:busy: processing
2022-07-11 05:51:32,178 - Recv: T:242.5 /240.0 B:85.6 /85.0 T0:242.5 /240.0 @:22 B@:30 P:0.0 A:33.0
2022-07-11 05:51:32,629 - Recv: echo:busy: processing
2022-07-11 05:51:33,735 - Recv: ok
2022-07-11 05:51:33,746 - Send: N23 G80*32
2022-07-11 05:51:34,186 - Recv: T:241.4 /240.0 B:85.7 /85.0 T0:241.4 /240.0 @:39 B@:38 P:0.0 A:33.2
2022-07-11 05:51:35,750 - Recv: echo:busy: processing
2022-07-11 05:51:36,193 - Recv: T:241.3 /240.0 B:85.6 /85.0 T0:241.3 /240.0 @:37 B@:58 P:0.0 A:33.2
2022-07-11 05:51:37,754 - Recv: echo:busy: processing
2022-07-11 05:51:38,196 - Recv: T:240.7 /240.0 B:85.8 /85.0 T0:240.7 /240.0 @:43 B@:26 P:0.0 A:33.0
2022-07-11 05:51:39,752 - Recv: echo:busy: processing
2022-07-11 05:51:40,203 - Recv: T:241.1 /240.0 B:85.9 /85.0 T0:241.1 /240.0 @:33 B@:8 P:0.0 A:33.1
2022-07-11 05:51:41,752 - Recv: echo:busy: processing
2022-07-11 05:51:42,210 - Recv: T:240.3 /240.0 B:85.9 /85.0 T0:240.3 /240.0 @:46 B@:27 P:0.0 A:33.1
2022-07-11 05:51:43,754 - Recv: echo:busy: processing
2022-07-11 05:51:44,214 - Recv: T:241.3 /240.0 B:86.0 /85.0 T0:241.3 /240.0 @:29 B@:14 P:0.0 A:33.1
2022-07-11 05:51:45,753 - Recv: echo:busy: processing
2022-07-11 05:51:46,231 - Recv: T:241.8 /240.0 B:85.9 /85.0 T0:241.8 /240.0 @:20 B@:31 P:0.0 A:33.2
2022-07-11 05:51:47,753 - Recv: echo:busy: processing
2022-07-11 05:51:48,227 - Recv: T:241.9 /240.0 B:86.0 /85.0 T0:241.9 /240.0 @:19 B@:23 P:0.0 A:33.0
2022-07-11 05:51:49,084 - Recv: LCD status changed
2022-07-11 05:51:49,089 - Recv: ok
2022-07-11 05:51:49,128 - Send: N24 G1 Z0.2 F720*27

It happens once per few prints.

G80 shouldn't be skipped/ignored I think.

Edit: tried "G80 V30", too but seems that SUPPORT_VERBOSITY is not compiled in in firmware that the printer came with.

2022-07-11 09:42:09,192 - Recv: LCD status changed
2022-07-11 09:42:09,193 - Recv: ok
2022-07-11 09:42:09,208 - Send: N25 G28 W*83
2022-07-11 09:42:09,602 - Recv: tmc2130_home_enter(axes_mask=0x01)
2022-07-11 09:42:10,302 - Recv:   0 step=62 mscnt= 993
2022-07-11 09:42:10,304 - Recv: tmc2130_goto_step 0 13 2 1000
2022-07-11 09:42:10,421 - Recv: tmc2130_home_exit tmc2130_sg_homing_axes_mask=0x01
2022-07-11 09:42:10,536 - Recv: tmc2130_home_enter(axes_mask=0x02)
2022-07-11 09:42:10,659 - Recv: T:242.4 /240.0 B:85.1 /85.0 T0:242.4 /240.0 @:5 B@:59 P:0.0 A:32.6
2022-07-11 09:42:11,220 - Recv: echo:busy: processing
2022-07-11 09:42:12,665 - Recv: T:241.5 /240.0 B:85.1 /85.0 T0:241.5 /240.0 @:28 B@:59 P:0.0 A:32.6
2022-07-11 09:42:13,219 - Recv: echo:busy: processing
2022-07-11 09:42:14,668 - Recv: T:241.1 /240.0 B:84.9 /85.0 T0:241.1 /240.0 @:37 B@:101 P:0.0 A:32.4
2022-07-11 09:42:15,222 - Recv: echo:busy: processing
2022-07-11 09:42:15,366 - Recv:   0 step= 2 mscnt=  34
2022-07-11 09:42:15,370 - Recv: tmc2130_goto_step 1 1 2 1000
2022-07-11 09:42:15,426 - Recv: tmc2130_home_exit tmc2130_sg_homing_axes_mask=0x02
2022-07-11 09:42:16,676 - Recv: T:241.3 /240.0 B:84.7 /85.0 T0:241.3 /240.0 @:38 B@:126 P:0.0 A:32.5
2022-07-11 09:42:17,220 - Recv: echo:busy: processing
2022-07-11 09:42:18,684 - Recv: T:242.3 /240.0 B:84.9 /85.0 T0:242.3 /240.0 @:21 B@:79 P:0.0 A:32.7
2022-07-11 09:42:19,220 - Recv: echo:busy: processing
2022-07-11 09:42:20,473 - Recv: ok
2022-07-11 09:42:20,486 - Send: N26 G80 V30*80
2022-07-11 09:42:20,493 - Recv: ok
2022-07-11 09:42:20,501 - Send: N27 G1 Z0.2 F720*24
2022-07-11 09:42:20,545 - Recv: ok
wavexx commented 2 years ago

This is extremely strange. Indeed G80 should never be skipped. Does it happen also on the same gcode if you try multiple times?

arekm commented 2 years ago

I can trigger it by just doing G80 by hand:

2022-07-11 10:20:14,301 - Recv: T:156.8 /0.0 B:71.5 /0.0 T0:156.8 /0.0 @:0 B@:0 P:0.0 A:31.7
2022-07-11 10:20:16,306 - Recv: T:156.0 /0.0 B:71.6 /0.0 T0:156.0 /0.0 @:0 B@:0 P:0.0 A:31.7
2022-07-11 10:20:18,312 - Recv: T:154.9 /0.0 B:71.3 /0.0 T0:154.9 /0.0 @:0 B@:0 P:0.0 A:31.5
2022-07-11 10:20:18,871 - Send: G80 V30
2022-07-11 10:20:18,890 - Recv: ok

so failed

2022-07-11 10:20:20,316 - Recv: T:153.7 /0.0 B:70.9 /0.0 T0:153.7 /0.0 @:0 B@:0 P:0.0 A:31.7
2022-07-11 10:20:22,322 - Recv: T:152.8 /0.0 B:70.8 /0.0 T0:152.8 /0.0 @:0 B@:0 P:0.0 A:31.9
2022-07-11 10:20:24,333 - Recv: T:152.0 /0.0 B:70.4 /0.0 T0:152.0 /0.0 @:0 B@:0 P:0.0 A:31.7
2022-07-11 10:20:26,329 - Recv: T:150.9 /0.0 B:70.3 /0.0 T0:150.9 /0.0 @:0 B@:0 P:0.0 A:31.8
2022-07-11 10:20:26,397 - Send: G80 V30
2022-07-11 10:20:28,335 - Recv: T:149.8 /0.0 B:70.2 /0.0 T0:149.8 /0.0 @:0 B@:0 P:0.0 A:31.8
2022-07-11 10:20:28,401 - Recv: echo:busy: processing
2022-07-11 10:20:30,338 - Recv: T:148.9 /0.0 B:70.0 /0.0 T0:148.9 /0.0 @:0 B@:0 P:0.0 A:31.9
2022-07-11 10:20:30,404 - Recv: echo:busy: processing
2022-07-11 10:20:32,346 - Recv: T:148.1 /0.0 B:69.7 /0.0 T0:148.1 /0.0 @:0 B@:0 P:0.0 A:31.9
2022-07-11 10:20:32,404 - Recv: echo:busy: processing

succeess, tried again - success, again - success, 5 more again - 5x success, again - failure (G80 returns immediately), again - success

    if (waiting_inside_plan_buffer_line_print_aborted)
        return;

could that kick in sometimes?

wavexx commented 2 years ago

Does "G28 W" home reliably 100% of the time? Can you see if the status of the pinda light is always the same after G28?

arekm commented 2 years ago

I didn't notice any problems with "G28 W".

What I was doing in new testing was: 1) reset the printer via physical button 2) wait until octoprint reconnects 3) send G80 (which first did echo:Enqueing to the front: "G28 W0") and then actual G80 4) send G80 again (which didn't do "G28 W0" mostly ... but it did it one more time in my loop, no idea why. I would expect it to never force G28 in such scenario again but that's only my guessing) 5) goto 4)

In 27 tries it failed twice (at 2022-07-11 12:35:33,635 and 2022-07-11 12:39:37,027). Pinda light was always off before and after G80 finished (regardless if G80 succeeded or failed). It was flashing when G80 did its probing.

The same log with and without temperatures (for readability): with-temp.txt.gz without-temp.txt.gz

I've also built and flashed firmware from MK3 git branch, variant for my printer, and so far wasn't able to reproduce the problem there (3.11.1-4987) but that testing was only 20-30x G80. After that I've downgraded to official 3.11.0-4955... and I was able to reproduce G80 once again!

So what I'll do now is to flash git version again and use that for few days. (building with LANG_MODE=1 even that README.md says to set it to 0 but official seems to be built with=1 (?), SUPPORT_VERBOSITY and few additional printks in G80 function).

wavexx commented 2 years ago

I've filed a bug regarding the README (#2736), it's sadly outdated).

The release build is always with LANG_MODE=1 indeed. We also have the MK3_3.11.1 which is going to be the next release soon. It would be nice to know if you can reproduce the issue there given it happens relatively often

arekm commented 2 years ago

50x G80 on MK3_3.11.1 - own build; works ok 50x G80 on prusa3d_fw_MK3S_MK3S+_3_11_1-RC1_4987.hex from github release assets - works - ok 50x G80 on MK3_3_11.0 - own build; works ok (!!) back to prusa3d_fw_MK3S_MK3S+_3_11_0_4955.hex from https://www.prusa3d.com/downloads/firmware/prusa3d_fw_3_11_0_MK3S.zip - failure at second G80 try

The only thing that comes to mind is that prusa3d_fw_3_11_0_MK3S.zip hex inside is miscompiled due to some gcc/binutils etc bug. But I would expect people to complain and I see no such problems reported when googling.

Anyway could you try that official hex on mk3s+ like booting and just 20x G80, too?

ps. My builds were just cp Firmware/variants/1_75mm_MK3S-EINSy10a-E3Dv6full.h Firmware/Configuration_prusa.h; ./build.sh on Linux (+ patch adding few "printf_P(PSTR("..."))" around places where gcode_G80() function returns. Stopped even using SUPPORT_VERBOSITY because it doesn't fit memory.

wavexx commented 2 years ago

Which version of gcc are you using for your own builds? gcc 5.4 has been known to miscompile some functions and 5.11.1 is the first release that switches to gcc 7 from a newer arduino release.

wavexx commented 2 years ago

Could you also send me your own MK3_3.11.0 build that works ok? I would like to compare the generated assembly. I would need the elf file

arekm commented 2 years ago

build.sh is fetching build environment on each branch - so I'm using these (PF-build-env-1.0.6.1 for 3.11.0). But my builds work fine.

The problem is only with official precompiled hex at https://www.prusa3d.com/downloads/firmware/prusa3d_fw_3_11_0_MK3S.zip.

Would be nice if someone else tested it with like 20x G80. (I assume most of devs don't use official hex and instead test their own builds).

My build tree with all files: http://ixion.pld-linux.org/~arekm/Prusa-Firmware-3.11.0/

wavexx commented 2 years ago

The firmware is built into ../Prusa-Firmware-build though

gudnimg commented 2 years ago

I just tried running G80 20 times on my MK3S+ using build https://www.prusa3d.com/downloads/firmware/prusa3d_fw_3_11_0_MK3S.zip It ran successfully each time.

I'm not using Octoprint though, I used a serial window to insert the G80 (Serial Monitor in Arduino IDE)

arekm commented 2 years ago

Ok, don't know the build system and I flashed http://ixion.pld-linux.org/~arekm/Prusa-Firmware-3.11.0/lang/firmware.hex ... which flashed fine and booted.

Updated with fresh build: http://ixion.pld-linux.org/~arekm/Prusa-Firmware-3.11.0/ http://ixion.pld-linux.org/~arekm/Prusa-Firmware-build/ which I'm going to flash (Prusa-Firmware-build/Firmware.ino.hex) and test again.

arekm commented 2 years ago

40x G80 and no problem with above Prusa-Firmware-build/Firmware.ino.hex.

So I'm unable to produce own faulty hex :/

How official hex was build, are there any logs of what was used etc?

Build with my printfs patch removed: http://ixion.pld-linux.org/~arekm/prusa-no-printf-patch/

wavexx commented 2 years ago

To avoid loosing track, it seems like the issue persist also when disabling filament sensor, fan checks, removing the sd card and switching to stealth mode.

What we did was to patch two instructions at 0x39bd8+4 on the 3.11.0 official build with nops to exclude the potential early return in gcode_G80, without success (still exhibits the same behavior of skipping G80 from time to time).

So it looks like the command might not be executed at all, potentially not entering the command queue entirely :thinking:

wavexx commented 2 years ago

I wonder if there is any other 3 letter gcode we could run repeatedly in the same way to see if this is really related to G80, or it's just unrelated.

arekm commented 2 years ago

With that patched firmware, filtered out M155 and M105 in octoprint (so these never reach printer thus no automatic temperature reporting and no manual temperature reading), just G80 communication happening over serial:

2022-07-11 21:14:49,323 - Info: Not sending M105 to printer, it's configured as a blocked command 2022-07-11 21:14:51,566 - Send: G80 2022-07-11 21:14:51,590 - Recv: ok 2022-07-11 21:14:54,330 - Info: Not sending M105 to printer, it's configured as a blocked command

a failure which took 68x G80 to trigger. So temperature reporting is unrelated (or at least not needed to repeat failure).

"and switching to stealth mode." - it switches automatically? I didn't switch that "on" manually.

Edit: but switched mode to stealth now, reset via button and it failed after 3x G80.

wavexx commented 2 years ago

@arekm Can you try if this still applies to the MK3_3.12 branch? We removed a lot of issues regarding locking especially in the temperature regulation isr.

Knowing if this still happens/applies to 3.12 would be much easier.

arekm commented 2 years ago

120x G80 and it didn't break (which doesn't mean it is fixed as previously my own build worked fine but there is a chance...)

Test was done using MK3_3.12 branch, commit 6bee4fec8dadf515c1ceccb8165704ae679de5fe .

wavexx commented 2 years ago

Absolutely agree this doesn't prove it got fixed. At least we didn't make it worse ;). An official 3.12 beta should be released soon.

arekm commented 1 year ago

Lets assume it was miscompilation problem and get this issue closed (until it happens again for someone else; with newer firmwares of course).