prusa3d / Prusa-Firmware

Firmware for Original Prusa i3 3D printer by PrusaResearch
GNU General Public License v3.0
2.02k stars 1.05k forks source link

[BUG]Flashing 3.9.3 to MK3S fails #2954

Closed hlipka closed 1 year ago

hlipka commented 3 years ago

I have a MK3S (bought in September, no MMU), currently at 3.9.2. Today I tried to flash the new 3.9.3 firmware, but it fails at the end of upload. For the avrdude log see below. When this happens, the mainboard seems to be stuck in a boot loop - I looks as if the LCD is refreshed several times in a row with thew boot message, and then there is a short pause (like a reset / reboot), then it starts again. I also tried 3.9.3RC1, which also fails. Instead of the constant LCD refresh I just see an empty LCD. Flashing was done via first via Octoprint (using the Firmware updater). When this failed, I used PrusaSlicer connected directly to go back to 3.9.2 (which worked), the used PrusaSlicer again to try 3.9.3 and 3.9.3RC1. (I did the update to 3.9.2 successfully before so FW updates as such work). So it seems I'm stuck with 3.9.2 (falling back to it worked fine both times) and cannot upgrade to 3.9.3. Any other things I can try?

avrdude output (from PrusaSlicer):

avrdude-slic3r -v -p atmega2560 -c wiring -P /dev/ttyACM0 -b 115200 -D -U flash:w:0:/home/hli/Downloads/prusa3d_fw_MK3S_MK3S+_3_9_3_3556.hex:i

avrdude-slic3r: Version 6.3-20160220-prusa3d, compiled on Dec 22 2020 at 16:48:28
                Copyright (c) 2000-2005 Brian Dean, http://www.bdmicro.com/
                Copyright (c) 2007-2014 Joerg Wunsch

                Using Port                    : /dev/ttyACM0
                Using Programmer              : wiring
                Overriding Baud Rate          : 115200
                AVR Part                      : ATmega2560
                Chip Erase delay              : 9000 us
                PAGEL                         : PD7
                BS2                           : PA0
                RESET disposition             : dedicated
                RETRY pulse                   : SCK
                serial program mode           : yes
                parallel program mode         : yes
                Timeout                       : 200
                StabDelay                     : 100
                CmdexeDelay                   : 25
                SyncLoops                     : 32
                ByteDelay                     : 0
                PollIndex                     : 3
                PollValue                     : 0x53
                Memory Detail                 :

                                         Block Poll               Page                       Polled
                  Memory Type Mode Delay Size  Indx Paged  Size   Size #Pages MinW  MaxW   ReadBack
                  ----------- ---- ----- ----- ---- ------ ------ ---- ------ ----- ----- ---------
                  eeprom        65    10     8    0 no       4096    8      0  9000  9000 0x00 0x00
                  flash         65    10   256    0 yes    262144  256   1024  4500  4500 0x00 0x00
                  lfuse          0     0     0    0 no          1    0      0  9000  9000 0x00 0x00
                  hfuse          0     0     0    0 no          1    0      0  9000  9000 0x00 0x00
                  efuse          0     0     0    0 no          1    0      0  9000  9000 0x00 0x00
                  lock           0     0     0    0 no          1    0      0  9000  9000 0x00 0x00
                  calibration    0     0     0    0 no          1    0      0     0     0 0x00 0x00
                  signature      0     0     0    0 no          3    0      0     0     0 0x00 0x00

                Programmer Type : Wiring
                Description     : Wiring
                Programmer Model: AVRISP
                Hardware Version: 15
                Firmware Version Master : 2.10
                Vtarget         : 0.0 V
                SCK period      : 0.1 us

avrdude-slic3r: AVR device initialized and ready to accept instructions

Reading | ################################################## | 100% 0.01s

avrdude-slic3r: Device signature = 0x1e9801 (probably m2560)
avrdude-slic3r: reading input file "/home/hli/Downloads/prusa3d_fw_MK3S_MK3S+_3_9_3_3556.hex"
avrdude-slic3r: writing flash (250818 bytes):

Writing | ################################################## | 100% 48.58s

avrdude-slic3r: 250818 bytes of flash written
avrdude-slic3r: verifying flash memory against /home/hli/Downloads/prusa3d_fw_MK3S_MK3S+_3_9_3_3556.hex:
avrdude-slic3r: load data flash data from input file /home/hli/Downloads/prusa3d_fw_MK3S_MK3S+_3_9_3_3556.hex:
avrdude-slic3r: input file /home/hli/Downloads/prusa3d_fw_MK3S_MK3S+_3_9_3_3556.hex contains 250818 bytes
avrdude-slic3r: reading on-chip flash data:

Reading | ################################################## | 100% 32.12s

avrdude-slic3r: verifying ...
avrdude-slic3r: 250818 bytes of flash verified

avrdude-slic3r done.  Thank you.

avrdude-slic3r -v -p atmega2560 -c arduino -P /dev/ttyACM0 -b 115200 -D -u -U flash:w:1:/home/hli/Downloads/prusa3d_fw_MK3S_MK3S+_3_9_3_3556.hex:i

avrdude-slic3r: Version 6.3-20160220-prusa3d, compiled on Dec 22 2020 at 16:48:28
                Copyright (c) 2000-2005 Brian Dean, http://www.bdmicro.com/
                Copyright (c) 2007-2014 Joerg Wunsch

                Using Port                    : /dev/ttyACM0
                Using Programmer              : arduino
                Overriding Baud Rate          : 115200
avrdude-slic3r: prusa_init_external_flash(): MK3 printer did not boot up on time or serial communication failed
avrdude-slic3r: arduino_open(): Failed to initialize MK3 external flash programming mode
avrdude-slic3r: Could not open port: /dev/ttyACM0

avrdude-slic3r done.  Thank you.

Here also the update from the avrdude on my OctoPrint (which resulted in the boot loop)

2021-01-08 18:19:26,605
2021-01-08 18:19:26,611 /usr/bin/avrdude -v -q -p m2560 -c wiring -P /dev/ttyACM0 -D -U flash:w:/tmp/tmp5uaDyn:i
2021-01-08 18:19:27,824 >
2021-01-08 18:19:27,825 > avrdude: Version 6.3-20171130
2021-01-08 18:19:27,826 >          Copyright (c) 2000-2005 Brian Dean, http://www.bdmicro.com/
2021-01-08 18:19:27,827 >          Copyright (c) 2007-2014 Joerg Wunsch
2021-01-08 18:19:27,827 >
2021-01-08 18:19:27,828 >          System wide configuration file is "/etc/avrdude.conf"
2021-01-08 18:19:27,829 >          User configuration file is "/home/pi/.avrduderc"
2021-01-08 18:19:27,829 >          User configuration file does not exist or is not a regular file, skipping
2021-01-08 18:19:27,830 >
2021-01-08 18:19:27,831 >          Using Port                    : /dev/ttyACM0
2021-01-08 18:19:27,831 >          Using Programmer              : wiring
2021-01-08 18:19:27,832 >          AVR Part                      : ATmega2560
2021-01-08 18:19:27,833 >          Chip Erase delay              : 9000 us
2021-01-08 18:19:27,833 >          PAGEL                         : PD7
2021-01-08 18:19:27,834 >          BS2                           : PA0
2021-01-08 18:19:27,835 >          RESET disposition             : dedicated
2021-01-08 18:19:27,835 >          RETRY pulse                   : SCK
2021-01-08 18:19:27,836 >          serial program mode           : yes
2021-01-08 18:19:27,837 >          parallel program mode         : yes
2021-01-08 18:19:27,837 >          Timeout                       : 200
2021-01-08 18:19:27,838 >          StabDelay                     : 100
2021-01-08 18:19:27,838 >          CmdexeDelay                   : 25
2021-01-08 18:19:27,839 >          SyncLoops                     : 32
2021-01-08 18:19:27,840 >          ByteDelay                     : 0
2021-01-08 18:19:27,840 >          PollIndex                     : 3
2021-01-08 18:19:27,841 >          PollValue                     : 0x53
2021-01-08 18:19:27,858 >          Memory Detail                 :
2021-01-08 18:19:27,859 >
2021-01-08 18:19:27,860 >                                   Block Poll               Page                       Polled
2021-01-08 18:19:27,861 >            Memory Type Mode Delay Size  Indx Paged  Size   Size #Pages MinW  MaxW   ReadBack
2021-01-08 18:19:27,861 >            ----------- ---- ----- ----- ---- ------ ------ ---- ------ ----- ----- ---------
2021-01-08 18:19:27,863 >            eeprom        65    10     8    0 no       4096    8      0  9000  9000 0x00 0x00
2021-01-08 18:19:27,864 >            flash         65    10   256    0 yes    262144  256   1024  4500  4500 0x00 0x00
2021-01-08 18:19:27,864 >            lfuse          0     0     0    0 no          1    0      0  9000  9000 0x00 0x00
2021-01-08 18:19:27,865 >            hfuse          0     0     0    0 no          1    0      0  9000  9000 0x00 0x00
2021-01-08 18:19:27,866 >            efuse          0     0     0    0 no          1    0      0  9000  9000 0x00 0x00
2021-01-08 18:19:27,866 >            lock           0     0     0    0 no          1    0      0  9000  9000 0x00 0x00
2021-01-08 18:19:27,867 >            calibration    0     0     0    0 no          1    0      0     0     0 0x00 0x00
2021-01-08 18:19:27,867 >            signature      0     0     0    0 no          3    0      0     0     0 0x00 0x00
2021-01-08 18:19:27,868 >
2021-01-08 18:19:27,869 >          Programmer Type : Wiring
2021-01-08 18:19:27,869 >          Description     : Wiring
2021-01-08 18:19:27,870 >          Programmer Model: AVRISP
2021-01-08 18:19:27,871 >          Hardware Version: 15
2021-01-08 18:19:27,871 >          Firmware Version Master : 2.10
2021-01-08 18:19:27,872 >          Vtarget         : 0.0 V
2021-01-08 18:19:27,873 >          SCK period      : 0.1 us
2021-01-08 18:19:27,873 >
2021-01-08 18:19:27,874 > avrdude: AVR device initialized and ready to accept instructions
2021-01-08 18:19:27,874 > avrdude: Device signature = 0x1e9801 (probably m2560)
2021-01-08 18:19:27,875 > avrdude: reading input file "/tmp/tmp5uaDyn"
2021-01-08 18:19:27,876 > avrdude: writing flash (250818 bytes):
2021-01-08 18:19:27,876 >
2021-01-08 18:20:12,022 > avrdude: 250818 bytes of flash written
2021-01-08 18:20:12,024 > avrdude: verifying flash memory against /tmp/tmp5uaDyn:
2021-01-08 18:20:12,024 > avrdude: load data flash data from input file /tmp/tmp5uaDyn:
2021-01-08 18:20:12,025 > avrdude: input file /tmp/tmp5uaDyn contains 250818 bytes
2021-01-08 18:20:12,026 > avrdude: reading on-chip flash data:
2021-01-08 18:20:12,027 >
2021-01-08 18:20:44,372 > avrdude: verifying ...
2021-01-08 18:20:44,378 > avrdude: 250818 bytes of flash verified
2021-01-08 18:20:44,379 >
2021-01-08 18:20:44,380 > avrdude done.  Thank you.
2021-01-08 18:20:44,382 >
2021-01-08 18:20:44,383 >
2021-01-08 18:20:44,387 Flashing successful.
hlipka commented 3 years ago

One additional thing I noticed (just trying it again): PricaSlicer shows my printer (when starting the FW update) as 'Original Prusa i3 MK3 (/dev/ttyACM1)', not as a MK3S. And the 'Support' screen on the MK3S does the same (saying '1_75mm_MK3' with 'EINSy_10a').

Further update: on trying again, flashing itself succeeded this time, but the MK3S is still in a reboot loop (shows the boot message for several seconds, then seems to update the display many times in a row, with the display flickering, and the showing the boot message again).

The avrdude log this time:

avrdude-slic3r -v -p atmega2560 -c wiring -P /dev/ttyACM1 -b 115200 -D -U flash:w:0:/home/hli/Downloads/prusa3d_fw_MK3S_3_9_3_3556.hex:i

avrdude-slic3r: Version 6.3-20160220-prusa3d, compiled on Dec 22 2020 at 16:48:28
                Copyright (c) 2000-2005 Brian Dean, http://www.bdmicro.com/
                Copyright (c) 2007-2014 Joerg Wunsch

                Using Port                    : /dev/ttyACM1
                Using Programmer              : wiring
                Overriding Baud Rate          : 115200
                AVR Part                      : ATmega2560
                Chip Erase delay              : 9000 us
                PAGEL                         : PD7
                BS2                           : PA0
                RESET disposition             : dedicated
                RETRY pulse                   : SCK
                serial program mode           : yes
                parallel program mode         : yes
                Timeout                       : 200
                StabDelay                     : 100
                CmdexeDelay                   : 25
                SyncLoops                     : 32
                ByteDelay                     : 0
                PollIndex                     : 3
                PollValue                     : 0x53
                Memory Detail                 :

                                         Block Poll               Page                       Polled
                  Memory Type Mode Delay Size  Indx Paged  Size   Size #Pages MinW  MaxW   ReadBack
                  ----------- ---- ----- ----- ---- ------ ------ ---- ------ ----- ----- ---------
                  eeprom        65    10     8    0 no       4096    8      0  9000  9000 0x00 0x00
                  flash         65    10   256    0 yes    262144  256   1024  4500  4500 0x00 0x00
                  lfuse          0     0     0    0 no          1    0      0  9000  9000 0x00 0x00
                  hfuse          0     0     0    0 no          1    0      0  9000  9000 0x00 0x00
                  efuse          0     0     0    0 no          1    0      0  9000  9000 0x00 0x00
                  lock           0     0     0    0 no          1    0      0  9000  9000 0x00 0x00
                  calibration    0     0     0    0 no          1    0      0     0     0 0x00 0x00
                  signature      0     0     0    0 no          3    0      0     0     0 0x00 0x00

                Programmer Type : Wiring
                Description     : Wiring
                Programmer Model: AVRISP
                Hardware Version: 15
                Firmware Version Master : 2.10
                Vtarget         : 0.0 V
                SCK period      : 0.1 us

avrdude-slic3r: AVR device initialized and ready to accept instructions

Reading | ################################################## | 100% 0.01s

avrdude-slic3r: Device signature = 0x1e9801 (probably m2560)
avrdude-slic3r: reading input file "/home/hli/Downloads/prusa3d_fw_MK3S_3_9_3_3556.hex"
avrdude-slic3r: writing flash (250818 bytes):

Writing | ################################################## | 100% 48.59s

avrdude-slic3r: 250818 bytes of flash written
avrdude-slic3r: verifying flash memory against /home/hli/Downloads/prusa3d_fw_MK3S_3_9_3_3556.hex:
avrdude-slic3r: load data flash data from input file /home/hli/Downloads/prusa3d_fw_MK3S_3_9_3_3556.hex:
avrdude-slic3r: input file /home/hli/Downloads/prusa3d_fw_MK3S_3_9_3_3556.hex contains 250818 bytes
avrdude-slic3r: reading on-chip flash data:

Reading | ################################################## | 100% 32.12s

avrdude-slic3r: verifying ...
avrdude-slic3r: 250818 bytes of flash verified

avrdude-slic3r done.  Thank you.

avrdude-slic3r -v -p atmega2560 -c arduino -P /dev/ttyACM1 -b 115200 -D -u -U flash:w:1:/home/hli/Downloads/prusa3d_fw_MK3S_3_9_3_3556.hex:i

avrdude-slic3r: Version 6.3-20160220-prusa3d, compiled on Dec 22 2020 at 16:48:28
                Copyright (c) 2000-2005 Brian Dean, http://www.bdmicro.com/
                Copyright (c) 2007-2014 Joerg Wunsch

                Using Port                    : /dev/ttyACM1
                Using Programmer              : arduino
                Overriding Baud Rate          : 115200
                AVR Part                      : ATmega2560
                Chip Erase delay              : 9000 us
                PAGEL                         : PD7
                BS2                           : PA0
                RESET disposition             : dedicated
                RETRY pulse                   : SCK
                serial program mode           : yes
                parallel program mode         : yes
                Timeout                       : 200
                StabDelay                     : 100
                CmdexeDelay                   : 25
                SyncLoops                     : 32
                ByteDelay                     : 0
                PollIndex                     : 3
                PollValue                     : 0x53
                Memory Detail                 :

                                         Block Poll               Page                       Polled
                  Memory Type Mode Delay Size  Indx Paged  Size   Size #Pages MinW  MaxW   ReadBack
                  ----------- ---- ----- ----- ---- ------ ------ ---- ------ ----- ----- ---------
                  eeprom        65    10     8    0 no       4096    8      0  9000  9000 0x00 0x00
                  flash         65    10   256    0 yes    262144  256   1024  4500  4500 0x00 0x00
                  lfuse          0     0     0    0 no          1    0      0  9000  9000 0x00 0x00
                  hfuse          0     0     0    0 no          1    0      0  9000  9000 0x00 0x00
                  efuse          0     0     0    0 no          1    0      0  9000  9000 0x00 0x00
                  lock           0     0     0    0 no          1    0      0  9000  9000 0x00 0x00
                  calibration    0     0     0    0 no          1    0      0     0     0 0x00 0x00
                  signature      0     0     0    0 no          3    0      0     0     0 0x00 0x00

                Programmer Type : Arduino
                Description     : Arduino
                Hardware Version: 3
                Firmware Version: 6.2
                Vtarget         : 0.3 V
                Varef           : 0.3 V
                Oscillator      : 28.800 kHz
                SCK period      : 3.3 us

avrdude-slic3r: AVR device initialized and ready to accept instructions

Reading | ################################################## | 100% 0.00s

avrdude-slic3r: Device signature = 0x1e9801 (probably m2560)
avrdude-slic3r: reading input file "/home/hli/Downloads/prusa3d_fw_MK3S_3_9_3_3556.hex"
avrdude-slic3r: writing flash (65132 bytes):

Writing | ################################################## | 100% 8.60s

avrdude-slic3r: 65132 bytes of flash written
avrdude-slic3r: verifying flash memory against /home/hli/Downloads/prusa3d_fw_MK3S_3_9_3_3556.hex:
avrdude-slic3r: load data flash data from input file /home/hli/Downloads/prusa3d_fw_MK3S_3_9_3_3556.hex:
avrdude-slic3r: input file /home/hli/Downloads/prusa3d_fw_MK3S_3_9_3_3556.hex contains 65132 bytes
avrdude-slic3r: reading on-chip flash data:

Reading | ################################################## | 100% 7.50s

avrdude-slic3r: verifying ...
avrdude-slic3r: 65132 bytes of flash verified

avrdude-slic3r done.  Thank you.
hlipka commented 3 years ago

Oh. I should have looked at the printer display while writing the previous comment. After 2 or 3 minutes of constantly rebooting, it finally finished (and is on FW 3.9.3). I did not try to print yet. But after turning the printer off, and after 10 seconds on again, it shows the same behavior of hanging in a reboot loop :( It takes about 2 minutes or so to finally boot up properly. A normal reset shows similar behavior (an sometimes even longer than 2 minutes - it seems to be much worse with an SD card inserted).

hlipka commented 3 years ago

Some more observations (maybe they are caused by the update problem, maybe not):

So I assumed that the first update try (the one via OctoPrint) did mess with the EEPROM setting for the configuration. I did a factory reset, which took a while. Pressing on rotary knob after a reset was not recognized immediately, the Einsy board kept rebooting instead. I just held the knob, and after several minutes the menu finally appeared.

So I'm finally going back to 3.9.2, and currently re-doing the calibration (due to deleting all data). I hope that helps in finding out where my problems are coming from...

chrizzo84 commented 3 years ago

Hello! I have about the same problem, a slightly different constellation:

At the first attempt I also got the error "MK3 printer did not boot up...".

Then I flashed again, that worked, but then the printer behaved very strange.

image

image

Then the support asked me again to flash the new version again => Done + Complete factory reset ==> Again the same errors!

Tested with prusa3d_fw_MK3S_MK3S+_3_9_3_3556.hex and also with prusa3d_fw_MK3S_3_9_3_3556.hex

numo68 commented 3 years ago

Same here, blank display and "Failed to initialize MK3 external flash programming mode" with 3.9.3 and MK3S. Going back to 3.9.2 worked.

mathias93 commented 3 years ago

Same here on 3.9.3 jumbled text on initial calibration and is not always happening at the same spot in the config; rolling back to 3.9.2

DRracer commented 3 years ago

OK, 3 issues here, we haven't seen any of these during a rather thorough testing:

  1. Long/repeated restarts - can you try another PC? This sounds like a misconfigured modem manager or similar process influencing the flashing and booting procedure.

  2. Cannot flash FW, or better to say - cannot flash translations into the external FLASH. This issue has been recurring for at least a year, mostly on Windows 10. We suspect some kind of third party software is messing with the COM port. Please note - we haven't been able to reproduce these issues reasonably consistently so far, so if you can help "demystifying" the upload problems, you are welcome.

  3. Garbled screen - that can be either caused by some errors in communication with the LCD (cable) or the FW was not flashed correctly.

Generally - the one and only change from 3.9.2 is the new XYZ algorithm, so these reports suggesting downgrading to 3.9.2 are highly suspicious - the root cause may be somewhere else.

Tahx commented 3 years ago

I was able to upgrade to 3.9.3 on an MK3S+ without any issue through avrdude and octoprint, maybe it can help pinpoint the issue on specific models.

hlipka commented 3 years ago

regarding the restarts: I cannot try another PC (don't have one), but as described I have seen the issue when flashing from Octoprint (using a RPi3) and via my PC (running Ubuntu 20.10). The restarts also takes very long when the PC is not connected to either of the two. The problem with flashing the translations is not with the flash process itself, but (in my case) because the Einsy board takes too long to restart after flashing the firmware (and then avrdude just times out). Note that when flashing from octoprint, it did not try to flash the translations, yet the problem happened. Regarding the LCD problems: FW 3.9.2 work without any problems, so I would rule out a problem with the LCD or the cabling (except when talking to the LCD board was changed it now more prone to any issues in that regard). But since I could use the printer OK (apart from the other issues I found) when it finally booted up, it seems that there is some other issue. Flashing and verifying the firmware itself was fine (see the logs above), so its not an issue with flashing, but with the FW itself. (I used two different FW files, but both are identical in their contents, so I assume that they are correct).

hlipka commented 3 years ago

One additional change done in 3.9.3 is the (additional) support for the SuperPINDA probe (see https://github.com/prusa3d/Prusa-Firmware/compare/v3.9.2...v3.9.3 ). When my current print finishes, I could try updating after disabling the temperature compensation setting (which I have on currently). Or maybe you could build a FW version without SuperPINDA support and I can try that one.

hlipka commented 3 years ago

Are there any debug messages written on a serial port which I could look at during one of the boot loops?

chrizzo84 commented 3 years ago

In my case I flashed with MacOs and also with Windows, also tried my MacBook (it was clean) - but nothing got better. Same like described before... strange signs and stuff...

numo68 commented 3 years ago
  1. Cannot flash FW, or better to say - cannot flash translations into the external FLASH. This issue has been recurring for at least a year, mostly on Windows 10. We suspect some kind of third party software is messing with the COM port. Please note - we haven't been able to reproduce these issues reasonably consistently so far, so if you can help "demystifying" the upload problems, you are welcome.

In my case Windows 10 or Linux made absolutely no difference and I tried several times with both and even with the avrdude trying the failed command. I admit that I am running both in a VmWare virtual machine but there was no such problem until now. Going back to 3.9.2 worked the first time I tried, as did all the previous updates on both MK2S some time ago and MK3S now. I actually tried 3.9.1 as that was what I had previously, then 3.9.2. Both went fine.

If the external flash needs a working firmware flashed in the first step then I am 99% sure that the problem lies there and that it is not the random failure I was also able to google up in the last months or years. I have no idea what the SuperPINDA is but if it is not present in printers bought half a year ago I strongly suggest checking the firmware with a printer not having it.

hlipka commented 3 years ago

The SuperPINDA is the new bed leveling sensor introduced with the MK3S+ (the MK3S has a PINDA V2, the older ones a V1). My MK3S has a PINDA V2 (AFAIK, it was bought in September last year).

chrizzo84 commented 3 years ago

What´s a little bit strange to me is that this initial timout failure only appeared one time... after that everything was flashing fine. But yes... the strange signs :( Today I reflashed again with factory reset before.... but didn´t help anything, same like before.

DRracer commented 3 years ago

All your reports are serious and I'm not saying you are not telling the truth - just thinking loud where could the problems be, especially when you say, that 3.9.2 is ok.

Especially the long boot time is interesting - does it behave so weird if:

hlipka commented 3 years ago

Yes, this happened with a 'bare' Einsy - no USB cable, no SD card, no RPi zero.

chrizzo84 commented 3 years ago

I had everything plugged in - USB and card. Is there anything how we could help you? Should I record a video?

hlipka commented 3 years ago

So I did a test with temperature calibration turned off. (Due to the factory reset I did on the weekend there seems to be no calibration data stored). Flashing succeeded on the first run, but afterwards the display showed strange data (the ones I described after doing the factory reset under 3.9.3 as above). After a reset its stuck in a loop. See the attached video.

https://user-images.githubusercontent.com/68884/104381792-530a3800-552d-11eb-93b8-9fb4c970a393.mp4

I then did a temperature calibration in 3.9.2 (worked fine, and enabled it). Flashed 3.9.3 again. Flashing worked, but afterwards the LCD just stayed empty. After a reset it was back in the reboot loop. Waited until it was finished, and the started temperature calibration on 3.9.3. Interestingly the messages on the LDC were mostly nonsense (and only parts were from the original messages). Started it anyway (knew how it worked from 3.9.2). It ended at the 6th point with the complaint that it could not find the calibration point ("bed calibration point not found"). After a reset, it was back into its reboot loop :( (and its showing the same strange data for the sheet names and the nozzle size as described above). Btw: I'm using English as locale.

It did not try again to see whether it was more happy, I just went back to 3.9.2 for now.

leptun commented 3 years ago

If you know how to connect to the printer via serial, can you please send a log with the startup messages? Maybe we can pinpoint where it gets stuck and boot loops

hlipka commented 3 years ago

@leptun Which serial connection would that be? Do I need to connect to the header pins (where the RPi Zero would be connected), or the regular USB connection? (Is the latter available during the whole boot process, or does it need to be activated by the Einsy board during startup?)

chrizzo84 commented 3 years ago

@leptun That´s the output of pronterface connected:

Connecting... start Printer is now online. echo: 3.9.3-3556 echo: Last Updated: Jan 7 2021 12:57:03 | Author: (none, default config) Compiled: Jan 7 2021 echo: Free Memory: 1746 PlannerBufferBytes: 1760 echo:Stored settings retrieved adc_init start echo: 3.9.3-3556 echo: Last Updated: Jan 7 2021 12:57:03 | Author: (none, default config) Compiled: Jan 7 2021 echo: Free Memory: 1746 PlannerBufferBytes: 1760 echo:Stored settings retrieved adc_init start echo: 3.9.3-3556 echo: Last Updated: Jan 7 2021 12:57:03 | Author: (none, default config) Compiled: Jan 7 2021 echo: Free Memory: 1746 PlannerBufferBytes: 1760 echo:Stored settings retrieved adc_init Command not found!

On the display there is only text "Original Prusa i3" (nothing more) and the printer got stuck.

### And that´s what I get after using reset-button:

echo: 3.9.3-3556 echo: Last Updated: Jan 7 2021 12:57:03 | Author: (none, default config) Compiled: Jan 7 2021 echo: Free Memory: 1746 PlannerBufferBytes: 1760 echo:Stored settings retrieved adc_init start echo: 3.9.3-3556 echo: Last Updated: Jan 7 2021 12:57:03 | Author: (none, default config) Compiled: Jan 7 2021 echo: Free Memory: 1746 PlannerBufferBytes: 1760 echo:Stored settings retrieved adc_init start echo: 3.9.3-3556 echo: Last Updated: Jan 7 2021 12:57:03 | Author: (none, default config) Compiled: Jan 7 2021 echo: Free Memory: 1746 PlannerBufferBytes: 1760 echo:Stored settings retrieved adc_init start echo: 3.9.3-3556 echo: Last Updated: Jan 7 2021 12:57:03 | Author: (none, default config) Compiled: Jan 7 2021 echo: Free Memory: 1746 PlannerBufferBytes: 1760 echo:Stored settings retrieved adc_init start echo: 3.9.3-3556 echo: Last Updated: Jan 7 2021 12:57:03 | Author: (none, default config) Compiled: Jan 7 2021 echo: Free Memory: 1746 PlannerBufferBytes: 1760 echo:Stored settings retrieved adc_init start echo: 3.9.3-3556 echo: Last Updated: Jan 7 2021 12:57:03 | Author: (none, default config) Compiled: Jan 7 2021 echo: Free Memory: 1746 PlannerBufferBytes: 1760 echo:Stored settings retrieved adc_init start echo: 3.9.3-3556 echo: Last Updated: Jan 7 2021 12:57:03 | Author: (none, default config) Compiled: Jan 7 2021 echo: Free Memory: 1746 PlannerBufferBytes: 1760 echo:Stored settings retrieved adc_init start echo: 3.9.3-3556 echo: Last Updated: Jan 7 2021 12:57:03 | Author: (none, default config) Compiled: Jan 7 2021 echo: Free Memory: 1746 PlannerBufferBytes: 1760 echo:Stored settings retrieved adc_init start echo: 3.9.3-3556 echo: Last Updated: Jan 7 2021 12:57:03 | Author: (none, default config) Compiled: Jan 7 2021 echo: Free Memory: 1746 PlannerBufferBytes: 1760 echo:Stored settings retrieved adc_init Extruder fan type: NOCTUA CrashDetect ENABLED! FSensor ENABLED (sensor board revision: 0.4 or newer) Sending 0xFF echo:SD card ok

hlipka commented 3 years ago

I see the same sequence. Using GTKTerm I can also provide timestamps - the time between the 'adc_init' message and 'start' differs.

[18641.20h.42m.28s.146] start
[18641.20h.42m.37s.264] echo: 3.9.3-3556
[18641.20h.42m.37s.264] echo: Last Updated: Jan  7 2021 12:57:03 | Author: (none, default config)
[18641.20h.42m.37s.268] Compiled: Jan  7 2021
[18641.20h.42m.37s.272] echo: Free Memory: 1746  PlannerBufferBytes: 1760
[18641.20h.42m.37s.276] echo:Stored settings retrieved
[18641.20h.42m.37s.280] adc_init
[18641.20h.42m.37s.862] start
[18641.20h.42m.41s.970] echo: 3.9.3-3556
[18641.20h.42m.41s.970] echo: Last Updated: Jan  7 2021 12:57:03 | Author: (none, default config)
[18641.20h.42m.41s.979] Compiled: Jan  7 2021
[18641.20h.42m.41s.983] echo: Free Memory: 1746  PlannerBufferBytes: 1760
[18641.20h.42m.41s.987] echo:Stored settings retrieved
[18641.20h.42m.41s.987] adc_init
[18641.20h.42m.42s.875] start
[18641.20h.42m.46s.984] echo: 3.9.3-3556
[18641.20h.42m.46s.988] echo: Last Updated: Jan  7 2021 12:57:03 | Author: (none, default config)
[18641.20h.42m.46s.992] Compiled: Jan  7 2021
[18641.20h.42m.46s.998] echo: Free Memory: 1746  PlannerBufferBytes: 1760
[18641.20h.42m.47s.001] echo:Stored settings retrieved
[18641.20h.42m.47s.005] adc_init
[18641.20h.42m.48s.409] start
[18641.20h.42m.52s.518] echo: 3.9.3-3556
[18641.20h.42m.52s.518] echo: Last Updated: Jan  7 2021 12:57:03 | Author: (none, default config)
[18641.20h.42m.52s.526] Compiled: Jan  7 2021
[18641.20h.42m.52s.526] echo: Free Memory: 1746  PlannerBufferBytes: 1760
[18641.20h.42m.52s.534] echo:Stored settings retrieved
[18641.20h.42m.52s.534] adc_init
[18641.20h.42m.53s.738] start
[18641.20h.42m.57s.847] echo: 3.9.3-3556
[18641.20h.42m.57s.847] echo: Last Updated: Jan  7 2021 12:57:03 | Author: (none, default config)
[18641.20h.42m.57s.855] Compiled: Jan  7 2021
[18641.20h.42m.57s.855] echo: Free Memory: 1746  PlannerBufferBytes: 1760
[18641.20h.42m.57s.860] echo:Stored settings retrieved
[18641.20h.42m.57s.863] adc_init
[18641.20h.43m.00s.694] start
[18641.20h.43m.10s.226] echo: 3.9.3-3556
[18641.20h.43m.10s.226] echo: Last Updated: Jan  7 2021 12:57:03 | Author: (none, default config)
[18641.20h.43m.10s.234] Compiled: Jan  7 2021
[18641.20h.43m.10s.234] echo: Free Memory: 1746  PlannerBufferBytes: 1760
[18641.20h.43m.10s.239] echo:Stored settings retrieved
[18641.20h.43m.10s.242] adc_init
[18641.20h.43m.11s.843] start
[18641.20h.43m.15s.952] echo: 3.9.3-3556
[18641.20h.43m.15s.952] echo: Last Updated: Jan  7 2021 12:57:03 | Author: (none, default config)
[18641.20h.43m.15s.960] Compiled: Jan  7 2021
[18641.20h.43m.15s.960] echo: Free Memory: 1746  PlannerBufferBytes: 1760
[18641.20h.43m.15s.968] echo:Stored settings retrieved
[18641.20h.43m.15s.968] adc_init
[18641.20h.43m.16s.349] Extruder fan type: NOCTUA
[18641.20h.43m.17s.353] CrashDetect ENABLED!
[18641.20h.43m.18s.451] FSensor ENABLED (sensor board revision: 0.4 or newer)
[18641.20h.43m.20s.740] echo:SD init fail
chrizzo84 commented 3 years ago

@PrusaDevs: Today I tried some stuff with a support guy of you (Dominik).

hlipka commented 3 years ago

What I already learned from trying to understand the firmware (between the 'adc_init' and the 'fan type' outputs), when the printer type gets changed the EEPROM should be filled with all default settings. (btw: when I did the factory reset as described above, I still had the correct Z-distance for my second steel sheet stored. So I'm not sure whether this worked as it should). In the build script I can see that the MK3 version is build by default, not the MK3S version. I tried to build from the official sources too (the MK3S version), but it did not work because the build environment did pick up my local avr-gcc (from my Arduino 1.8.12, which is an avr-gcc 7.3.0). This raises the question whether maybe something like this happened for the official build (the build env provides a avr-gcc 4.9.2, but when another one wins over the provided one...)? (Edit: got it working by moving my ~/.arduino15 folder away. Compiled for MK3S - and the HEX file got was slightly larger than the official one. Will test it tomorrow.)

chrizzo84 commented 3 years ago

Today I had time to setup the real build => with copying the Configuration_prusa.h stuff, setting language stuff to 0 and so on. Build was done on Linux with ./build.sh. Hmmm, what should I say... for me it´s not better (assumed that I did everything right and everything should correspond to how you build the firmware)

hlipka commented 3 years ago

Same here - found out that I did copy the config file to the wrong location, so I still had a MK3 version (which got me the prompt at startup). But the MK3 version seemed to work fine. When I build the MK3S version, it has the correct size, but different content than the original firmware, and it still hangs in the boot loop (and has all the problem already outlined here). Let my try adding some more debug code. maybe I can see where it really goes into the reboot.

hlipka commented 3 years ago

OK, now its getting interesting. I took the 3.9.3 code (git tag 3.9.3) which was not working, and just added some additional debug output to the serial port (some _SERIAL_ECHORPGM(n("debug1")); in _Marlinmain.cpp, and some _SERIAL_ECHOLN("in tpinit 1"); in temperature.cpp). And lo and behold - now it works! (No boot loop, and also not the strange display issues described like the sheet names). The only explanation I have for that is that due to the new messages some code / data is in a different place now, which does not trigger the bug I have seen. Whether this is due to a compiler bug, or something in the code, I have no idea. Here the firmware I ended up with: firmware_393_debug_working.zip For now I'm fine with what I have, I will now start re-calibration (since the MK3 FW version probably has messed with the settings) and see what happens.

chrizzo84 commented 3 years ago

@hlipka Just flashed my printer with your version! Can approve, it´s working! So that´s really strange.

chrizzo84 commented 3 years ago

Okay, now getting more interesting - there are only left some little problems in the menu. In my case for example menu entry does not say "Fail stats" it´s "F´il stats" => If I´m scrolling over the item, it´s "Fail stats" but only a little part of a second.

Tested now: XYZ Calibration => Working Printing => Working Checked Menu items => does not say "Fail stats" it´s "F´il stats"

So for the moment it´s looking good....

ricardocab32 commented 3 years ago

I have a brand new Mk3s+ received on the 30 of December 2020. It was a Kit. Has firmware 3.9.2. When I tried to flash to 3.9.3 I get the display with all the squares. Tried everything I could find including tech support. Ended up going back to 3.9.2, works flawless. no issues.

chrizzo84 commented 3 years ago

In my case it´s also a MK3S+ (Kit), received on 23rd of Dec. (Think I forgot to mention the model in the other posts)

DRracer commented 3 years ago

We are looking into this issue, still we haven't been able to reproduce it except for one occasion (on a very non-standard printer). The comment from @hlipka saying "after adding some debug output, the FW works" points to some timing issues.

chrizzo84 commented 3 years ago

@DRracer is there any possibility of Remote-Debugging or sth? If yes, you can try your stuff on my printer.

DRracer commented 3 years ago

We have done some serious retest on our testing farm and production farm as well. So far we have been completely unsuccessful in replication of this issue, including upgrades from 3.9.2 to 3.9.3 with and without factory reset.

@chrizzo84 the printer itself doesn't support remote debugging, however if you had the possibility of connecting your machine to e.g. RPi2 or similar and enabling one of our devs to ssh into it, that may help.

We are also thinking about exchanging some of the problematic machines as they are for brand new ones (completed). The only condition must be, that the machine to be exchanged causes this issue every time - i.e. the reproducibility is high.

Anybody experiencing this issue located in CZ directly? You can write to my email on github if you do not wish to reveal your location publicly. I can see the author of this issue is from DE, which is also "close".

chrizzo84 commented 3 years ago

@DRracer What exactly do you need on the RPI? Have a RPI here and can also setup possibility that you guys can have access to the PI over SSH. Actually there is running Octopi on it... but it´s no problem for me to setup sth. new. If it would be interesting for you, feel free to contact me!!

We are also thinking about exchanging some of the problematic machines as they are for brand new ones (completed). The only condition must be, that the machine to be exchanged causes this issue every time - i.e. the reproducibility is high.

=> The complete machine? oO My one does produce the error every time with 3.9.3... but wouldn´t it be easier to just exchange the mainboard for testing purpose?

leptun commented 3 years ago

Not really. I just had a scenario where one board was having the issue, but after the contents of the flash and eeprom were moved to a different printer, all the issues were gone

DRracer commented 3 years ago

@chrizzo84 if you can manage the RPi to be accessible from the outside Internet, then it is a great possibility of trying stuff. Please write me an email so that we can sort the possible access into your MK3S+.

The complete machine: we still think it is something electrical, therefore shipping just the EINSY may no produce the issue again.

chrizzo84 commented 3 years ago

@leptun Okay, that really seems legit... well, I can offer myself for the exchange if you really want it => It´s a MK3S+ Kit => Bought on BlackFriday, delivered on 23rd of Dec.

chrizzo84 commented 3 years ago

@DRracer Please check your E-Mails! I just setup PI and SSH... also tested it, is working now....

DRracer commented 3 years ago

@chrizzo84 thank you, you should have an answer from me.

hlipka commented 3 years ago

@DRracer if you need a second machine for remote debugging, I think I can set up a tunnel to the outside world into the OctoPi I'm using for the MK3S. Regarding the replacement: I can reproduce it every time (though I did no re-test after flashing that 'debug output' version which works fine so far). Maybe its easier to first replace Einsy+LDC board first, as these might the main culprits if its really an electrical / electronic problem? (I have a MK3S kit, bought in September, shipped mid-October. Unfortunately I don't have any of the packaging material...)

DRracer commented 3 years ago

@hlipka thank you, so far we are looking at the machine of @chrizzo84 remotely. He is having the exactly the same problem.

#ifndef W25X20CL
    SERIAL_PROTOCOLLNPGM("start");
#else
    if ((optiboot_status != 0) || (selectedSerialPort != 0))
        SERIAL_PROTOCOLLNPGM("start");
#endif
///// 4s delay /////////////////
    SERIAL_ECHO_START;
    printf_P(PSTR(" " FW_VERSION_FULL "\n"));

Then obviously the WD kills the boot up process.

hlipka commented 3 years ago

So as a data point: I just re-flashed official 3.9.3, and I'm again in the boot loop. So having 3.9.3 running fine for a week does not cure the problem. To rule out power supply issues I disconnected everything from the PSU (apart from the Einsy board - just left one of the 24V rails on the board and disconnected the heatbed power input). Does not change anything.

3d-gussner commented 3 years ago

@hlipka Can you provide your modified source code, which is working?

3d-gussner commented 3 years ago

To rule out power supply issues I disconnected everything from the PSU (apart from the Einsy board - just left one of the 24V rails on the board and disconnected the heatbed power input). Does not change anything.

What else do you have connected to the PSU?

hlipka commented 3 years ago

@3d-gussner Usually I have the RPi3B+ connected (via a step-down-converter, so it should need about 0.5A), and some 24 LED stripes (which need about 1A). From my measurements the PSU has enough headroom to supply them even with the bed heating. For this test I disconnected everything, and just left the one connector on the EInsy board which powers the board itself (including the motors and the hot end, it seems). I disconnected the one connector supplying the bed. The power-loss-detection cable was also still connected. Regarding my changes - here the patch file I created from my changes:

diff --git a/Firmware/Marlin_main.cpp b/Firmware/Marlin_main.cpp
index f6509cdc..679f5537 100755
--- a/Firmware/Marlin_main.cpp
+++ b/Firmware/Marlin_main.cpp
@@ -1247,9 +1247,12 @@ void setup()
    else { //printer version was changed so use default settings
        Config_ResetDefault();
    }
+  SERIAL_ECHORPGM(_n("debug1"));
    SdFatUtil::set_stack_guard(); //writes magic number at the end of static variables to protect against overwriting static memory by stack
+  SERIAL_ECHORPGM(_n("debug2 "));
    tp_init();    // Initialize temperature loop
+  SERIAL_ECHORPGM(_n("debug3 "));
    if (w25x20cl_success) lcd_splash(); // we need to do this again, because tp_init() kills lcd
    else
diff --git a/Firmware/temperature.cpp b/Firmware/temperature.cpp
index d429bb88..abc84384 100755
--- a/Firmware/temperature.cpp
+++ b/Firmware/temperature.cpp
@@ -1060,12 +1060,14 @@ static void updateTemperaturesFromRawValues()
 void tp_init()
 {
+  SERIAL_ECHOLN("in tp_init 1");
 #if MB(RUMBA) && ((TEMP_SENSOR_0==-1)||(TEMP_SENSOR_1==-1)||(TEMP_SENSOR_2==-1)||(TEMP_SENSOR_BED==-1))
   //disable RUMBA JTAG in case the thermocouple extension is plugged on top of JTAG connector
   MCUCR=(1<<JTD);
   MCUCR=(1<<JTD);
 #endif
+  SERIAL_ECHOLN("in tp_init 2");
   // Finish init of mult extruder arrays
   for(int e = 0; e < EXTRUDERS; e++) {
     // populate with the first value
@@ -1080,6 +1082,7 @@ void tp_init()
 #endif //PIDTEMPBED
   }
+  SERIAL_ECHOLN("in tp_init 3");
   #if defined(HEATER_0_PIN) && (HEATER_0_PIN > -1)
     SET_OUTPUT(HEATER_0_PIN);
   #endif
@@ -1233,6 +1236,7 @@ void tp_init()
 #endif
   }
 #endif //AMBIENT_MAXTEMP
+  SERIAL_ECHOLN("in tp_init 3");
 }
 #if (defined (TEMP_RUNAWAY_BED_HYSTERESIS) && TEMP_RUNAWAY_BED_TIMEOUT > 0) || (defined (TEMP_RUNAWAY_EXTRUDER_HYSTERESIS) && TEMP_RUNAWAY_EXTRUDER_TIMEOUT > 0)

I did not do any further experiment which output is the one that makes the difference. Tell me when this would help you (or when you would like me to add something in other places - I have the build environment working so it should not take too long to test something.

DRracer commented 3 years ago

@chrizzo84 we are still playing with your machine - when we disable WDT at the beginning of setup(), your machine now boots. We'll investigate further.

hlipka commented 3 years ago

@DRracer Regarding the "We are also thinking about exchanging some of the problematic machines" - do you have an ETA for that decision? The reason I'm asking is that I wanted to use the weekend for my printers first real maintenance break (I think I'm through 6 rolls if filament as of now), and wanted to use that to upgrade the Y-axis to MK3S+ style and change the nozzle. And that does not make sense when I'm asked to send it back two days later :) (I can easily post-pone that maintenance, I just don't want to revert my changes...)

DRracer commented 3 years ago

@hlipka hard to say now, we have to investigate the ASM differences between Linux and Windows builds of the FW 3.9.3. That's our plan for the next hours.

So far we have narrowed the problem into something, which looks like a compiler issue. That's the only lead we have right now, any slight change to the source code makes the FW boot correctly, as you have verified with the serial output texts. We don't want to release a FW with an additional text on the serial line, we need to find the real cause of the problem to be able to avoid it in the future.

To be exact - tag v3.9.3 built on Linux and Windows differs somehow to the extent, that the Windows build runs fine on the machine of @chrizzo84 while the Linux build (same Arduino 1.8.5 and compiler version) ends up in a loop of restarts.

If we manage to isolate and fix the problem remotely on @chrizzo84 's machine, that would be the best case - because the worst case scenario is to collect a printer, start the analysis and find out, that the problem disapeared during transport (which has happened several times in the past in case of electrical problems).

@hlipka so please keep your maintenance planned, we'll post our progress again when we find something significant.