Closed hlipka closed 1 year ago
@DRracer I would not look so much for a compiler error as for timing differences between the two compiler versions. On my board the boot eventually succeeds (sometimes on the first try, but most times it takes 5 to 10 iterations). If the WDT is the one doing the reboot, I would guess there is a task during boot which does not handle the WDT correctly, runs at the wrong time, and takes too long. As soon as this task runs a little earlier or later, the WDT will not be triggered. Adding serial output takes time, so some tasks will run later. (That's why I proposed adding some GPIO toggles as markers, so you can see which parts of the code show these timing variations.)
@hlipka I can confirm the code from Windows is at least 70% identical to the Linux (official build) version:
Therefore I agree with the theory of a timing issue. @chrizzo84 's machine never made it out of the boot loop; at least we never saw that happen yesterday.
Does anyone experiencing this problem have an additional EINSY board available to swap into the problematic printer (both with FW 3.9.3 of course)? That would rule out or confirm the influence of the HW.
@DRracer
> Anyone experiencing this problem has an additional EINSY board available to swap into your problematic printer (both with FW 3.9.3 of course)? That would rule out or confirm the influence of the HW.

Just a proposal: there is actually an RMA open for my printer, since I have a warped print bed and will get a new one. As far as I can see it has not shipped yet, only been announced to UPS (it seems UPS has not received it). Maybe you can send me a test EINSY with that shipment. One way or another I need to send the warped bed back, and then I can also send back the EINSY.
@DRracer
> @chrizzo84 's machine never made it from the boot loop, at least we didn't see that ever happen yesterday.

That's correct. Only when I switched it off and on again did it sometimes come up.
@DRracer I will have a go on the logic analyzer this evening. I started a fork at https://github.com/hlipka/Prusa-Firmware/tree/2954-pin-toggle - the pins I want to use are D18, D53 and D73 (all three are at J19). I would start with adding markers at the same places where I added the serial debug out, and then work my way either to earlier places in the code or dig deeper.
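A minimal sketch of the marker idea, written so that it runs on a PC rather than on the Einsy. It says nothing about how the fork above actually does it: `fake_port`, `MarkerPin`, `ScopedMarker` and `simulatedBoot` are hypothetical names, and the port register is replaced by a plain variable, so no real pin mapping for D18, D53 or D73 is assumed. The point is only the pattern: raise a pin on entry to a section and lower it on exit, so the pulse width seen on the logic analyzer is the time spent in that section.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one AVR port register so the sketch runs on a PC.
// On the real board this would be the port that drives the chosen header pin.
static uint8_t fake_port = 0;

// Every value written to the fake port, i.e. what a logic analyzer would capture.
static std::vector<uint8_t> trace;

struct MarkerPin {
    uint8_t mask;  // bit of the port that drives this marker pin

    void high() const {
        fake_port = static_cast<uint8_t>(fake_port | mask);
        trace.push_back(fake_port);
    }
    void low() const {
        fake_port = static_cast<uint8_t>(fake_port & ~mask);
        trace.push_back(fake_port);
    }
};

// Raises the pin when constructed and lowers it when destroyed,
// so each wrapped section shows up as one pulse on the analyzer.
class ScopedMarker {
public:
    explicit ScopedMarker(const MarkerPin& pin) : pin_(pin) { pin_.high(); }
    ~ScopedMarker() { pin_.low(); }
    ScopedMarker(const ScopedMarker&) = delete;
    ScopedMarker& operator=(const ScopedMarker&) = delete;

private:
    MarkerPin pin_;
};

// Hypothetical boot sequence with two instrumented sections.
void simulatedBoot() {
    const MarkerPin marker_a{0x01};
    const MarkerPin marker_b{0x02};
    {
        ScopedMarker m(marker_a);  // first section under test would run here
    }
    {
        ScopedMarker m(marker_b);  // second section under test would run here
    }
}
```

Compared with serial debug output, a single register write per marker should disturb the timing far less, which matters when the bug itself seems to depend on timing.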
@hlipka @chrizzo84 Can you please contact me at info@prusa3d.com? Use my name in the e-mail, it will get assigned to me. I would like to collect your Einsy boards and further examine them, but first, we will send you new ones.
@JakoobCZ I just sent the Email!
@JakoobCZ Me too.
While adding these debug pin toggles I came to the point where building the 3.9.3 FW branch now works for me without any changes (using the official build setup). Its size differs from the original FW, though, and I have no idea what's happening here. I have attached the FW: firmware_hli_393_3556.zip. So this ruins my plan of using the logic analyzer, since I cannot reproduce the problem with my own firmware :( But one additional thought: even if the WDT explains the reboot loop, it does not explain why some parts of the menu behave strangely afterwards (like the 'sheets' menu).

> But one additional thought: even when the WDT would explain the reboot loop, it does not explain why some parts of the menu behave strange afterwards (like the 'sheets' menu).

Well, that's also what was really strange for me. In your (hlipka) compiled firmware there was only one little strange thing: "Fail stats" was "F´il stats", and when scrolling over it, it changed to "Fail stats". Nothing more. Now with the Windows-compiled version there are no strange characters. It's really strange. Okay, I'm a dev and know very well that strange things can happen, but strange characters because of timing? oO Strange that I'm using "strange" so often :D :D
OK, found my mistake: I was using the 3.9.3 branch, which is different from the 3.9.3 firmware (the branch has an additional 'merge branch MK3' commit, so it's more like a 3.9.4 at the moment). (And now I finally have a build with pin toggles running in a reboot loop...)
What I have seen so far with the LA:
@hlipka that's a very good report, thank you, we'll investigate further today.
@JakoobCZ @DRracer UPS just picked up the Einsy board. Before I packed it up, I installed stock 3.9.3 on it and checked whether the boot loop still happens with just the LCD connected and a lab power supply (it does). I hope that helps in finding the root cause.
No news on this matter? Still have not been able to update my firmware.
I've been having issues flashing firmware versions all weekend, it's been recommended to me from the Prusa forums that I should bump this issue with my experience.
Last night I tried flashing the 3.9.3 firmware using the latest PrusaSlicer. When attempting the first flash, I was given an error like this: `avrdude-slic3r -v -p atmega2560 -c wiring -P COM4 -b 115200 -D -U flash:w:0:C:\Users\dstar\Downloads\prusa3d_fw_MK3S_3_9_3_3556.hex:i
avrdude-slic3r: Version 6.3-20160220-prusa3d, compiled on Jan 11 2021 at 14:16:13
Copyright (c) 2000-2005 Brian Dean, http://www.bdmicro.com/
Copyright (c) 2007-2014 Joerg Wunsch
Using Port : COM4
Using Programmer : wiring
Overriding Baud Rate : 115200
AVR Part : ATmega2560
Chip Erase delay : 9000 us
PAGEL : PD7
BS2 : PA0
RESET disposition : dedicated
RETRY pulse : SCK
serial program mode : yes
parallel program mode : yes
Timeout : 200
StabDelay : 100
CmdexeDelay : 25
SyncLoops : 32
ByteDelay : 0
PollIndex : 3
PollValue : 0x53
Memory Detail :
                       Block Poll               Page                     Polled
Memory Type Mode Delay Size  Indx Paged  Size   Size #Pages MinW  MaxW  ReadBack
----------- ---- ----- ----- ---- ------ ------ ---- ------ ----- ----- ---------
eeprom        65    10     8    0 no       4096    8      0  9000  9000 0x00 0x00
flash         65    10   256    0 yes    262144  256   1024  4500  4500 0x00 0x00
lfuse          0     0     0    0 no          1    0      0  9000  9000 0x00 0x00
hfuse          0     0     0    0 no          1    0      0  9000  9000 0x00 0x00
efuse          0     0     0    0 no          1    0      0  9000  9000 0x00 0x00
lock           0     0     0    0 no          1    0      0  9000  9000 0x00 0x00
calibration    0     0     0    0 no          1    0      0     0     0 0x00 0x00
signature      0     0     0    0 no          3    0      0     0     0 0x00 0x00
Programmer Type : Wiring
Description : Wiring
Programmer Model: AVRISP
Hardware Version: 15
Firmware Version Master : 2.10
Vtarget : 0.0 V
SCK period : 0.1 us
avrdude-slic3r: AVR device initialized and ready to accept instructions
Reading | ################################################## | 100% 0.00s
avrdude-slic3r: Device signature = 0x1e9801 (probably m2560)
avrdude-slic3r: safemode: hfuse reads as D0
avrdude-slic3r: safemode: efuse reads as FD
avrdude-slic3r: reading input file "C:\Users\dstar\Downloads\prusa3d_fw_MK3S_3_9_3_3556.hex"
avrdude-slic3r: writing flash (250818 bytes):
Writing | ################################################## | 100% 48.57s
avrdude-slic3r: 250818 bytes of flash written
avrdude-slic3r: verifying flash memory against C:\Users\dstar\Downloads\prusa3d_fw_MK3S_3_9_3_3556.hex:
avrdude-slic3r: load data flash data from input file C:\Users\dstar\Downloads\prusa3d_fw_MK3S_3_9_3_3556.hex:
avrdude-slic3r: input file C:\Users\dstar\Downloads\prusa3d_fw_MK3S_3_9_3_3556.hex contains 250818 bytes
avrdude-slic3r: reading on-chip flash data:
Reading | ################################################## | 100% 32.12s
avrdude-slic3r: verifying ...
avrdude-slic3r: 250818 bytes of flash verified
avrdude-slic3r: safemode: hfuse reads as D0
avrdude-slic3r: safemode: efuse reads as FD
avrdude-slic3r: safemode: Fuses OK (E:FD, H:D0, L:FF)
avrdude-slic3r done. Thank you.
avrdude-slic3r -v -p atmega2560 -c arduino -P COM4 -b 115200 -D -u -U flash:w:1:C:\Users\dstar\Downloads\prusa3d_fw_MK3S_3_9_3_3556.hex:i
avrdude-slic3r: Version 6.3-20160220-prusa3d, compiled on Jan 11 2021 at 14:16:13
Copyright (c) 2000-2005 Brian Dean, http://www.bdmicro.com/
Copyright (c) 2007-2014 Joerg Wunsch
Using Port : COM4
Using Programmer : arduino
Overriding Baud Rate : 115200
avrdude-slic3r: prusa_init_external_flash(): MK3 printer did not boot up on time or serial communication failed
avrdude-slic3r: arduino_open(): Failed to initialize MK3 external flash programming mode
avrdude-slic3r: Could not open port: COM4
avrdude-slic3r done. Thank you.`
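
The log above contains two `avrdude-slic3r` invocations (`flash:w:0:` and `flash:w:1:`), and only the first one reaches "bytes of flash verified". As a rough aid for anyone comparing their own logs, here is a small host-side sketch that counts those markers. The struct and function names are made up for this example, and the matched strings are taken only from the log above, so other avrdude versions may word things differently.

```cpp
#include <sstream>
#include <string>

// Hypothetical summary of one PrusaSlicer flashing log.
struct FlashLogSummary {
    int flash_commands = 0;             // invocations containing "-U flash:w:"
    int stages_verified = 0;            // "bytes of flash verified" lines
    bool external_init_failed = false;  // prusa_init_external_flash() error seen
};

// Scans the log line by line for the three markers seen in the log above.
FlashLogSummary summarizeAvrdudeLog(const std::string& log) {
    FlashLogSummary summary;
    std::istringstream in(log);
    std::string line;
    while (std::getline(in, line)) {
        if (line.find("-U flash:w:") != std::string::npos) {
            ++summary.flash_commands;
        }
        if (line.find("bytes of flash verified") != std::string::npos) {
            ++summary.stages_verified;
        }
        if (line.find("prusa_init_external_flash()") != std::string::npos) {
            summary.external_init_failed = true;
        }
    }
    return summary;
}
```

For the log above this would report two flash commands, one verified stage, and a failed external flash init, i.e. the first stage went through and the second never started.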
My printer then started bootlooping. After a couple minutes of bootlooping, it would finally boot. I tried flashing 3.9.3 again and this time was told it was successful, however it still bootlooped for a few minutes on every power on and crashed frequently while navigating menus.
I then tried a factory reset, despite the bootlooping making that pretty difficult. This then made the bootlooping more severe. I had to leave the printer on and bootlooping for nearly two hours before it finally fully booted again. I was greeted with the language select menu because of the factory reset I performed. Rather than selecting a language, I immediately tried flashing firmware 3.9.1 (last known working for me). Flashed without error, first-time-calibrated just fine.
Now that the printer was working properly again, I tried once more to flash 3.9.3. The flash failed with the above error again and the bootlooping started over. I flashed 3.9.1; PrusaSlicer once again told me it failed and the printer still bootlooped, but after fully booting, the menu reads 3.9.1 and I haven't had any crashes. It's currently printing fine on 3.9.1, but likely still has the bootlooping problem. When this print finishes, I'm going to check for bootlooping and, if it's present, do another factory reset.
Suffice it to say, I don't think any of this is working as intended.
There is no news on this issue yet, other than that we got the EINSY from @hlipka . So hopefully the board still shows the issue. I'll be looking into it this week. Still, we haven't been able to reproduce the issue at Prusa HQ at all, even though we tried really hard. I'm really curious what the issue is.
On Friday the board was picked up by UPS, I think it should arrive today at Prusa HQ. I am very curious to know if you also still have the error, and if you can then reproduce the error.
@hlipka @chrizzo84 do your new boards work correctly even with FW 3.9.3?
@DRracer cannot say anything yet, hopefully the new one will arrive tomorrow.
@DRracer Since I was still doing my printer maintenance it took a while - I'm currently doing the calibration wizard. But so far it's OK, no boot loops at least. I'm curious what the bed levelling / first layer looks like, since there were some reports about that.
So the new board runs fine so far - first run wizard did run through, first layer calibration was fine (even with a new Nozzle-X I had nearly the same Z-offset as before), first prints were also OK.
@DRracer Got the board today. Installed it, calibrated everything and could not find any errors so far. I will now make 1-2 test prints, but I do not expect anything to show up. One question: should I reflash the 3.9.3 which is currently on it, simply as a test?
@hlipka good news - your board is still doing the boot loop on my table, investigating now why. @chrizzo84 great to hear that - there should be FW 3.9.3 on your board already (the same release HEX file as on our github page), but you can try flashing it again to see if it changes anything.
update: @hlipka and yes, the board starts successfully after 10 consecutive reboots
Print time with new board about 12-15h => No errors or anything else so far. @DRracer Have you been able to test it with the boards we sent you?
Still having this problem myself. For good measure, I did a factory reset first, and then I flashed 3.9.1, flashed without fail. Then I flashed 3.9.2, flashed without fail. Then I flashed 3.9.3, started bootlooping, gave me the failure message above, and I had to flash 3.9.2 to get the bootlooping to stop.
I'm using the latest version of PrusaSlicer and drivers provided on the website, tried it on multiple machines plugged into the printer directly via USB. There's just nothing I can do to get 3.9.3 to flash successfully.
We are still waiting for the board from @chrizzo84 to arrive. We only have one board which we don't want to destroy (it is the only specimen we have so far).
Anyway - if you look at PR #3033 - that's all it took to make @hlipka 's board boot correctly. There must be some timing issue but we still don't know where.
When you look in the forums there are some more users with a similar problem. As for the reason: could it be that the EEPROM read/write speed is not deterministic? IIRC the latest version has additional storage for the SPINDA status, and maybe this pushed some close-to-the-edge timing over the edge. (Looking at the data sheet, it seems reads always take 4 clock cycles, but writes might differ.) But it could also be minuscule differences in the actual clock speed (or in the watchdog timer, which has its own clock). Maybe it's both :(
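To make the "close to the edge" idea concrete, here is a back-of-the-envelope sketch. It is only an illustration of the hypothesis, not a model of the firmware: every name is hypothetical and every number in the usage note is made up, not measured and not taken from the data sheet. It compares the worst-case duration of a boot phase (fixed work plus a number of EEPROM byte writes that may each run slow) against the earliest moment a watchdog with a slightly fast clock could fire.

```cpp
// Hypothetical result of the margin estimate, all times in milliseconds.
struct WatchdogMargin {
    double worst_case_work_ms;   // longest the boot phase may take
    double earliest_timeout_ms;  // soonest the watchdog may fire
    bool can_trip;               // true if the phase may outlast the watchdog
};

// fixed_work_ms:     time spent in the phase apart from EEPROM writes
// eeprom_bytes:      number of EEPROM bytes written in the phase
// write_ms_per_byte: nominal write time per byte
// write_jitter:      fraction by which a write may run slow (0.25 = 25 %)
// wdt_timeout_ms:    nominal watchdog timeout
// wdt_tolerance:     fraction by which the watchdog clock may run fast
WatchdogMargin estimateMargin(double fixed_work_ms, int eeprom_bytes,
                              double write_ms_per_byte, double write_jitter,
                              double wdt_timeout_ms, double wdt_tolerance) {
    WatchdogMargin m;
    m.worst_case_work_ms =
        fixed_work_ms + eeprom_bytes * write_ms_per_byte * (1.0 + write_jitter);
    m.earliest_timeout_ms = wdt_timeout_ms * (1.0 - wdt_tolerance);
    m.can_trip = m.worst_case_work_ms >= m.earliest_timeout_ms;
    return m;
}
```

With invented values such as `estimateMargin(200.0, 10, 4.0, 0.0, 250.0, 0.0)` the phase fits (240 ms against 250 ms), while allowing 25 % variation on both clocks, `estimateMargin(200.0, 10, 4.0, 0.25, 250.0, 0.25)`, makes it overrun (250 ms against 187.5 ms). That would be consistent with some boards looping and others not, but it proves nothing by itself.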
I'm writing something to keep this alive. Still on 3.9.2 unable to upgrade firmware on my new i3 Mk3s+
To anyone still experiencing this issue with FW 3.9.3 - could you please try flashing the fresh 3.10.0-RC1 and report back if the issue persists? The board from @hlipka started booting reliably and successfully after some fixes present in the 3.10 branch. Thank you.
I have flashed Firmware 3.10.0-RC1 successfully. I was able to run the wizard and calibrate the printer without any issues. No issues seen so far. Will run a couple of complicated prints to see if any issues arise. Thank you for your support.
I have flashed 3.10.0-RC1 successfully from Linux, thanks! Will try printing later and report in case of problems.
For some reason I got either timeouts or verification errors while verifying the first (250 k) part when flashing from Windows, but that might be my setup: I am running from inside a virtual machine, so it is not exactly a standard one. Interestingly, however, it did not cause problems with the previous versions.
I, too, have successfully flashed 3.10.0-RC1 using Windows. I'll do some prints this weekend to verify everything functions as it should, but the problem I used to have is gone, anyway. Thanks much!
@hlipka
> (some `SERIAL_ECHORPGM(_n("debug1"));` in `Marlin_main.cpp`, and some `SERIAL_ECHOLN("in tpinit 1");` in `temperature.cpp`)
Would you please detail where you inserted the code in marlin_main.cpp and temperature.cpp? Thanks!
@kgolger see #2954 (comment)
Thanks! I will give this a try!
I fell into this very trap last night going to 3.9.3 using avrdude. It disconnected mid upload and could not detect the printer at the usb port. Putty didn't work either. Factory reset procedure also failed. Based on a Reddit article, I lugged my workstation down to the print enclosure in the basement and connected locally. Prusa Slic3r had a port rescan tool that found the printer and I was able to complete the upgrade. Was getting pretty annoyed for a minute as it seemed that I had made a brick. Octoprint firmware upgrader is rather important to me. It seems a shame if they disabled it.
@velocentric Octoprint firmware updater doesn't write to the external flash of the Einsy board.
At the moment only PrusaSlicer does it correctly: it first updates the ATmega2560 internal flash and then the external flash. In the advanced output you can see that there are two flashing sequences.
MK2.5/S printers, on the other hand, can be updated via the Octoprint firmware updater, as the miniRAMBo doesn't have an external flash to update.
Alternatives:
OK, now it's getting interesting. I took the 3.9.3 code (git tag 3.9.3), which was not working, and just added some additional debug output to the serial port (some `SERIAL_ECHORPGM(_n("debug1"));` in `Marlin_main.cpp`, and some `SERIAL_ECHOLN("in tpinit 1");` in `temperature.cpp`). And lo and behold, now it works! (No boot loop, and none of the strange display issues described, like the sheet names.) The only explanation I have is that, due to the new messages, some code/data is now in a different place, which does not trigger the bug I have seen. Whether this is due to a compiler bug or something in the code, I have no idea. Here is the firmware I ended up with: firmware_393_debug_working.zip. For now I'm fine with what I have. I will now start re-calibration (since the MK3 FW version has probably messed with the settings) and see what happens.
I ran into the same issue reported here today and can confirm that the manually compiled version from @hlipka works. I have an MK3S. And it has the PINDA v2 with 3 cables instead of 4. Don't know if that matters.
This issue has been flagged as stale because it has been open for 60 days with no activity. The issue will be closed in 7 days unless someone removes the "stale" label or adds a comment.
This issue has been closed due to lack of recent activity.
I have an MK3S (bought in September, no MMU), currently at 3.9.2. Today I tried to flash the new 3.9.3 firmware, but it fails at the end of the upload. For the avrdude log see below. When this happens, the mainboard seems to be stuck in a boot loop: it looks as if the LCD is refreshed several times in a row with the boot message, then there is a short pause (like a reset/reboot), then it starts again. I also tried 3.9.3RC1, which also fails; instead of the constant LCD refresh I just see an empty LCD. Flashing was first done via OctoPrint (using the Firmware Updater). When this failed, I used PrusaSlicer connected directly to go back to 3.9.2 (which worked), then used PrusaSlicer again to try 3.9.3 and 3.9.3RC1. (I did the update to 3.9.2 successfully before, so FW updates as such work.) So it seems I'm stuck with 3.9.2 (falling back to it worked fine both times) and cannot upgrade to 3.9.3. Are there any other things I can try?
avrdude output (from PrusaSlicer):
Here also the update from the avrdude on my OctoPrint (which resulted in the boot loop)