prusa3d / Prusa-Firmware

Firmware for Original Prusa i3 3D printer by PrusaResearch
GNU General Public License v3.0
2.02k stars 1.05k forks source link

[BUG]<Firmware upgrade impossible since 3.10.0> #3321

Closed LR-J closed 8 months ago

LR-J commented 2 years ago

Printer type - MK3 Printer firmware version - 3.10.0 and 3.10.1 (files prusa3d_fw_MK3_3_10_1_4697.hex and prusa3d_fw_MK3_3_10_0_4481.hex)

SD card or USB/Octoprint upgrade via USB

Describe the bug Since firmware 3.10.0, I can't do the update on my machine. The logs do not show any errors. After the upgrade, the screen stays stuck on the welcome message, as if the boot sequence never ended. Downgrade to 3.9.3 is ok.

To Reproduce upgrade via usb using slic3r 2.3.3 on Ubuntu 18.04.

Expected behavior A normal boot sequence after upgrade...

G-code no relevant.

3d-gussner commented 2 years ago

@LR-J Sorry to hear that. Can you open a terminal to the USB after you have flashed the firmware with PrusaSlicer and report back the logfile?

LR-J commented 2 years ago

I have a 14 hours printing on going. I'll do this tomorrow. It's a genuine Prusa MK3.

LR-J commented 2 years ago

Here is the log : MK3_firmware update log file.md

LR-J commented 2 years ago

Here is the behaviour I get for the boot sequence...

https://user-images.githubusercontent.com/15336633/146635075-6baf842a-69be-40bd-98d8-144a2b80015d.mp4

gudnimg commented 2 years ago

Could you try reading from the Serial port using PuTTy or similar tool? That will give us more info on how far the printer is booting.

The below settings work for me, the top image shows a successful boot on my end. image

image

The blue circled checkboxes need to be checked for the lines to appear right. image

LR-J commented 2 years ago

putty.md Here are the info I can get through the serial connection. Nothing really helpful for me.

leptun commented 2 years ago

Here is the log : MK3_firmware update log file.md

avrdude-slic3r: Device signature = ???????????
avrdude-slic3r: safemode: hfuse reads as D0
avrdude-slic3r: safemode: efuse reads as FD

That ???? line is suspicious. It should say avrdude-slic3r: Device signature = 0x1e9801 (probably m2560) Did you edit the log file? That isn't a serial number. It's the product ID of the atmega2560 microcontroller used in the electronics.

leptun commented 2 years ago

https://github.com/prusa3d/Prusa-Firmware/issues/3321#issuecomment-997205051 @DRracer Another board failing soon after adc_init... :/ We need to find out what timing issue happens around there during init.

LR-J commented 2 years ago

Here is the log : MK3_firmware update log file.md

avrdude-slic3r: Device signature = ???????????
avrdude-slic3r: safemode: hfuse reads as D0
avrdude-slic3r: safemode: efuse reads as FD

That ???? line is suspicious. It should say avrdude-slic3r: Device signature = 0x1e9801 (probably m2560) Did you edit the log file? That isn't a serial number. It's the product ID of the atmega2560 microcontroller used in the electronics.

the initial line of the file is indeed "avrdude-slic3r: Device signature = 0x1e9801 (probably m2560)". I changed the line thinking it was an ID

gudnimg commented 2 years ago

We now know the issue is somewhere in this code. Somewhere before the message "FSensor ENABLED" appears but after "CrashDetect ENABLED".

    st_init();    // Initialize stepper, this enables interrupts!

#ifdef TMC2130
    tmc2130_mode = silentMode?TMC2130_MODE_SILENT:TMC2130_MODE_NORMAL;
    update_mode_profile();
    tmc2130_init(TMCInitParams(false, FarmOrUserECool() ));
#endif //TMC2130
#ifdef PSU_Delta
     init_force_z();                              // ! important for correct Z-axis initialization
#endif // PSU_Delta

    setup_photpin();

    servo_init();

    // Reset the machine correction matrix.
    // It does not make sense to load the correction matrix until the machine is homed.
    world2machine_reset();

    // Initialize current_position accounting for software endstops to
    // avoid unexpected initial shifts on the first move
    clamp_to_software_endstops(current_position);
    plan_set_position_curposXYZE();

#ifdef FILAMENT_SENSOR
    fsensor_init();
#endif //FILAMENT_SENSOR

Perhaps we should add more messages to debug bootup issues like this via log (to narrow down the problem).

  1. st_init --> "st_init OK"
  2. tmc2130_init--> "tmc2130_init OK"
  3. servo_init--> "servo_init OK"
LR-J commented 2 years ago

Adding simple lines like printf_P(PSTR("st_init OK"); in Marlin_main.cpp is OK ?

I modified the file to that effect and compiled it. I have a job in progress (3 hours left). I'll try the modification tonight.

gudnimg commented 2 years ago

Adding simple lines like printf_P(PSTR("st_init OK"); in Marlin_main.cpp is OK ?

I modified the file to that effect and compiled it. I have a job in progress (3 hours left). I'll try the modification tonight.

Yup it is ok. When I've added simple debug message only for myself I would use MYSERIAL.println("message); MYSERIAL is basically the same as Serial.println in Arduino IDE.

leptun commented 2 years ago

We can add those extra print messages. The only problem I see with this is that sometimes adding a printf actually makes the timing issue go away. The bug is very sensitive to timing (it was thought to be gone, but apparently it isn't for everyone). Hopefully the bug is still there even with printf so we can narrow the issue down.

leptun commented 2 years ago

@LR-J Please open this issue again as I think you closed it accidentally

LR-J commented 2 years ago

The problem seems to be related to the fsensor_init function. I can get a message via MYSERIAL.println() before it but nothing after.

gudnimg commented 2 years ago

@LR-J that's interesting. Is there a specific line in fsensor_init() where the Serial message is lost?

LR-J commented 2 years ago

I'm searching...

LR-J commented 2 years ago

pat9125_init() seems to be problematic.

I'm exhausted. I will continue my research tomorrow. Good nights guys !

gudnimg commented 2 years ago

May not be related but there were a few changes to the pat9125 code in v3.10.0 https://github.com/prusa3d/Prusa-Firmware/pull/2814 Perhaps this is an I2C issue. 🤔

I can't help but wonder if your printer is stuck in this endless while loop: https://github.com/prusa3d/Prusa-Firmware/blob/0be90dc5d1d8b6b12e063b751c7e16486520ba98/Firmware/twi.c#L61

leptun commented 2 years ago

Yes, it's possible that the fault happens in the hardware i2c layer. Question is, why would that even happen? Could the i2c hardware be faulty in that specific mcu? Considering a printf didn't kill the issue, we could try implementing a timeout in the wait function.

LR-J commented 2 years ago

The problem do not appear if I comment those lines :

if (twi_r8(PAT9125_I2C_ADDR,addr,&data))
        goto error;

but FSENSOR is DISABLED

LR-J commented 2 years ago

May not be related but there were a few changes to the pat9125 code in v3.10.0 #2814 Perhaps this is an I2C issue. thinking

I can't help but wonder if your printer is stuck in this endless while loop:

https://github.com/prusa3d/Prusa-Firmware/blob/0be90dc5d1d8b6b12e063b751c7e16486520ba98/Firmware/twi.c#L61

You are right. My printer my printer gets stuck in this loop : while( !(TWCR & _BV(TWINT)));

leptun commented 2 years ago

TWI_wait_timeout.zip Please try uploading this hex file to the printer. It should print more debugging stuff in twi_wait and also have a 100ms timeout. In case of timeout, the twi configuration/status is also printed to serial.

LR-J commented 2 years ago

Here's the result :

`

start echo: 3.10.1-4697 echo: Last Updated: Dec 19 2021 12:10:53 | Author: (none, default config) Compiled: Dec 19 2021 echo: Free Memory: 2014 PlannerBufferBytes: 1760 echo:Stored settings retrieved adc_init CrashDetect ENABLED! twi_wait(08)...timeout TWBR:0C TWSR:F8 TWAR:FE TWCR:24 TWAMR:00 PAT9125_init:0 FSensor DISABLED Sending 0xFF echo:SD card ok LCD status changed MMU not responding - DISABLED

`

leptun commented 2 years ago

Please send a second log, but this time with the filament sensor cable unplugged.

LR-J commented 2 years ago

start echo: 3.10.1-4697 echo: Last Updated: Dec 19 2021 12:10:53 | Author: (none, default config) Compiled: Dec 19 2021 echo: Free Memory: 2014 PlannerBufferBytes: 1760 echo:Stored settings retrieved adc_init CrashDetect ENABLED! twi_wait(08)...ok twi_wait(18)...!status PAT9125_init:0 Fsensor DISABLED Sending 0xFF echo:SD card ok LCD status changed MMU not responding - DISABLED

leptun commented 2 years ago

Quick intepretation of the first log: The start condition doesn't seem to get transmitted and it times out. TWAMR looks good TWAR looks good, General call not enabled TWDR is not logged because it has side effects. And it's not important right now anyway. TWBR looks good. 400KHz baud rate TWCR look good I think. The twi interface is enabled and it has been instructed to do a start condition in master mode. TWSR has correct prescaler setting, no relevant data available in status bits.

Second log: The start condition is sent successfully. The sensor does not respond to the address call. This is expected since the sensor was removed.

Going back to the first log. The only reason for the start condition to not be sent is this:

The TWI hardware checks if the bus is available, and generates a START condition on the bus if it is free. However, if the bus is not free, the TWI waits until a STOP condition is detected, and then generates a new START condition to claim the bus Master status.

So somehow the twi hardware thinks that the bus is being used by some other master. It is possible that either the sensor or the board are faulty. I don't think there is any way to recover from such state. I can try disabling and enabling the twi interface on init() to see if that changes anything, but I don't think it would help

leptun commented 2 years ago

TWI_wait_timeout_extra_init.zip Please try this again with the sensor. I disable and enable the TWI hardware on init and also forcefully send a STOP condition which should get the i2c into a defined state

LR-J commented 2 years ago

I have recently ordered a MK3 --> MK3S+ upgrade kit. If the sensor is faulty, this should solve the problem. If board is faulty this will also affect the upgrade (given that twi_r8() is called on both case - IR and non IR sensor) ?

filament detection seems to work correctly with 3.9.3. Is it possible to have a fault on 3.10.0 and 3.10.1 (with the mechanism you describe) and a normal behavior on 3.9.3 ?

leptun commented 2 years ago

In 3.9.3 a simpler software i2c implementation was used. I don't think that one has some of the multi-master arbitration checks that the hardware uses. Hardware i2c should be prefered since it also provides signal filters, something not really possible to do easily in software.

I really find it strange that 3.9.3 worked. Did it really work? Maybe it didn't hang on startup, but did it actually communicate with the sensor correctly? After you try the extra init hex files, can you go back to 3.9.3 and post some logs of that startup sequence?

The newer sensor on the MK3S+ doesn't use the same pins anyway, so it will work. It also didn't hang when you removed the filament sensor since the START condition got sent successfully. So the new sensor will work regardless of what's causing the original issue.

LR-J commented 2 years ago

TWI_wait_timeout_extra_init.zip Please try this again with the sensor. I disable and enable the TWI hardware on init and also forcefully send a STOP condition which should get the i2c into a defined state

I have exactly the same log (word-for-word) :

start echo: 3.10.1-4697 echo: Last Updated: Dec 19 2021 12:10:53 | Author: (none, default config) Compiled: Dec 19 2021 echo: Free Memory: 2014 PlannerBufferBytes: 1760 echo:Stored settings retrieved adc_init CrashDetect ENABLED! twi_wait(08)...timeout TWBR:0C TWSR:F8 TWAR:FE TWCR:24 TWAMR:00 PAT9125_init:0 FSensor DISABLED Sending 0xFF echo:SD card ok LCD status changed MMU not responding - DISABLED

gudnimg commented 2 years ago

Whether or not this is solved after the MK3S+ upgrade. The software solution may help other people in the future. 💪

LR-J commented 2 years ago

Whether or not this is solved after the MK3S+ upgrade. The software solution may help other people in the future. muscle

You're right. That's why I continue to ask questions ;-)

leptun commented 2 years ago

@gudnimg I disagree. We don't know yet if it's a hardware/wiring issue until we get a 3.9.3 log. A hardware interface should always be prefered over a software one unless there is a hardware bug in production that affects basically everyone. I still strongly suspect there is something wrong with the sensor or the wiring. The issue should have still been there even with the sensor removed if the atmega2560 was damaged.

LR-J commented 2 years ago

The problem also exist with 3.9.3. The log shows that fsensor is also disabled.

leptun commented 2 years ago

Maybe it's disabled from the menu. Enable it back and press the reset button and see if it's still disabled. If it is, then the filament sensor hardware is faulty. Btw, where is the 3.9.3 log. I don't see it

LR-J commented 2 years ago

start echo: 3.9.3-3556 echo: Last Updated: Jan 7 2021 12:57:48 | Author: (none, default config)

Compiled: Jan 7 2021

echo: Free Memory: 1716 PlannerBufferBytes: 1760

echo:Stored settings retrieved

adc_init CrashDetect ENABLED! PAT9125_init:0 FSensor DISABLED

Sending 0xFF

LR-J commented 2 years ago

In the menu, I can see an error for the fsensor : Filament sensor is not responding please check connection.

It is true that I have not recently printed to the very end of a bobbin.

leptun commented 2 years ago

Phew. Ok. So it is a hardware issue. 🎉 Can you send pictures with how the cable is connected to the einsy rambo and to the sensor?

LR-J commented 2 years ago

20211219_150032 20211219_150654

leptun commented 2 years ago

The wiring looks to be correct. Does the start condition get sent if you disconnect the sensor at the extruder, not at the board? I'm wondering if it's the sensor that is damaged or the cable has a short to gnd.

LR-J commented 2 years ago

in the above trial, I have disconnected to sensor on the extruder side, not on the board side.

leptun commented 2 years ago

Ok. Then it's the sensor that is damaged.

LR-J commented 2 years ago

OK. thank you for your help.

user11111112 commented 2 years ago

I have the exact same issue with two MK3's.  Both were updated to the 3.10.1 firmware at the same time, and both now hang on the loading screen after reboot.  Unplugging the sensor enables both printers to finish booting immediately.   I tried installing a new sensor, but the issue persists.  Rolling back to 3.9 firmware fixes the issue, but why is this happening?  And how can it be fixed.  The same issue manifesting on two separate devices at the same time seems to preclude a hardware issue, particularly since a new sensor has no effect.  Can anyone help with this?

leptun commented 2 years ago

@user11111112 What sensor type do you have? Laser or IR (MK3 or MK3S)? Can you post also the hardware revision of the filament sensor and of the einsy board (written on the pcbs)? This way maybe we can reproduce the issue as well.

user11111112 commented 2 years ago

It's the laser sensor. It looks to be rev 0.2e, and the einsy board looks like 1.1e. Hopefully that's helpful (and accurate), and thanks in advance for the help!!

leptun commented 2 years ago

@user11111112 I have the exact filament sensor revision, but the einsy is a different revision (1.2b on all of the boards I have). Hopefully I can trigger this issue (no luck so far). In any case, even without the TWI timeout implemented (todo @wavexx), the fact that it gets stuck indicates that there is a deeper problem with the i2c bus that manifests (by mistake) in a lockup instead of an error message. Can you try running https://github.com/prusa3d/Prusa-Firmware/files/7740699/TWI_wait_timeout.zip and posting a log from Serial? It should let us know what is happening in the TWI interface code.

leptun commented 2 years ago

I've added the changes from that "TWI wait timeout" build to the MK3 branch, so this will be part of release 3.12.0

user11111112 commented 2 years ago

I've added the changes from that "TWI wait timeout" build to the MK3 branch, so this will be part of release 3.12.0

Fantastic work, thank you @leptun!!

Is #3412 the interface code you were looking for from me? If not, I can still flash that file and share the log.