prusa3d / Prusa-Firmware

Firmware for Original Prusa i3 3D printer by PrusaResearch
GNU General Public License v3.0

[BUG] Prusa MK2.5S firmware 3.10.0 (with MMU2S 1.0.6) randomly repeats a short section of gcode several times before failing with "static memory overwritten" error #3151

Closed IntegersOfK closed 7 months ago

IntegersOfK commented 3 years ago

Printer type - MK2.5S with MMU2S

Printer firmware version - 3.10.0-4481

MMU Upgrade - MMU2S

MMU upgrade firmware version - 1.0.6-372

SD card or USB/Octoprint - SD Card

Describe the bug

The printer makes an odd movement, then repeats a short section of the gcode and eventually fails entirely with a "static memory overwritten" error. If you restart the printer and run the print again with the same print file, the failure may happen in a different section or at a different time.

After getting this error multiple times (in different sections of the same print), for my troubleshooting I reset the printer and MMU to factory defaults and reflashed the latest firmware on both. I did the calibration wizard all fresh and am using PrusaSlicer to create the gcode. This is a single colour print (using PrusaSlicer default profile "Single").

I have seen the problem occur on both of these files: Fabric_of_Thyme_2.0_10x10_0.35mm_PLA_MK2.5SMMU2S_3h37m.zip

I have replaced the SD card as a last resort in case there was some kind of read error or corruption issue with that, using the same gcode files exported from PrusaSlicer. It does not appear to be related to the SD Card.

It's worth noting that after I see the issue happen, I can pause the print from the menu and when it resumes it appears to continue until the issue happens again eventually in another section or on the next print of the same file (but again, not always in the same place).

Here is a picture which shows the section of gcode has repeated itself several times before shutting down entirely: IMG_20210519_215034

Video https://youtu.be/z2-jaG_GTXU

The problem was also happening on 3.10.0 RC1, but I had never had it happen on earlier firmware versions (3.9.3), though I did not try these specific gcodes on that firmware version (I will reflash the older firmware and try it now).

#2964 is another issue which appears to mention this type of error, though that issue reports it happening at the start of the print.

#2791 appears to be somewhat related to solving a similar issue, but is related to sensors?

I think some changelog referenced "strange movements" before this error, which feels close to this problem too.

ghost commented 2 years ago

It remains that we can't print reliably from the SD Card.

Yup, I'm giving up on this issue since I don't have the time or energy to spend entire days with this again so I'm also switching over to only using octoprint instead.

leptun commented 2 years ago

If possible, can you try using the latest MK3 branch? There have been a lot of changes recently, especially to memory usage. I'm curious if maybe there is some stack overflow going on that is uncaught and causes this weird behaviour. Here is the build output: FW3101-Build4697-MK3-b654217a5b3d77816eaeeb5ea803cc41de5e9b74.zip Be careful with this build tho. The changes are still quite fresh and there might be new bugs introduced. Use this firmware with caution. Even though it says fw 3.10.1, it's fw 3.12. I'm also reopening this issue since obviously some people are still experiencing this problem even in recent firmware.

wavexx commented 2 years ago

@IntegersOfK do you have the gcode? And a rough estimate at when this happens?

leptun commented 2 years ago

We've identified that the repeated gcode moves happen when the SD card disconnects for whatever reason while the card detect pin in the slot still says there's a card present. This is how it looks when done in the emulator (https://user-images.githubusercontent.com/17808203/158072571-a9425d35-41a9-4789-88cd-336cd2a3ed12.mp4). I'm not sure for how long this issue has been present, but at the moment an SD read failure does not pause/abort the current print and it just keeps printing the last 512B of gcode that are cached in RAM. The solution would be to properly detect the error and do something when the error occurs, but it still doesn't exactly explain why the slots stopped working after the SD reading algorithm was updated. It is possible that the increased speed made an already existing issue with the SD card socket more apparent. As for crashing because of this, I don't see how that could happen. Yes, in the video you can see a crash, but that is when a print is started, not during a print (it's because of how the file checking is implemented). Using another SD card probably won't help with this issue, since most likely the SD card slot is the part that got damaged over time.
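To make the "keeps printing the last 512B" behaviour concrete, here is a minimal sketch of a reader with a one-sector cache. It is only a model: `SectorReader`, `disconnect()` and the `checkErrors` flag are hypothetical names, not code from the firmware. With `checkErrors` off, a failed refill is ignored and the stale sector is consumed again; with it on, the failure is reported as -1.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical model of a file reader with a one-sector cache.
// None of these names are taken from the Prusa firmware.
class SectorReader {
public:
    static constexpr std::size_t kSectorSize = 512;

    SectorReader(std::string file, bool checkErrors)
        : file_(std::move(file)), checkErrors_(checkErrors) {}

    // Simulates the card dropping off the bus while the
    // card-detect pin still reports "present".
    void disconnect() { connected_ = false; }

    // Returns the next byte (0..255), or -1 on end of file
    // or on a read error that is actually checked.
    int readByte() {
        if (cachePos_ == cacheLen_) {
            const Fill result = refill();
            if (result == Fill::Eof) return -1;
            if (result == Fill::Error && (checkErrors_ || cacheLen_ == 0)) return -1;
            // Unchecked error: the old sector is still in the cache,
            // so rewinding the cursor replays it from the start.
            cachePos_ = 0;
        }
        assert(cachePos_ < cacheLen_);
        return static_cast<unsigned char>(cache_[cachePos_++]);
    }

private:
    enum class Fill { Ok, Eof, Error };

    Fill refill() {
        if (!connected_) return Fill::Error;  // cache is left untouched
        if (filePos_ >= file_.size()) return Fill::Eof;
        cacheLen_ = file_.copy(cache_, kSectorSize, filePos_);
        filePos_ += cacheLen_;
        return Fill::Ok;
    }

    std::string file_;
    bool checkErrors_;
    bool connected_ = true;
    std::size_t filePos_ = 0;
    std::size_t cachePos_ = 0;
    std::size_t cacheLen_ = 0;
    char cache_[kSectorSize] = {};
};
```

In this model, a reader built with `checkErrors = false` that loses the card after the first sector returns the first sector's bytes again on the next read, which matches the repeated moves seen in the video.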

@gudnimg At the moment, the cmdqueue gets a '-1' character from the SD read function in case an error occurred, but I don't see that scenario handled anywhere (besides immediately enqueuing the command as if it was complete). Was this code removed accidentally at some point or was it never implemented properly?

gudnimg commented 2 years ago

At the moment, the cmdqueue gets a '-1' character from the SD read function in case an error occurred, but I don't see that scenario handled anywhere

I don't see any handling either.

In this call we seem to be allowing n == -1 to pass freely. https://github.com/prusa3d/Prusa-Firmware/blob/56cb8cbc63a7cf144614fdbef3328f163c5984b7/Firmware/cmdqueue.cpp#L569
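As a rough illustration of what handling `n == -1` could look like on the consumer side, here is a sketch of a line reader that refuses to treat a failed read as a finished command. It is not the code in `cmdqueue.cpp`: `readLine`, `LineStatus` and the two callbacks are hypothetical, and it assumes the caller has some way to tell a clean end of file from a read error.

```cpp
#include <functional>
#include <string>

enum class LineStatus { Complete, EndOfFile, ReadError };

// readByte: returns 0..255, or -1 when no byte could be read.
// atEof:    tells whether that -1 was a clean end of file.
// Both callbacks are placeholders for whatever the real reader offers.
LineStatus readLine(const std::function<int()>& readByte,
                    const std::function<bool()>& atEof,
                    std::string& line) {
    line.clear();
    for (;;) {
        const int n = readByte();
        if (n == -1) {
            if (atEof()) {
                // A last line without a trailing newline is still valid.
                return line.empty() ? LineStatus::EndOfFile
                                    : LineStatus::Complete;
            }
            // Read error: drop the partial command instead of enqueuing it.
            line.clear();
            return LineStatus::ReadError;
        }
        const char c = static_cast<char>(n);
        if (c == '\n') return LineStatus::Complete;
        if (c != '\r') line.push_back(c);
    }
}
```

The point of the separate `ReadError` result is that a truncated command such as `G1 Y` is never executed; what the caller does next (pause, retry, log) is a separate decision.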

leptun commented 2 years ago

@gudnimg We can add some handling there, but we must make sure nothing else breaks. Simply put, we must not rely only on the SD card detect pin for SD card presence. In case of errors, we must go to some other state (SD not initialized, but also not removed). Aborting the print is not ideal. We should probably pause and not allow resuming until the SD reinitializes. There should probably be a manual reinit menu item in the main menu.
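The extra state could be sketched roughly like this. All names here (`SdState`, `SdCardTracker` and its methods) are made up for illustration and the real firmware would have to hook this into its own card and pause logic; the sketch only shows the idea of "present per the pin, but not usable" blocking a resume until a reinit succeeds.

```cpp
// Hypothetical three-state model: the detect pin alone
// never promotes the card to Ready.
enum class SdState { Removed, Uninitialized, Ready };

class SdCardTracker {
public:
    // Called with the current level of the card-detect pin.
    void onDetectPin(bool present) {
        if (!present) {
            state_ = SdState::Removed;
        } else if (state_ == SdState::Removed) {
            state_ = SdState::Uninitialized;  // inserted, not yet usable
        }
    }

    // Called after an automatic or manual (menu) reinit attempt.
    void onInitResult(bool ok) {
        if (state_ == SdState::Removed) return;
        state_ = ok ? SdState::Ready : SdState::Uninitialized;
    }

    // Called when a read fails even though the pin says "present".
    void onReadError(bool printing) {
        if (state_ == SdState::Ready) state_ = SdState::Uninitialized;
        if (printing) paused_ = true;  // pause rather than abort
    }

    // Resume is refused until the card has been reinitialized.
    bool tryResume() {
        if (!paused_ || state_ != SdState::Ready) return false;
        paused_ = false;
        return true;
    }

    SdState state() const { return state_; }
    bool paused() const { return paused_; }

private:
    SdState state_ = SdState::Removed;
    bool paused_ = false;
};
```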

Btw, if you want to replicate this, change SDCARDDETECT to 2 on the EinsyRambo. That will make the SD card detect pin read the UVLO pin, which is high under normal circumstances, so the SD is "always present", but not really. If you remove the SD card you get into that broken state. Beware that gcode commands that are still queued are most likely corrupted, so be careful.

wavexx commented 2 years ago

Do we even handle SD removal by checking SDCARDDETECT?

This only seems to be handled in the presort() function, probably because that's the only spot where a short read can happen while the card is being inserted.

And some more for the main LCD menu.

This is one case where we could trigger stop_and_save_print_to_ram() to trigger a hard pause, wait for the SD, then resume by restore_print_from_ram_and_continue. For a short loss of contact though this is not ideal either... it would actually make sense to check again for SD pin and retry a couple of times, even if that stalls the motion.
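The retry-then-pause idea could look something like the sketch below. `readWithRetry` and its callbacks are hypothetical; in the firmware the `hardPause` callback would presumably end up calling something like `stop_and_save_print_to_ram()`, but that wiring is an assumption and is not shown. The sketch also assumes `tryRead` returns -1 only for errors, with end of file handled elsewhere.

```cpp
#include <functional>

// Tries a read up to (maxRetries + 1) times, re-checking the detect pin
// before each attempt. This blocks the caller while retrying, so motion
// would stall for that time. If every attempt fails, hardPause is invoked
// once and -1 is returned.
int readWithRetry(const std::function<int()>& tryRead,
                  const std::function<bool()>& cardDetected,
                  int maxRetries,
                  const std::function<void()>& hardPause) {
    for (int attempt = 0; attempt <= maxRetries; ++attempt) {
        if (!cardDetected()) continue;  // pin says no card: skip the read
        const int n = tryRead();
        if (n != -1) return n;          // short loss of contact recovered
    }
    hardPause();  // give up and wait for the SD before resuming
    return -1;
}
```

A real implementation would likely want a short delay or a reinit between attempts; that is left out here because the right timing depends on the hardware.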

However, it would be awesome if @IntegersOfK could help us check whether that's actually what's happening on his printer. We could start with a log entry via serial.

wavexx commented 2 years ago

Thinking out loud, adding some serial logs for unexpected EOF/bad reads is something that would help tremendously regardless.

IntegersOfK commented 2 years ago

Sure, if you're onto something I'm happy to try providing the serial output of a test firmware if you've added extra logging.

The issue was always pretty intermittent so I'm happy this theory tries to explain that. At the same time, I'll cross my fingers that I won't have to print several kilograms worth of test models before capturing the issue again!