prusa3d / Prusa-Firmware

Firmware for Original Prusa i3 3D printer by PrusaResearch
GNU General Public License v3.0
2.02k stars 1.06k forks source link

Printer pauses randomly at start of print requiring user resume (3.8.0-RC) #2105

Closed vintagepc closed 1 year ago

vintagepc commented 5 years ago

I've been experimenting with this gcode for a bit and my printer always pauses during the purge line or just before it for some reason. The screen looks normal, but it will do absolutely nothing until I press the knob on the printer. It's happened twice during the purge, and once before the MMU started the filament load. I've had this issue with 3.7.2 as well but could not get a handle on it at the time.

I finally had it connected to pronterface today which confirmed it as it reported "echo:busy: paused for user" in the console.

I've attached the offending gcode. Please note it's an MMU test print which dumps filament on the right end of the bed as part of an alternate purge system I'm working on.

There are no M0 or M1 codes present, and the fact it's at different places suggests it's not something in the gcode itself causing the pause.

3cube_bucket.gcode.zip

michalxfanta commented 5 years ago

Thank you for your reported issue and we will test the attached g-code.

leptun commented 5 years ago

@DRracer I’ve also experienced this issue while I was developing the lcd buffer. I was printing from USB and I think it was processing (M107, etc) as M1. I’ve only experienced this once though and I think it was while I had some other issues with the buffer.

@vintagepc Did you print from USB when you encountered the issue? Also can you repeat the issue reliably?

vintagepc commented 5 years ago

It's a weird one. I was printing from SD but had pronterface connected because I forgot to disconnect it after I needed to do some manual positioning. It happened probably four of the five times I tried to print variants of this file. (Note the variants did not alter the startup gcode or anything before the purge line was complete)

In previous experiences it's happened on straight-from PrusaSlicer files as well. I've been thinking about it there may indeed be a deeper issue at startup and the symptoms vary depending on what order you have your startup gcode.

I'd noticed previously on my Mk3 that sometimes the initial purge line will get printed at the full width of the bed and at a single rate rather than the stages it normally does. This would also be arbitrary, sometimes happening and sometimes not for the same gcode file. Other folks have observed the purge line behaviour as well, with forum discussion:

https://forum.prusaprinters.org/forum/original-prusa-i3-mk3s-mk3-hardware-firmware-and-software-help/purge-line-variation/

https://forum.prusaprinters.org/forum/original-prusa-i3-mk3s-mk3-hardware-firmware-and-software-help/purge-line-different-sometimes-too-high-flow-for-small-nozzles/

It's also been reported before: in #1164

(My purge line issues went away when I switched to a startup gcode that places the nozzle at a lower temp to prevent oozing on the bed during level, and then warms it up to final temp after MBL is done - but that's when I started seeing the pauses)

leptun commented 5 years ago

Did you try printing the same gcode without being connected over usb? When I had the problem it would show paused for user while sitting idle with octoprint connected. It would be caused by M105 commands to report temperatures being interpreted as M1. But as I said, this isn’t happening anymore for me.

leptun commented 5 years ago

As for purge line variations, maybe it’s because of a confusion between relative mode and absolute mode. You mentioned you have MMU2(S). The thing is the start gcode purge line is different when you have the mmu. It is much longer and on two lines with different heights.

vintagepc commented 5 years ago

Did you try printing the same gcode without being connected over usb? When I had the problem it would show paused for user while sitting idle with octoprint connected. It would be caused by M105 commands to report temperatures being interpreted as M1. But as I said, this isn’t happening anymore for me.

Not this g-code but other cases where it happened have not been connected over USB to anything.

As for purge line variations, maybe it’s because of a confusion between relative mode and absolute mode. You mentioned you have MMU2(S). The thing is the start gcode purge line is different when you have the mmu. It is much longer and on two lines with different heights.

Yes, I'm aware of that but it was well before I had the MMU2S that I had the purge line issues. The MMU's only a relatively recent addition and both of the issues predate it.

BouriCZ commented 5 years ago

I'm having same issues on MK3 (without MMU2) with 3.7.2 firmware with printing from SD card and connected OctoPi only for reading (printer states and camera). No USB connected. Doing constantly with same gcode file, not randomly. Some gcode doing it, some not.

leptun commented 5 years ago

Can you post a gcode where it happens reliably? Also another one where it doesn’t.

BouriCZ commented 5 years ago

I tested today and it does randomly with connected OctoPi. But afrer disabling "RPi Port" in menu (yes, OctoPi timed out), it works fine without pause after after print start line. After enabling "RPi Port" and reconnecting OctoPi it does randomly in all tested files. My mistake, I didn't tested files more times, before i post that comment. I attaching terminal log (304 lines) and two tested gcode files.

gcode.txt Displej-100,5p-Q0.2mm-52m-ABS-MK3.gcode.txt Drzak_LAN_Adapteru-100,5p-Q0.2mm-21m-ABS-MK3.gcode.txt

leptun commented 5 years ago

Looking at the log it appears I was right about M105 being interpreted as M1 or being ignored. More testing is needed though...

There doesn’t appear to be any RX buffer full error, but who knows. Maybe on the second serial port it isn’t reported.

leptun commented 5 years ago

Also was the print started from the lcd or rPi?

BouriCZ commented 5 years ago

I print only from SD card (Toshiba FlashAir) ( = via LCD). My OctoPrint shows only "8.3" file names (like in MS DOS) and with long file names isn't good. And printing from SD card has advantage in case of power failure. I using OctoPrint only for camera and reading printer states and in case of ruining print (when I'm in work and etc.) I can stop the printer remotely via Octoprint.

vintagepc commented 5 years ago

Interesting. From a technical perspective, how does that misinterpretation happen? It makes sense based on the symptoms and gcode reading or parsing being starved for cycles but it's interesting that for me the pause happens well after those commands ought to have been completed. For example in my MMU setup everything is up to temp, and it's done the initial toolchange that comes after that, but it will pause just before it starts the purge line, or even in the middle of it.

Unfortunately M111 is not supported or I'd offer to try running with that to have an echo of what commands were actually received. Are there other such debug options I could turn on that would be helpful?

leptun commented 5 years ago

@vintagepc The command queue is to blame for delayed effect on commands. When commands are sent to the printer they are added to the command queue. The problem is that the sd card also pushes to the command queue. But the worst offender of them all is actually the lcd. Some time ago @wavexx mentioned a false assumption in the code where it basically said that there is always space in the queue when a command is added in the queue from the lcd. It is probable that the queue got filled by the “Print from SD” commands and it stopped buffering, leaving us with M1 instead of M105. As a test, please try reproducing the issue while starting the sd print from octiprint without interacting with the lcd at all.

For debugging to serial, uncomment “#define CMDBUFFER_DEBUG” in cmdqueue.h.

One more thing. Do you happen to have another raspberry pi? One that connects over usb to the printer. I’m interested in whether the issue persists over usb.

vintagepc commented 5 years ago

Thanks for the explanation. It makes more sense now. If it's relevant, I did see an "RX buffer full" at print start this morning via pronterface when printing from SD. (but it did not pause)

I'll recompile with CMDBUFFER_DEBUG on and try to keep an eye on things during prints. I do SW dev by day, so feel free to classify me as an "advanced" user and let me know if there are specific internal things I can do to help chase it down.

leptun commented 5 years ago

Please note that the perfotmance of the printer is diminished significantly when debug is enabled. So expect some stuttering.

vintagepc commented 5 years ago

Understood. I have a longer regular print to do today first so I'll wait until after that when I will be doing a number of shorter prints for testing my post-processing scripts.

leptun commented 5 years ago

Have you tried reproducing the issue on 3.7.1 or 3.7.2 fw? I’ve also had some problems with Full RX buffer in the past while printing. That problem has been fixed though. I wonder if removing those gcodes that aren’t implemented (eg: G21) help. Also please post the log with debugging enabled. Can you maybe post a full log? Enable serial.log in octoprint in settings.

vintagepc commented 5 years ago

I've certainly experienced it prior to 3.8,0, can't recall if it was .1 or .2 I'll prune G21. I'm not running octoprint at all (that's the other guy), just a desktop PC with USB connection (not driving the printer, just watching status). But I'll set it up to capture the log and post it if anything interesting happens.

vintagepc commented 5 years ago

... I think we have a winner. Last few lines:

New indices: buflen 26, bufindr 119, bufindw 402, serial_count 0 new command on the top: M105 Processing a GCODE command: M105 In cmdqueue: 26 Dequeing M105 Old indices: buflen 26, bufindr 119, bufindw 402, serial_count 0, bufsize 493 New indices: buflen 25, bufindr 127, bufindw 402, serial_count 0 new command on the top: M105 Processing a GCODE command: M105 In cmdqueue: 25 Dequeing M105 Old indices: buflen 25, bufindr 127, bufindw 402, serial_count 0, bufsize 493 New indices: buflen 24, bufindr 135, bufindw 402, serial_count 0 new command on the top: M105 Processing a GCODE command: M105 In cmdqueue: 24 Dequeing M105 Old indices: buflen 24, bufindr 135, bufindw 402, serial_count 0, bufsize 493 New indices: buflen 23, bufindr 143, bufindw 402, serial_count 0 new command on the top: M105 Processing a GCODE command: M105 In cmdqueue: 23 Dequeing M105 Old indices: buflen 23, bufindr 143, bufindw 402, serial_count 0, bufsize 493 New indices: buflen 22, bufindr 151, bufindw 402, serial_count 0 new command on the top: M1 Processing a GCODE command: M1 In cmdqueue: 22 echo:busy: paused for user

buffer.log

The guilty party is (likely) pronterface or octoprint querying temperatures with M105, and they get backlogged when the printer is busy doing something with a nearly full buffer, e.g. mesh levelling.

Alas pronterface's scrollback buffer is too short to capture the entirety of the logspam to see where it's getting inserted. I'm going to see if I can split the serial comms to get a full log and also have pronterface querying temperatures.

vintagepc commented 5 years ago

I'm not sure if this is helpful or not, but here's the logspam from a typical run (it did not pause). I'm going to keep trying to capture one where it does pause. in.txt

leptun commented 5 years ago

Good job. If it’s possible to get a bigger log with the issue, that would be great.

So, something surprising I saw in the log is that there is no Enqueueing M1, only M105, but later it gets processed as M1. There are two possibilities I can think of:

1: The buffer keeps filling with commands and it fills and it stops adding new characters to the last command as to not overwrite the queue. 2: Same as above, but it overwrites another command. This one is probably less likely, but I’m not sure right now.

It would be best if we found out when the bug appeared so we could come up with a solution much easier.

Tomorrow I’ll get home and test this myself.

vintagepc commented 5 years ago

Yes - I'm guessing that for the cases this happened without something over USB were also cases where it paused before purging anything, e.g. it failed at M109 and paused instead of waiting for the nozzle. The mid-purge-line pauses can only explicitly happen if something is connected and polling temperatures, otherwise there is no M1## to get misinterpreted.

I'm going to see if I can setup something that spams M105 far more often to try to cause the issue reliably; that should make it both easier to work on and maybe shed some light on where it's sneaking in. If this happens for M1###, I also have to wonder if it's been happening for other commands and just not noticed. Fortunately in case 1 it's relatively benign in that (I think) most truncated commands are safe. for example, "M104 S230" could only be interpreted as S23 or S2. Less so if case 2 is happening and chunks of the previous command remain and add digits etc.

As for when... I'm guessing a relatively recent firmware, 3.7.x or 3.6.x, I cannot recall seeing it until a while after I'd gotten my printer. Unfortunately testing the origin delves into the range of firmwares that had issues with cooking the Einsy from high PWM frequencies. (I wonder if that has something to do with it - the bug having always been there but just exposed by the extra cycles consumed by the heatbed PWM changes or other new features)

vintagepc commented 5 years ago

Well, that was... interesting... I set up a simple terminal script to spam M105 once per second and then tried to print a file. Printer would not even print; it just processed a bunch of gcode commands and then went back to the main menu. Even after I stopped spamming, I had to reset (X) the printer to get it to print the file; it'd just reset when I hit 'print'. Serial log attached. (For reference, pronterface queries temperatures about once every 5 seconds)

in.txt

leptun commented 5 years ago

Can you please post the script? I don’t know python or bash. :)

I’ll try to reproduce the same issue and also see if other lcd interactions break it.

vintagepc commented 5 years ago

Can you please post the script? I don’t know python or bash. :)

I’ll try to reproduce the same issue and also see if other lcd interactions break it.

Sure.

To "tap" the serial IO stream and log it, run this command. It creates a fake serial port (/tmp/ttyV0) for which it logs input and output, but is really attached to /dev/ttyACM0. you can use ttyV0 with pronterface (and probably octoprint) too.

socat /dev/ttyACM0,raw,echo=0 SYSTEM:'tee in.txt |socat "PTY,link=/tmp/ttyV0,raw,echo=0,waitslave" |tee out.txt' Use this script with minicom to spam M105 once a second: script.txt

by running: minicom -D /tmp/ttyV0 -o -b 115200 -S script.txt

if you want to watch the printer's output live as well, you can tail -f in.txt

leptun commented 5 years ago

Ok sooo... I was unable to use the scripts above so can't reproduce it.

But, I think I found something very interesting. When starting a print from lcd, the file is checked for missing M84 command at the end. The thing is that it sets the sd location to 4096 (probably) bytes before the end of the file and starts pushing data from the SD to the queue. Serial communication is NOT stopped while the file is being checked and I suspect that if you send serial data while you are checking the sd file that it might corrupt something this way.

vintagepc commented 5 years ago

What happened - did you get any error messages?

leptun commented 5 years ago

socat:e exactly 2 addresses required

vintagepc commented 5 years ago

Sounds like a formatting error... probably the copy-pasting of the command added some hidden characters or the github formatting doesn't like it.

It's basically the second answer here, maybe you'll have more luck pasting from there. https://unix.stackexchange.com/questions/12359/how-can-i-monitor-serial-port-traffic

If that also fails, you might have to type it out by hand rather than pasting it into a terminal... sorry :-/

Alternately, maybe using minicom's capture argument instead of the socat intermediate, e.g. minicom -D /dev/ttyACM0 -o -b 115200 -S script.txt -C capture.txt

Hope that helps.

leptun commented 5 years ago

@vintagepc Yes, it's definitely formatting error. I am still unable to reproduce it on my "workbench". Even if I am spamming M105 while starting a print, it isn't affected in any way. The closest I've been to breaking something was by filling the RX buffer with too many commands while purging. Have you tried anything else? Please try removing the following;

  if (!check_file(filename)) {
      result = lcd_show_fullscreen_message_yes_no_and_wait_P(_i("File incomplete. Continue anyway?"), false, false);////MSG_FILE_INCOMPLETE c=20 r=2
      lcd_update_enable(true);
  }

It should disable file checking. Please see if this is to blame for the buffer corruption.

Something interesting I found while playing with sending data over serial while printing from SD: If you send any character over serial while SD printing, the queuing will stop untill the command over serial is complete. This has the side effect of stopping the print while manually typing. Another more serious issue with this is that it crashes the firmware(WDT reset) if you lock up the serial while starting a SD print. Behavior in video: https://photos.app.goo.gl/qhpTUVRiivP8VRQN8

vintagepc commented 5 years ago

Ah, this sounds like what I experienced with my M105 spam script:

This has the side effect of stopping the print while manually typing. Another more serious issue with this is that it crashes the firmware(WDT reset) if you lock up the serial while starting a SD print.

I plan to tinker some more tonight, If I can reproduce the pause reliably I'll also try disabling the file check to see what happens..

vintagepc commented 5 years ago

I just had some success with a 3 second spam interval. Printer did not reset, but hung during the purge line. Log attached; points of note are that it corrupted an M900 command to M9 (was never queued as M9) and an additional "Unknown command " ""

paused M105 spam.txt

Now to see how reproducible it is...

vintagepc commented 5 years ago

It's still arbitrary, it seems. Must be a very small window to make it happen with just the right timing.

I disabled the filecheck for now; I'll continue using that modified build to see if I continue to have any M1 hangs or other corruption.

Even spamming M105 at 10x/sec does not seem to have any ill effects with the filecheck disabled.That said, I have been running a modified short gcode file without temps/extrusion for speed of testing. I'll try an actual brief print later while still polling, could be it's during something I disabled like temp waits.

leptun commented 5 years ago

One more thing to test: Is pronterface sending M105 regardless of “echo:busy”? That might explain why it breaks. I think octoprint waits for the printer to answer before attempting another M105 (unless it timeouts that command).

wavexx commented 5 years ago

Behavior in video: https://photos.app.goo.gl/qhpTUVRiivP8VRQN8

OT, but I really need a spare rambo to do this too

vintagepc commented 5 years ago

You shouldn't complain, the beta-testers list message said you got a new Mk3S to play with :P

(I kid, I kid).

Does the printer always report busy repeatedly when printing? Or does it only send it in response to serial commands while in that state? I've noticed that it'll repeatedly print "busy processing" in the pronterface console at those steps, so that may answer your question. But if not I'll watch the outgoing command log tonight, if I have time.

vintagepc commented 5 years ago

Pronterface seems to certainly choke back on the M105s when the printer says "busy", I saw it only send 2-3 during the bed levelling process. (but it did still send them)

vintagepc commented 5 years ago

ohohohoho! we have a winner!

New indices: buflen 13, bufindr 290, bufindw 73, serial_count 0 new command on the top: M109 S205 echo:Storing a command line to buffer: M1 Number of commands in the buffer: 14

`Old indices: buflen 21, bufindr 65, bufindw 305, serial_count 0, bufsize 493 New indices: buflen 20, bufindr 73, bufindw 305, serial_count 0 new command on the top: M1 Processing a GCODE command: M1 In cmdqueue: 20` So it's not buffer corruption, it's getting inserted that way. Log and gcode attached. (And Check_file was commented out) The outgoing .txt log shows only M105s outgoing, no M1s. [infirst.txt](https://github.com/prusa3d/Prusa-Firmware/files/3532395/infirst.txt) [stackable_cat.gcode.zip](https://github.com/prusa3d/Prusa-Firmware/files/3532398/stackable_cat.gcode.zip) This has interesting implications for the situations where the printer was **not** connected to a PC interface. Looking at the log it gets inserted between M104 and M109. The hot-end heater shares its supply voltage with the einsy logic; I wonder if this turn-on causes a voltage sag that is enough to junk up a data stream at just at the right time. Note that the Einsy isn't reporting busy at this time so another strike on the previous theory.
leptun commented 5 years ago

Busy is only reported when a command takes too long. It resets the hosts timeout timer. The host should not by any means resend a command before Ok is received.

Interesting theory. Do you think that the SD is affected by the brownout? In theory the board shouldn’t be affected since both micro controllers have brownout detection enabled.

vintagepc commented 5 years ago

I'm guessing brownout or maybe current spike... it's the only thing that would explain why it still happens even if something isn't attached and sending M105s. Ribbon cables aren't particularly well shielded and they can run quite close to the main current rails inside the einsy enclosure.... I think it's just much worse when something is attached because there is a greater chance of incoming M1## gcode being processed at just the right time...

leptun commented 5 years ago

Sooo when it queued M1 you were using pronterface to view the log or not? I highly suspect pronterface causing some problems. Also was file checking disabled?

As for sd card data corruption, could the current spike affect the data lines only (eg. tie them to high or low). I think sd cards are not that affected by input voltage variation.

vintagepc commented 5 years ago

Yes, pronterface was active for this case since I believe it is just making a hidden problem more visible. (I was thinking of past experiences where my printer was across the room from a PC, no connection, and still paused in this way). Filecheck was disabled when this happened.

I'd think the data lines are most susceptible to these things. I've seen a number of discussions in the forums regarding the length of the LCD cables and interference/corruption of the display, so it stands to reason it could happen on the SD data line as well.

The best way to test would be a scope with bus capture, if we could reproduce it reliably to determine whether the M1 is coming from corrupted serial or SD data. But I'm not equipped for that, alas.

I'll start running without PF querying temperatures going forward and just logging the printer's serial output. Hopefully I'll be able to show the same issue again and rule out PF as the cause.

leptun commented 5 years ago

Any update on this issue?

vintagepc commented 5 years ago

So far I have not had it happen again when not using pronterface. So it's almost certainly the culprit, or at least misbehaving by ignoring the "busy". I still swear it's happened without a USB connection in the past. shrug Guess it's been fixed.

wavexx commented 4 years ago

I'm using "pronsole" and "printcore" from pronterface AKA printrun repository and I've found multiple issues with the gcode sender error handling :/

One issue I run often into is that printcore can manage to spam the printer with temperature requests just after connecting to the point of being stuck. I didn't spend time to debug that, but I can confirm this can be a cause of issues.

vintagepc commented 4 years ago

My experience:

I received the Prusa i3 MK3S a few days ago with firmware version 3.9.0. On my computer I installed Prusa Slic3r with some additional components (prontinterface, ...). I was printing from SD card while USB cable was attached to my computer. Unfortunately, Windows 10 made an automatic update that caused my PC to restart. After the restart Prusa i3 has been reset automatically (as I would have pressed "x" on the printer). The print has been stopped at 98 %.

I hope that my experience can help to improve the firmware.

This is an entirely separate issue and not something that can really be solved; if you want the printer to be independent of the host machine restarting or going to sleep you'll need to print from SD card instead. There's nothing the printer can do if the source of the Gcode commands "goes away" through reset, power failure, etc.

The firmware has to be designed to reset on PC connection or it would not be possible to access the boot loader and perform firmware upgrades.

vintagepc commented 4 years ago

Sorry - I missed that. But the second part still stands - USB connections can be fickle things, so if you're not actively using it then best disconnect it.

I learned the hard way too when I first started. jostled the hub the printer was connected to, the connection dropped and the printer reset mid print. So I feel your pain. :smile:

github-actions[bot] commented 1 year ago

This issue has been flagged as stale because it has been open for 60 days with no activity. The issue will be closed in 7 days unless someone removes the "stale" label or adds a comment.

github-actions[bot] commented 1 year ago

This issue has been closed due to lack of recent activity.