prusa3d / Prusa-Firmware-Buddy

Firmware for the Original Prusa MINI, Original Prusa MK4 and the Original Prusa XL 3D printers by Prusa Research.

[BUG] Disconnecting and reconnecting to network after a while, resulting in choppy movement #1807

Closed Crawlerin-SA closed 2 years ago

Crawlerin-SA commented 2 years ago

Please, before you create a new bug report, please make sure you searched in open and closed issues and couldn't find anything that matches.

Printer type - [MINI]

Printer firmware version - 4.3.4

Original or Custom firmware - Original

Optional upgrades - Filament Runout Sensor, Bondtech Extruder

USB drive or USB/Octoprint - USB Drive

Describe the bug My MINI is connected to my home network via Ethernet cable. I updated to 4.3.4 around 3 days ago. After a few hours, I noticed occasional lag every few seconds when navigating the menu. The icon indicating network connectivity appears and disappears every few seconds. I have disconnected and reconnected the cable and reset the switch it is connected to; neither helps. Resetting the MINI helps, but after some unknown period of time (hours) it starts doing it again. In Network settings, it switches between showing an IP address and not having one.

When this happens mid-print, it makes the movement stop for a second, resulting in blobs. Physically disconnecting the cable does not help, and the MINI continues to stop occasionally. I am sending a link to a video that shows what it's doing. The print in the video takes 10 hours; the MINI was rebooted before pre-heating and starting the print, and it was connected to the network. I did not want to reset the printer at the time of filming, as it has 3 hours left to finish and I can post-process those blobs.

I have changed USB drive thinking it cannot read the medium, but that's not the case.

How to reproduce Leave the MINI connected to the network for a few hours with networking enabled. I cannot completely exclude the possibility that it's an issue with my network, but I do not have another network to try it on.

Expected behavior The printer should not keep disconnecting from and reconnecting to the network, as this hangs the whole system for a second or two. If the network stack crashes or the printer disconnects, it should prioritize the printing process and stay offline to maintain print quality.

G-code Attached, I don't think it makes any difference.

Crash dump file Will attach once print finishes

Video https://youtu.be/XhrRe_WiBws

01 AB drives_0.2mm_ABS_MINI_10h8m.gcode.zip

Crawlerin-SA commented 2 years ago

Crash dump after print and cooldown.

Update: Now it occasionally detects link even without cable plugged in, or has no link when cable is plugged in. I tried different switch and cable. Sometimes it detects link but cannot obtain IP address. It's actually possible that the Ethernet port died.

dump.zip

murk-sy commented 2 years ago

I have had a similar issue on one of my printers (some time ago), ended up having my board replaced, though I can't say for certain it is physical damage. It is possible this is an issue on busy networks. Likely unrelated to firmware upgrade.

Try a factory reset, and if the issue persists, you should probably just contact support for a board replacement.

If you don't have issues when the cable is disconnected, and if you can wait a bit, there has been a lot of networking improvement related commits lately, so the next major firmware might finally introduce Prusalink.

For reference, here is my original message to support:

Hi,

I am having network related issues that are affecting print quality when network is enabled. Since I have 2 printers, I am suspecting a hardware issue.

Video of the issue is here, first showing printer 1 (works fine) and then printer 2 (the problematic one, serial CZPX4820X____): https://youtu.be/5cNWjmZJkmQ

When network is enabled, the printer appears trying to use the given configuration and then failing immediately, then repeatedly turning it "off and on" (at least as displayed in settings). When the cable is plugged in, the activity light on the switch remains off.

Currently running latest 4.3.0 firmware (both printers), but this has been an issue on older firmwares. The networking has worked before but it has been flaky and sometimes stopped being accessible, now having it enabled appears to be stopping the printer and causing oozing (see attached photo). The pictured print took a lot longer on the problematic printer than on the fine one (+30 minutes on 10 hour print).

If networking is disabled, the printer functions perfectly fine.

I have ruled out the following (mostly by switching printer 1 and 2)

  • Network cable
  • Network switch port
  • Network configuration (DHCP or static)
  • Printer configuration (ran factory reset before making video)

Since the farm software still isn't out I use the network access very rarely, but I'd rather make sure it's functional when that is a working feature.

I am willing to do further tests if required or replace the board myself if necessary.

DRracer commented 2 years ago

@Crawlerin-SA @murk-sy thank you for your reports. We have had similar reports in the past but haven't been able to reproduce any of them. So far we only have vague observations of what may be happening and what may be the cause.

@Crawlerin-SA May I ask you to check the network traffic (Wireshark or a similar tool)? How does it look just before and after the issue occurs? Or does the MINI just start disconnecting suddenly, without any relation to network traffic?

I have seen a similar situation in the past (not on a 3d printer though) - bent wires inside the ethernet connector (possibly due to some bad eth cable plugged in). Could you please inspect the state of your cable and the MINI's eth connector? Look for any mechanical defects.

Anyway, I hope the priority issue goes away with the new networking stack (being developed right now).
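
Alongside a packet capture, a small reachability log can timestamp when the flapping starts, which makes it easier to find the matching spot in the capture. The following is a minimal Python sketch, not an official tool: it assumes the printer's web interface listens on TCP port 80, and the function names, the 60-second window, and the threshold of 4 changes are all hypothetical choices.

```python
import socket
import time
from typing import Iterable, List, Tuple

Sample = Tuple[float, bool]  # (timestamp in seconds, reachable?)


def probe(host: str, port: int = 80, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refused connections, unreachable hosts
        return False


def transitions(samples: Iterable[Sample]) -> List[Sample]:
    """Return (timestamp, new_state) for every change in reachability."""
    changes: List[Sample] = []
    previous = None
    for stamp, state in samples:
        if previous is not None and state != previous:
            changes.append((stamp, state))
        previous = state
    return changes


def is_flapping(samples: Iterable[Sample], window: float = 60.0,
                min_changes: int = 4) -> bool:
    """True if at least min_changes state changes fit inside one window."""
    stamps = [stamp for stamp, _ in transitions(samples)]
    for i in range(len(stamps) - min_changes + 1):
        if stamps[i + min_changes - 1] - stamps[i] <= window:
            return True
    return False


def watch(host: str, count: int = 30, interval: float = 2.0) -> List[Sample]:
    """Probe the host count times, interval seconds apart, and return the log."""
    log: List[Sample] = []
    for _ in range(count):
        log.append((time.time(), probe(host)))
        time.sleep(interval)
    return log
```

Calling `watch()` with the printer's address and passing the result to `transitions()` gives a list of timestamps to look up in Wireshark. A TCP probe only tests the web interface, so a printer that still answers pings but serves no page would show up here as unreachable.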

murk-sy commented 2 years ago

@DRracer If it helps, I've returned the faulty board and it might still be in your warehouse, claim reference is SD-69336, shipping label 1Z8E666Y9197756794 (UPS)

Crawlerin-SA commented 2 years ago

Hello DRracer, thank you for looking into this. The Ethernet port does not connect anymore, and it occasionally detects a link even if there is no cable physically connected (the icon appears for a few seconds with an IP address of all zeros, then disappears). Lights do not light up on the switch or on the MINI. I disabled Networking in Settings and it's not causing problems anymore. I think the port is just busted. @murk-sy may be right, the firmware update was probably just a coincidence.

I don't feel well today, and the MINI is printing right now, but at the next occasion I will do a factory reset and pull out the MINI to physically inspect the port and connect it to a laptop with Wireshark. The cables I used are OK; I tested them with a professional tester at work. The MINI has been connected in its usual place for a long time :-) and was on the network previously. I changed the electronics cover to a more perforated one but I don't have a cooling fan there, which may have accelerated the failure, as I have been printing in an enclosure for the past few months.

2-year warranty has already expired end of last year, it was MINI from first batch ever :-) As long as it prints from USB without networking and nothing else breaks, I can live with it and wait for Wifi module support which hopefully comes one day. If I need networking badly I can slap Pi Zero 2W there with Octoprint.

My main point was that a disruption in network connectivity, regardless of its origin, should not stall the whole system, as that causes artifacts. The firmware should prioritize the printing process over acquiring an IP address, if that's possible.
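
One common way to keep a failing link from repeatedly interrupting other work is a capped exponential backoff: each failed attempt to bring the link up or obtain an address waits longer before the next one. The following is only a Python sketch of that idea under assumed values (`base`, `factor`, and `cap` are hypothetical); it is not how the Buddy firmware schedules its network task.

```python
from itertools import islice
from typing import Iterator


def reconnect_delays(base: float = 1.0, factor: float = 2.0,
                     cap: float = 60.0) -> Iterator[float]:
    """Yield the wait, in seconds, before each successive reconnect attempt.

    The delay grows geometrically and is clamped at cap, so a dead port
    is retried at most once per cap seconds instead of every second or two.
    """
    delay = base
    while True:
        yield delay
        delay = min(delay * factor, cap)


# First eight waits with the default (assumed) parameters.
first_waits = list(islice(reconnect_delays(), 8))
print(first_waits)
```

With these defaults the retries settle at one attempt per minute after the first six failures. A successful connection would start a fresh generator so that the next outage begins again from `base`.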

zoltan-l commented 2 years ago

Good morning @Crawlerin-SA, did you try reverting to the previous FW release? If so, did you encounter this issue there as well? I would prefer reverting to the previous release over a factory reset. Please let me know if it works fine with the previous FW. Thanks

Crawlerin-SA commented 2 years ago

Hello, I apologize for delay, covid shot and MINI was mostly printing. I re-flashed firmware to 4.3.3, enabled networking, after reset network had come up and printer grabbed IP address from DHCP. I will now let it do its usual stuff and see if it breaks somehow after few hours. MINI_LAN MINI web

zoltan-l commented 2 years ago

@Crawlerin-SA thanks for the response, and I hope you feel better now. I look forward to any (positive and/or negative) feedback. Can you pls provide your printer purchase order info, and possibly the printer ID, for further investigation?

Crawlerin-SA commented 2 years ago

Hello, Order 192341446 machine ID CZPX5019X017XC00994 motherboard is original since purchase.

Printer had been running without restart since downgrade, printed 3 plates and no issue so far. No stutter and printer is still online.

zoltan-l commented 2 years ago

I am happy to hear that it is working; on the other hand, I am a bit worried about what could be the cause of the indicated fault. I know you are rather busy. May I ask when you will have time to flash back to 4.3.4 to see if it works or fails? What temperature is usual in your closed box? Can you please let me know which Buddy board version you have?

Crawlerin-SA commented 2 years ago

Flashed. Network seems to be up as of yet. Do you want me to turn something on or off, re-set, or just observe for now?

Buddy board says 1.0.0; it's the first version, still with jumpers for flashing 3rd party FW. I haven't broken the tab or modified it in any way.

Normal temperature in box hovers around 43, 45 degrees max when printing ABS and ASA. I haven't noticed negative effects from overheating like skipped steps or other failures. I use this lid as it has more vents and easier cable management https://www.prusaprinters.org/prints/72989-prusa-mini-z-top-electronics-box-lid I can install small fan or heatsinks if it's recommended.

zoltan-l commented 2 years ago

Observation would be great. If the malfunction appears again, we will take the next steps. OK?

Crawlerin-SA commented 2 years ago

Hello, I did few short prints and let it idle overnight, so far there were no symptoms like before, menu movement was snappy and network icon was not appearing and disappearing. Although previously those symptoms did not appear immediately either, maybe after like half a day or a day?

Sadly I had to turn printer off this morning, as I will be away for the weekend. But I will continue on Sunday, I have few more long prints to do.

zoltan-l commented 2 years ago

great, thanks, have a nice weekend

FrankPGH commented 2 years ago

I just experienced this exact same behavior, also on 4.3.4. I noticed the network icon turning on and off (1-2 second intervals?). Rebooting the printer seems to have resolved it for now.

zoltan-l commented 2 years ago

Hi FrankPGH, do you know the details? It would be great to see them. Did it happen immediately after you flashed to 3.10.1? Which Buddy board version do you have, please (menu Info -> Version info)? From which FW version were you flashing? How exactly did you proceed with flashing to 3.10.1?

Thanks in advance.

FrankPGH commented 2 years ago

Unfortunately, I won't have a lot of useful info for you. I didn't spend much time on troubleshooting and the problem hasn't reoccurred once I did a reboot. I only stumbled on this thread because I was looking for info on why the time was off by two minutes on the clock. Here's what I have, though:

This is a Christmas printer, so I've only been using it for about a month. I flashed 4.3.4 on it during initial build and have used no other firmware. The printer was printing fine with no errors during that time. At the time I noticed the problem, I had already completed a couple of other prints, and this latest print laid down the first few layers successfully so I left the room. I came back to check on it and noticed the stuttering every couple of seconds and the nasty blobbing. I also noted that the network icon was blinking as though it were connecting/disconnecting.

In the second picture, you can see the spacers I was printing. There are blobs on the inner and outer edges, and you can sort of make them out on the top as well. It looked like it was still extruding during the pauses.

20220125_075931 20220125_082818

zoltan-l commented 2 years ago

FrankPGH, thanks a lot, just let me ask a few more questions. After you loaded 4.3.4, did you go through the Wizard? Were all tests OK?

FrankPGH commented 2 years ago

I ran the Setup Wizard after the firmware update as part of the initial setup process in December. After the error, I rebooted and then ran the Selftest in the Calibration menu just to make sure I hadn't jacked up the hardware somehow and everything came back green.

Crawlerin-SA commented 2 years ago

I have had it running since Sunday evening CET, and so far the error hasn't shown up. I did two ~4 hour prints, no issues. The printer still has an IP address and the icon is not blinking, the menu is not laggy and prints are fine. However, some time during the day I lost access to the printer's web page: it does not load and shows only a black background and nothing else. The printer responds to pings.

When I upgraded, downgraded and then upgraded again, I didn't run any wizard or setup.

zoltan-l commented 2 years ago

@Crawlerin-SA thanks for the info. Do you remember from which FW version you originally upgraded to 4.3.4?

Crawlerin-SA commented 2 years ago

4.3.3 full release. And before that 4.3.1.

zoltan-l commented 2 years ago

Hi guys, any news?

murk-sy commented 2 years ago

I wouldn't expect any comprehensive fix until networking/http server rewrite is completed. An optimistic guess would be "probably within a month or so". You can keep an eye on #1790 but it merging won't mean that the update will be released quickly after.

There is a possibility that the issue will persist, but only testing will tell.

zoltan-l commented 2 years ago

The strange thing is that the issue disappeared after you reset the printer and could not be replicated. It could be that something went wrong during the upgrade, which was "corrected" by the printer reset.

zoltan-l commented 2 years ago

I am trying to replicate it, without success. If any of you encounter this failure again, please let me know.

murk-sy commented 2 years ago

@zoltan-l When I originally had the issue it didn't happen immediately, but over time it slowly became less responsive until it was not reachable via http anymore. Resetting the printer fixed it (temporarily). The board I had really big problems with ( https://github.com/prusa3d/Prusa-Firmware-Buddy/issues/1807#issuecomment-1008667212 , may still be on a shelf somewhere?) was not fixed with a factory reset, though that could've been a bug in the version I was using. Another possibility I couldn't rule out was ESD damage, because sometimes I got shocked a lot touching the printer and inserting the usb drive.

Along with a somewhat odd setup I was using (printer -- wireless bridge -- rest of network, since I don't have a cable laid) we have a few very outdated devices on the network, including a Windows 2000, XP and 7 machine each. However, beyond SMB1 and probably NetBIOS traffic, there probably isn't anything too bizarre going on.

One thing I thought about was that since i think it stopped working after connecting to it with a browser at least once, it is possible that Firefox did not properly close its connection, and the printer kept sending data (or perhaps even multiple copies of it) ad infinitum.

I can set up wireshark on a virtual machine, open the printer wui and watch for all traffic involving printer's IP - though I can't really monitor all broadcasts going in. Alternatively, if there's some sort of debug firmware which I could connect to a RPI and get data from directly, I can look into that as well.

zoltan-l commented 2 years ago

@murk-sy, thanks for the response (btw. your board is lying on my desk; I need to install it and check it). The current issue disappeared after the reset; I am just wondering whether it was a simple reset by the reset button or a factory reset that made the failure disappear.

murk-sy commented 2 years ago

After a printer reset (by button, not factory reset) the wui was once again available, but when i reached the point of IP address flashing and whatnot I sort of gave up on resolving it since the feature wasn't that useful to me anyway.

Great to hear you've got the board in hand, hopefully it helps out. See also video to verify if initial unplugged behaviour is the same as I had.

zoltan-l commented 2 years ago

Hi @murk-sy, I am working with your board. Do you remember which FW version you used? Did you report the issue here on GitHub? If so, can you pls provide the link?

zoltan-l commented 2 years ago

Hi @Crawlerin-SA no more appearance pls?

murk-sy commented 2 years ago

@zoltan-l I didn't make an issue at the time, but the firmware was 4.3.0 and the issue appeared on older firmwares as well.

See https://github.com/prusa3d/Prusa-Firmware-Buddy/issues/1807#issuecomment-1008338489 for my original claim submission with other details.

I checked through my emails when I was arranging a return with Tommy Muszynski (first reply was 9th February 2021) and I found some more details you may find useful (at the time I had 2 printers, claim was made on the second one):

Email 1:

I've received the board with the rest of the order, replaced it on the second printer and everything works like a charm now! ...Is what I would like to say, but in the last few days, the first printer started having the same issue as well.

Since this is a bit too well timed to be a coincidence, I started to assume there has to be something specifically local to my usage. Could causing an ESD on the build plate kill off the ethernet chip perhaps? I've had a few shocks when touching it on either printer, and to be honest I'm fairly sure I've had it happen on the USB port too - since the included USB stick is metal. I took extra precautions when handling the replacement but I generally don't have a ground wire on me. The issue might actually be more widespread since as the current ethernet connection doesn't really provide remote management, most people probably don't use it.

Email 2:

I agree, no point arranging replacement until we find the cause. Like I said the printers were and are both functional, it's just having ethernet on that's causing issues. Well, having ethernet enabled caused printing issues last time I had it on on the board I'm returning - I assume due to processor getting bottlenecked, it was blobbing quite often due to I presume stopping.

The issue is basically the same - the network switch doesn't detect the connection and it's doing the same weird showing/hiding settings I've shown in video, in addition to access via IP of course not working.

Email 3:

Some other info I remembered might be relevant:

  • Before it failed, the web interface was sometimes slow to load (similar to https://github.com/prusa3d/Prusa-Firmware-Buddy/issues/1285 I guess), but I think printer restart fixed it before it started going completely bananas
  • Time and estimated finish time have been incorrect last few times I looked at them [Note: likely different bug]
  • After auto bed leveling completes when a print is started, the LAN icon flashes (as if attempting to contact NTP server) despite LAN being disabled

It would be hilarious if the issue was somehow caused by NTP not being able to reach the servers, because I've once had a high refresh rate LED panel flickering issue that only appeared when the device was connected, but not on the internet.

Either way, hope this helps you a bit. If the board is not showing any issues now, that at least means it's not physically damaged.

zoltan-l commented 2 years ago

@murk-sy I have been torturing "your" board recently, and during 2.5 hours of torture I did not see any packet loss or unanswered requests. I am using the 4.3.4 FW release + the MS Edge diagnostic tool.

murk-sy commented 2 years ago

@zoltan-l Before IP flashing happened, it took some time (for example a day or two being idle or printing) before Connect Local became unreachable (i did not have it open all the time). I am also not sure if it always worked first try and then not after (which would mean zombie sessions or something along those lines).

Freezing and print blobbing started happening much later after the general network problems began, and at that time the switch did not even recognize the printer at all (Email 2)

If the board works perfectly fine now, the issue must be something that survives printer resets or even factory resets, but not being unpowered for a while. I am pretty sure I tried turning off and unplugging the printer for several minutes and retrying, only for the same issue to appear, but I can't say for sure.

If a memory dump also includes all network related data and cache, I could "sacrifice" one of the printers and connect it to the network again, see what happens. If an early indicator the overarching issue is just Connect Local being unreachable, it probably shouldn't take too long. Please let me know if that would work.

Crawlerin-SA commented 2 years ago

Hi Zoltan, no, I haven't experienced it since. I ran the printer for 2 weeks without any restart, with an occasional print (not much, since I was mostly done printing parts for the other printer, just occasional cosmetics, a holder, etc.). Web interface is also solid. I powered the printer off and on a few times too, and all seems to be working fine now.

I am really not sure what happened, I cannot seem to replicate the issue now. All I did was downgrade and upgrade as instructed.

Edit: I forgot to say that the web interface which was not working and showed black background as described in https://github.com/prusa3d/Prusa-Firmware-Buddy/issues/1807#issuecomment-1021806191 fixed itself, next day it was working normally.

Crawlerin-SA commented 2 years ago

I have noticed this too: the network icon occasionally appeared for a second or two even when I had disabled networking in Settings. But it did not seem to have any effect on the print (fortunately).

zoltan-l commented 2 years ago

Hi both, thanks for info, I will keep a while still torturing.

zoltan-l commented 2 years ago

Conclusion: the issue has not appeared for two weeks, and the returned board did not show any failure after 1é hours of testing. I am closing the issue with the reason: replication did not succeed. Should you guys find the issue again, feel free to reopen the issue.

Agreed?

murk-sy commented 2 years ago

@zoltan-l I am planning to hook up one of the printers that had issues before. Right now none are connected to avoid printing issues. Will providing a memory dump (from printer menu) once the issue starts happening be sufficient, or is there something else I should do?

I have almost no doubt it will appear again, but I'd like to know what information you would actually need.

Please for now set status as "awaiting response" and I will tag you once I have some concrete info.

Crawlerin-SA commented 2 years ago

Conclusion: the issue has not appeared for two weeks, and the returned board did not show any failure after 1é hours of testing. I am closing the issue with the reason: replication did not succeed. Should you guys find the issue again, feel free to reopen the issue.

Agreed?

That's fine with me. Something may have broken during the flashing process, though I didn't do anything outside of the normal flashing process, and there were no errors during or after flashing. You can see in the video that it happened and that the network gradually broke, but at this point we are chasing ghosts.

I can leave this opened for @murk-sy to provide additional dumps from their printers, should the behaviour appear again.

murk-sy commented 2 years ago

I've connected all 4 printers to the network now, and I've run into the issue I believe I've had before the bigger address flashing issue. Printer 3 has had this issue twice now in 2 days (it was the first connected while others were printing) and printer 1 has had it once.

Despite all printers having STATIC set in the config file and in the settings manually (I believe it didn't automatically change to static when settings were loaded, so I had to do it manually), it suddenly starts using a DHCP ip address while still being set to static. Changing the setting to DHCP and then back to static fixes it (temporarily). I was never able to get hostname to work, but that's beyond the scope of this.

Video: https://youtu.be/S0TesZMnQXU 60 is the static IP set, 191 is set by DHCP.

This may be a completely separate issue, but it might be relevant.

Edit: The issue repeated about 12 hours later on printers 1 and 3. Maybe even-numbered IPs are just cursed on my network.

Edit2: Printer 3 has had the same issue again within 5 hours while being idle. I will not change it back to static, to see if that affects anything later.

Edit3: Tested whether an IP conflict could cause the issue by setting printers 3 and 4 to the same IP, thinking it could be caused by switching from static to an already occupied DHCP address. There was no immediate effect, so I reverted the settings to static as they should be.

Edit4: 13 hours later, the issue appears on printers 1, 3 and 4. I will try disabling the network on printer 3 (which is something I think I did before the flashing appeared and I gave up on it working consistently) and see what happens. 1 and 4 will be set to static again.
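
To catch the moment a printer leaves its static address, the address it currently reports can be compared against the configured one. This is a minimal Python sketch using only the standard `ipaddress` module; the function name and the `192.0.2.x` addresses are hypothetical stand-ins (the last octets mirror the 60/191 pair from the video), and how the observed address is collected (printer menu, router lease table, ARP scan) is left open.

```python
import ipaddress


def classify_address(observed: str, configured_static: str) -> str:
    """Compare the address a printer reports with its configured static one.

    Returns "unassigned" for an all-zero address, "static" when the two
    match, and "drifted" for anything else (for example a DHCP lease).
    """
    seen = ipaddress.ip_address(observed)
    want = ipaddress.ip_address(configured_static)
    if seen.is_unspecified:  # 0.0.0.0 for IPv4
        return "unassigned"
    if seen == want:
        return "static"
    return "drifted"


# Hypothetical example: static .60 configured, three possible observations.
STATIC_IP = "192.0.2.60"
for observed in ("192.0.2.60", "192.0.2.191", "0.0.0.0"):
    print(observed, classify_address(observed, STATIC_IP))
```

Logging the result with a timestamp every few minutes would show how long after a reset the address changes, which is the figure the edits above estimate by hand.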

murk-sy commented 2 years ago

I haven't been able to replicate the IP flashing feature, so it is possible it has been resolved somehow. I will keep everything as is and report any changes. If the flashing starts happening once again, I'll be making memory dumps and whatever else I can think of. I think for now, the issue can be closed.

@zoltan-l The IP still eventually switches from static to a DHCP IP, despite DHCP not being enabled (see previous comment). Should I open a separate issue for that or is that already a known bug?

zoltan-l commented 2 years ago

@murk-sy Definitely, mixing two different things in one issue could be misleading.

murk-sy commented 2 years ago

Some new behaviour, happened on 3 of 4 printers (probably not at the same time, I haven't checked in a while).

Printers all had data set to 0.0.0.0 and they were unreachable (obviously), loading data from file did nothing, switching to dhcp worked and switching back to static worked.

https://www.youtube.com/watch?v=Yr9VC_Mx-8g

Eagerly awaiting the networking alpha/beta - if it comes to that

murk-sy commented 2 years ago

@zoltan-l @DRracer I have reproduced the flashing issue and now have it active on printer 1, firmware 4.3.4. Flashing happens regardless if the printer is physically plugged into the network, and despite the flashed settings being static, the printer is unreachable. All other printers currently function fine and are reachable by network.

What steps should I take to gather data? Any crash dumps I create from the menu are identical to ones I made after printer restart a while back, so I don't think that would be particularly useful. Is there a way to do a proper RAM dump via USB without restarting the printer?

Feel free to also send me instant messaging contact info (preferably Discord) to github at workrum dot net and I can work step by step to get and test whatever you need.

murk-sy commented 2 years ago

@zoltan-l @DRracer I'll need that printer back in production very soon, so if you need any data pulled let me know - otherwise I'll reset and just keep an eye out for the issue for next firmware update.

To be clear, the printer is now experiencing blobbing as shown in pictures before, not just flashing icon.

Crawlerin-SA commented 2 years ago

I believe we provided all the information needed in this case. Problem failed to be replicated on Prusa side. As a work-around it is possible to disable networking in Settings. In my case problem cleared by re-flashing to old firmware and back to new one. I am not sure whether @murk-sy had their printer work, but I believe they have their printer back in production now and we won't get more information. I am closing this ticket now, if that's OK with you. Should I bump into similar issue or other people experience the same, issue can be opened.

murk-sy commented 2 years ago

@Crawlerin-SA The problem still persists on 4.3.4 and I've had it happen on printer 4 just today, with stoppages during bed levelling after print start. However, since I can't pull any useful diagnostic data from the printer, I decided to just wait for 4.4.0.

From what I can tell 4.4.0 will have a fairly major networking update (along with Prusalink stuff) so it will probably fix the issue. If not, I'll reopen the issue and do my best to help.