intelligent-agent / redeem

Firmware for Replicape
http://wiki.thing-printer.com/index.php?title=Redeem
GNU General Public License v3.0

Kernel 4.1? What about the Ethernet hw bug? #148

Open PaoloBi opened 5 years ago

PaoloBi commented 5 years ago

I am migrating to the RC branch because of an annoying bug in the path planner that hangs my printer with 2.0.8. The new Ubuntu-based distro uses kernel 4.x, but the BBB has an annoying and well-known hardware problem: when it is powered up from the expansion connector, Ethernet doesn't work 30 to 40% of the time, and you need a hardware reset to fix it. I discovered this, and thanks to this link https://wp.josh.com/2018/06/04/a-software-only-solution-to-the-vexing-beagle-bone-black-phy-issue/#more-7355 I managed to detect a PHY malfunction and hardware-reset my BBB with my (modified) Replicape. So far, so good, but kernel 4.x broke this fix! Now I have to choose between a 2.0.8 Kamikaze/Debian that hangs on certain G-codes and a 2.1 Ubuntu version where Ethernet mostly doesn't work. Gosh...
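A rough sketch of the kind of boot-time check described above, assuming only the standard Linux sysfs file `/sys/class/net/<iface>/carrier`. It is a coarse stand-in: it only notices that no link came up, it does not inspect the PHY registers as the linked blog post does, and it leaves the actual hardware reset to the caller. The function names, the `sysfs_root` parameter, and the retry values are hypothetical.

```python
import time
from pathlib import Path


def link_is_up(iface="eth0", sysfs_root="/sys/class/net"):
    """Return True if the kernel reports carrier on iface, else False.

    The carrier file contains "1" when a link is detected. Reading it
    raises OSError when the interface is missing or administratively
    down; both cases are treated as "no link" here.
    """
    try:
        text = (Path(sysfs_root) / iface / "carrier").read_text()
    except OSError:
        return False
    return text.strip() == "1"


def wait_for_link(iface="eth0", attempts=10, delay=1.0,
                  sysfs_root="/sys/class/net"):
    """Poll for a link up to `attempts` times, sleeping `delay` seconds
    between polls. Returns True as soon as a link is seen, False if it
    never appears (the caller may then decide to trigger a reset).
    """
    for attempt in range(attempts):
        if link_is_up(iface, sysfs_root):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A caller would run `wait_for_link()` once at startup and only fall back to a board-specific reset when it returns False. A cable that is simply unplugged also yields False, so this check alone cannot tell a dead PHY from a missing link.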

ThatWileyGuy commented 5 years ago

Would you like a development image to try? If so, do you use Toggle on a touchscreen?

PaoloBi commented 5 years ago

I tried yesterday with 2.1.0: the PHY problem is there and my workaround doesn't work any more. My display is a custom project, 1024x600, based on Qt; I talk mostly with OctoPrint through its APIs.

ThatWileyGuy commented 5 years ago

If you're feeling adventurous, give https://wiley.pub/umikaze/Umikaze-2.2.1-1804test4.img.xz a try.

I run three BBBs on ethernet and I haven't experienced the bug you seem to be hitting, but I've also been running much more recent kernels. That image has a 4.14 kernel.

PaoloBi commented 5 years ago

Thank you, I will check it during the holidays. In the meantime, I see that you made a "New path planner" pull request some time ago. As my current version is older (2017-03-24), do you think that your commit would fix my path planner hang problems?

ThatWileyGuy commented 5 years ago

Yes. I believe you're hitting a deadlock that could occur when a sync event was queued to wait for the currently queued paths to complete. If the queue only had a single path in it, the sync event would sometimes never fire. I fixed it as part of rewriting the path queue.
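To illustrate the class of bug described above, here is a minimal sketch of a queue with a sync point that cannot miss the last completion. It is not Redeem's path queue; `PathQueue` and its methods are hypothetical names. The key point is that the "queue is empty" condition is checked under the same lock that completions take, so a single path finishing just before the sync call still lets the sync fire.

```python
import threading
from collections import deque


class PathQueue:
    """Hypothetical stand-in for a path queue (not Redeem's actual code)."""

    def __init__(self):
        self._paths = deque()
        self._cond = threading.Condition()

    def push(self, path):
        with self._cond:
            self._paths.append(path)

    def complete_next(self):
        """Called by the consumer when the oldest path has finished."""
        with self._cond:
            path = self._paths.popleft()
            # Wake any waiter on every completion so it can re-check.
            self._cond.notify_all()
            return path

    def wait_until_empty(self, timeout=None):
        """Sync point: block until every queued path has completed.

        wait_for() evaluates the predicate under the lock before it
        sleeps, so an already-empty queue returns immediately instead
        of waiting for a notification that will never come.
        """
        with self._cond:
            return self._cond.wait_for(lambda: not self._paths, timeout)


def demo():
    queue = PathQueue()
    queue.push("G1 X10")  # the single-path case
    worker = threading.Thread(target=queue.complete_next)
    worker.start()
    fired = queue.wait_until_empty(timeout=2.0)
    worker.join()
    return fired
```

Calling `demo()` exercises the single-path case: whether the worker finishes before or after the sync call starts waiting, `wait_until_empty` returns once the queue is empty.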

PaoloBi commented 5 years ago

Hi Andrew, yes! The deadlock has gone away now (at least with the two "killer" files I found). Thank you very much. I think this pull request should be merged into the "master" branch soon as well, because otherwise this bug (major in my opinion, as it freezes your printer forever) will surely show up sooner or later. I will try the test4 image soon, thanks again.

PaoloBi commented 5 years ago

Anyway, according to https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/Documentation/devicetree/bindings/net/mdio.txt?id=69226896ad636b94f6d2e55d75ff21a29c4de83b

Some boards [1] leave the PHYs at an invalid state during system power-up or reset thus causing unreliability issues with the PHY which manifests as PHY not being detected or link not functional. To fix this, these PHYs need to be RESET via a GPIO connected to the PHY's RESET pin.

The proposed fix is broken with kernel 4.x, so there is a chance that a printer will start up with Ethernet not working.

ThatWileyGuy commented 5 years ago

The blog post you linked earlier is a usermode fix that's broken in 4.x. The kernel patch you linked shows that the kernel is now automatically resetting the entire MDIO bus to get the ethernet PHY to reset properly.

Are you sure this is still an issue on any kernel made in the last year?

goeland86 commented 5 years ago

@PaoloBi I've never encountered this issue since 2.1.0 was released. I may have accidentally hit on it during the release cycles for 2.1.0, but we were already releasing with a 4.4.x series in 2.1.1. The latest dev images have kernels that have fixed this problem.

goeland86 commented 5 years ago

@PaoloBi can you confirm you have managed to work past this issue yet?

PaoloBi commented 5 years ago

Yes, the deadlock has gone away. My old kernel still suffers from the "dead on start" Ethernet problem; just yesterday my printer restarted itself once because of it, but the fix always works. Do the latest dev images really have kernels that fix this problem? If I were absolutely sure, I'd start migrating to the new kernel (but not for now). Thanks to all for helping!

goeland86 commented 5 years ago

@PaoloBi you're more than welcome to try and flash an SD card with the newer image and edit the uEnv to boot from the SD instead of flashing it to the eMMC.

PaoloBi commented 5 years ago

Thanks. The problem is that I modified my Redeem sources quite a lot to add new sensors/probes/actuators (I talk with a custom-made Replicape board with an STM32F1xx) and to talk with my GUI program, so I will have to reapply all these changes on the new image. I hope to find the time and inclination to do so.

goeland86 commented 5 years ago

You can use the RCN-provided scripts to update the kernel on your running image. They're in /opt/scripts/tools/update_kernel.sh