grblHAL / core

grblHAL core code and master Wiki

Cannot Restore After Feedhold When Issued During Change of Spindle State #491

Closed engigeer closed 5 months ago

engigeer commented 5 months ago

Running grblHAL in laser mode, I am experiencing an issue with feedhold: the controller is unable to resume if the feedhold is triggered right before a line that contains only an M03 or M05 laser on/off command.

Minimum reproducible example code (issues occur if feedhold triggered between execution of line 8 and line 9):

N1 G54
N2 G90
N3 G0 X0 Y0
N4 G91
N5 M03 S100
N6 G01 X100 F500
N7 M05
N8 G01 Y-2.5
N9 M03
N10 G01 X-100
N11 M05
N12 G01 Y2.5
N13 M30

I initially noted this issue on a FlexiHal but was also able to replicate it on an RP2040 Pico, which leads me to think it is something in the core. In practice it seems to be more common with higher acceleration values, but I think this is because a short / fast travel move increases the chance of the feedhold landing between lines.

Happy to provide additional debugging info for troubleshooting if you have any suggestions.

terjeio commented 5 months ago

I am not able to reproduce with a STM32F446 controller. Please post $I and $$ output.

engigeer commented 5 months ago

StatusSettings_Issue491.txt https://github.com/grblHAL/core/assets/21988066/9999b2a7-c931-494a-9acb-ebaed034851b

No problem, here is the requested output as well as a quick screen recording to illustrate the issue.

terjeio commented 5 months ago

I am still not able to replicate with the F446, so which ioSender version are you using? And can you replicate with the latest edge version?

engigeer commented 5 months ago

I have just tested with ioSender Edge 2.0.45p5 and was still able to replicate the issue with my RP2040 Pico (all settings at default values as shared above with the exception of $32=1 - i.e., laser mode). I have reduced the MWE test code to:

N1 G54
N2 G90 X0 Y0
N3 G91
N4 M03 S100
N5 G01 X10 F500
N6 M05 (debug, Spindle Off)
N7 G01 X-10
N8 M30

The issue occurs at Line 6, and I have been able to reproduce it consistently by triggering a feedhold in ioSender such that it takes effect exactly as the machine reaches the end of the previous travel move (DRO reads exactly X = 10). After the feedhold is triggered, the control board becomes unresponsive and fails to respond to any command until it is power-cycled / hard reset.

Now things get a bit strange . . . I tried uploading a fresh copy of firmware from the web-builder with RS274 NGC expression support so I could add some debug messages and determine if the freeze was occurring before or after Line 6 was executed.

On the first test with the new firmware, the issue persisted, but on subsequent tests I could not reproduce it. I then tried uploading a fresh copy of the firmware from the web-builder without RS274 NGC expression support. This time, I could not reproduce the error . . . until on a whim, I tried another hard reset of the board (unplug and replug the USB and relaunch ioSender), and after this the error was reproducible. Note that the board was already physically unplugged and replugged between the .uf2 flash and the first run, so I don't understand why this second hard reset would cause anything to change. Maybe there is something special about the first run after new firmware is installed that I am unaware of? I think this is unrelated to the primary issue with the feedhold, but figured it was worth mentioning in case it is not.

engigeer commented 5 months ago

Also, I believe this is likely unrelated, but I noted some additional strange feedhold resume behaviour in the course of this testing (see screencap attached). If the feedhold was triggered during the initial acceleration on line 7, I sometimes had an issue where it would require two cycle start commands in order to resume, and there was a noticeable delay / dwell before it eventually accelerated and completed the movement. This did not cause the controller to freeze or crash, and I observed it when using both the firmware with and without NGC support (unlike the other issue detailed above).

https://github.com/grblHAL/core/assets/21988066/bfc94acd-03e9-4240-894b-47e2fe34ae82

terjeio commented 5 months ago

I finally managed to replicate this... It seems finite float precision is the cause. One part is that the deceleration distance delta may end up as a small negative value, which leads to calling square root with a negative argument - and that returns NaN (not a number). A smallish positive distance delta may also cause a step to be missed, leaving the position a step short of where it should be. I have added checks for these that set the distance delta to 0 if it is less than 100nm. There is also an issue in the planner that earlier might have been masked by the planner buffer being zeroed (when it was statically allocated). Finally, there is an issue that I believe I have to handle in ioSender: if a feed hold is issued while the last motion is executing and the feed hold ends up in the target position, the cycle start button stops working.

If you are compiling yourself then I would like you to test the changes - I'll add the modified files in a comment if you can do that.

engigeer commented 5 months ago

Yes, I am able to compile the firmware and would be more than happy to test the changes. Thanks for helping track down the source of this, much appreciated.

terjeio commented 5 months ago

Ok, here you go:

grbl.zip

engigeer commented 5 months ago

I tested the changes to the firmware on both the RP2040 and the Flexi and it seems that the issue is resolved. Let me know if you would like me to close this issue, or if you want to wait until you commit these changes. Many thanks for the fix!

terjeio commented 5 months ago

Thanks for testing!