terjeio / grblHAL

This repo has moved to a new home https://github.com/grblHAL

M30 & poss other M commands in file causing freeze #293

Closed NeilMarley closed 3 years ago

NeilMarley commented 3 years ago

Early days in debugging/proving my set-up but I can consistently reproduce this error...

When running a G-code file or macro (as generated by the turning or threading macro tabs in IoSender), if the file ends in an M30 (Pgm End), execution freezes once that command is sent (but not ok'd) while other blocks are still executing. The console reports Msg;Program End midway through execution of a preceding block, and the controller reports Idle status. I can shunt things through with Start, but it always stalls again.

Seems the M command (in this case M30) is bypassing G Code buffer processing (a good idea in general but not for M30 .... and maybe some others?)

; grblHAL
; 1.1f.20210421
; [OPT:VNMSL2,35,1024,3,0]
; [NEWOPT:ENUMS,RT+,ES,LATHE,TC,SS,PID]
; [FIRMWARE:grblHAL]
; [NVS STORAGE:*FLASH]
; [DRIVER:iMXRT1062]
; [DRIVER VERSION:210423]
; [DRIVER OPTIONS:USB.2]
; [BOARD:T41U5XBB]

terjeio commented 3 years ago

This is odd...

Seems the M command (in this case M30) is bypassing G Code buffer processing (a good idea in general but not for M30 .... and maybe some others?)

It does not bypass anything - on a M30 the controller waits for the previous command to finish before executing it. This wait can be terminated by abort or cancel and I wonder if the board modification for spindle sync is somehow involved. So - does this happen when you disconnect the spindle encoder? Or run a small program without the spindle running, e.g:

G0X100
M30

If it does not then somehow the encoder pulse is triggering the halt input, either due to a short or the map file being wrong. Or bad driver code?
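The wait described above (M30 holds off until preceding motion has finished, and the wait can be cut short by an abort or cancel) can be pictured with a toy model. This is only a sketch and not grblHAL source: toy_planner_t, toy_program_end and the other names are hypothetical, and the real controller is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, NOT grblHAL source: every name here is hypothetical. */
typedef struct {
    size_t queued_moves;   /* motion blocks still waiting or executing */
    bool   abort_request;  /* set by reset, cancel or a halt input */
} toy_planner_t;

typedef enum { End_Completed, End_Aborted } toy_end_result_t;

/* Stands in for the stepper finishing one queued block. */
static void toy_execute_one_move(toy_planner_t *p)
{
    if (p->queued_moves > 0)
        p->queued_moves--;
}

/* Program end (M30): drain the queue first, unless the wait is aborted. */
toy_end_result_t toy_program_end(toy_planner_t *p)
{
    assert(p != NULL);
    while (p->queued_moves > 0) {
        if (p->abort_request)
            return End_Aborted;   /* wait terminated early, moves remain queued */
        toy_execute_one_move(p);
    }
    return End_Completed;         /* all preceding motion has finished */
}
```

In this model a spurious abort request (for example from a halt input that is triggered by mistake) ends the wait while moves are still queued, which is the kind of symptom the test above is meant to rule in or out.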

NeilMarley commented 3 years ago

Hi, Thanks for the response.

Sorry. It seems I was led astray by the test script I was using - could consistently get it to freeze when the script ended with M30 and not freeze with the same script without M30 at the end. However, with more trials today I can get freezing with other test scripts with and without M30 so it just looks like there is some sort of lock-up with an indirect dependency on script duration...

I can reproduce this with the controller on the bench only connected to the PC and not the lathe so no Spindle index/pulse, stepper, or other electrical noise so I don't think it's a wiring or noise issue. It's also true that this build can seem sluggish responding to e.g. Start/Hold buttons and outputting status reports - there must be something guzzling processor somewhere. Generally, scripts run through & sometimes complete. But at other times they will freeze motion then after a second or 2 the state reports as IDLE - is there some watchdog or timeout triggering the fall-back to idle?

I'll go back to a build without SS but keeping the pin re-assignments and see if that stabilises things & helps narrow-down the issue.

Neil

terjeio commented 3 years ago

It's also true that this build can seem sluggish responding to e.g. Start/Hold buttons and outputting status reports - there must be something guzzling processor somewhere.

Could be an interrupt firing way too often? Running a test now on a Pro prototype board with SS enabled - no issues.

But at other times they will freeze motion then after a second or 2 the state reports as IDLE - is there some watchdog or timeout triggering the fall-back to idle?

There isn't any. I had a similar issue earlier when I added settings enumerations; I had to force these to stay in flash (constant data) to work around that. Why this happened I do not know/understand.

If disabling SS resolves the issue then disabling related interrupts (with SS enabled) would be the first step towards isolating the cause.

NeilMarley commented 3 years ago

Solved...

The uploaded version of T41U5XBB_ss_map.h has Y_AUTO_SQUARE enabled which results in a pin assignment for the Y2 limit switch - which I don't have so running was hindered by constantly servicing the Y2 limit in the absence of an NC connection.

All running smoothly now. :-)

The logic of the pin assignments in this file seems a little too complicated - I'm not sure there's any logic in assigning a pin to Y2 limit when Autosquare is set but Ganged isn't?

NOTE The QEI_SELECT_PIN is still assigned to 35 in this file which will clash with the re-assigned RESET ..... this might account for the issues in #261?

Now to get on and try some PID tuning for SS.....

Neil

phil-barrett commented 3 years ago

Glad you resolved the issue.

Did you make the RESET change to T41U5XBB_ss_map.h? RESET and AUXIN3 (ST3 on the PCB) should be swapped given the modifications you made.

Note that the Pro board (V2.51 and later) has different changes (RESET is on 40, STEP EN Y is 35 and AUXIN3 on 14).

Neil, are you interested in beta testing the Pro board? Contact me via Tindie if you are.

terjeio commented 3 years ago

The uploaded version of T41U5XBB_ss_map.h has Y_AUTO_SQUARE enabled which results in a pin assignment for the Y2 limit switch

My bad, sorry for that.

The logic of the pin assignments in this file seems a little too complicated - I'm not sure there's any logic in assigning a pin to Y2 limit when Autosquare is set but Ganged isn't?

Autosquare implies that the axis in question is ganged so IMO it makes sense.

NOTE The QEI_SELECT_PIN is still assigned to 35 in this file which will clash with the re-assigned RESET

I'll fix that - only an issue if reusing the board for non-lathe use since QEI_ENABLE and SPINDLE_SYNC_ENABLE are mutually exclusive.

..... this might account for the issues in #261?

Could be - but the noise must be bad if an open input on an optocoupler is able to pick up enough energy to trigger the output, but since you wrote

... running was hindered by constantly servicing the Y2 limit in the absence of an NC connection.

then I am not so sure any more. However, I have not seen this behaviour when testing with the boards I have. Phil, can you comment on this?

phil-barrett commented 3 years ago

The EL357 optocoupler has a cut off frequency of 80 kHz. If it is generating interrupts based on noise, the EMI has to be very strong and significantly lower frequency than typical. To be honest, that seems fairly unlikely. I run with 5 axes enabled all the time and no limit switches hooked up to 2 or 3 inputs at minimum. I am not seeing any noise-based limit activity. I am not saying it isn't happening in this case but have not seen it while testing here. An oscilloscope would help nail this down, if available.

Is it certain that the Y2 limit input is the cause? I would try shorting across the unused limit inputs and make sure they are set to NC (i.e. not inverted in grbl settings). The amount of energy needed to trigger will be extremely high. There will be 12mA of current flowing that has to be significantly disrupted in order for the opto's LED to go dark. (Which is why NC switches are significantly less noise sensitive.)

Also, it is worth spending a little time going over the PCB modifications. Check to make sure there isn't any errant trace material left over. Those are fairly thin traces and often quite hard to see. I had one shorting to a resistor once - had to examine with a microscope to find. A blast of compressed air is a good idea.

As to the pauses in #261, it is possible there is some connection. When I was seeing them, autosquaring was not enabled, just ganging.

NeilMarley commented 3 years ago

Hi Terje & Phil, thanks for the consideration you're giving this.... I'll try to answer points relevant to poss isolating a problem ...

I did also make the RESET pin change as I have the Estop/Reset wired-in NC to the lathe's existing EStop big-red-button.... that's all working as expected.

The comments on the T41U5XBB_ss_map.h were basically housekeeping as many users will just pick-up the build without reviewing the relevance to their set-up and may reproduce the same problems. I also altered the boardname in mine to T41U5XBB_SS so it's clearer in logs that it's the modified board..

I understand that Autosquare on any axis is logically contingent on the axis being ganged... but the 'If' logic in the T41U5XBB_ss_map.h file doesn't enforce that dependency - it allows Autosquare to be enabled when Ganging isn't and so pin assignments are made and there may be software objects defined with no purpose and possible dead-ends?

I should have mentioned that I homed in on the possible issue when I investigated further & noticed that during lock-ups I was getting Pn values for Y (limit) in the status reports with the lock-up build, even though I knew I had the Y limit connections on the break-out board safely NC/shorted. I guessed these were Y2 triggers, which I hadn't expected as I didn't think I had Y2 defined :-). With the better behaved rebuild with Y Autosquare off I don't see those Pn:Y reports in the status report. Similarly, I noticed that the console logs from frdfsnlght in #261 had Pn:Z ... hence I suspected a similar/related issue - the controller believes a limit switch has been triggered (for an autosquared but non-ganged axis?) - which processes will that spawn and what will be the effect on performance?
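For anyone wanting to watch for these Pn values in a logged console session rather than by eye, a minimal sketch of a check is given here. It assumes the Grbl 1.1 style report layout, where fields are separated by | and the report ends with >. pn_has_signal is a hypothetical helper, not part of grblHAL or IoSender, and the report strings used with it below are made-up examples.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if the Pn: field of a Grbl-style
 * status report contains the given signal letter (e.g. 'Y'). */
bool pn_has_signal(const char *report, char signal)
{
    assert(report != NULL);
    const char *field = strstr(report, "|Pn:");
    if (field == NULL)
        return false;              /* no Pn field: no signals reported */
    field += 4;                    /* skip "|Pn:" */
    /* The field ends at the next '|' or at the closing '>'. */
    for (; *field != '\0' && *field != '|' && *field != '>'; field++) {
        if (*field == signal)
            return true;
    }
    return false;
}
```

Calling pn_has_signal("<Idle|MPos:0.000,0.000,0.000|FS:0,0|Pn:YZ>", 'Y') returns true, while a report without a Pn field returns false for any letter.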

In short - I currently suspect this might be a rogue object/process - not an electrical/noise issue - as I said, I could reproduce this on the bench without spindle/stepper currents flowing.

In the interest of trying to definitively tie this down - I'll revert to Builds A and B with the only difference being the Y Autosquare setting & will report back with the outcome. I could also play with shorting/NC the A & B axis limits in each case to see if we can get a correlation.

Neil

terjeio commented 3 years ago

I understand that Autosquare on any axis is logically contingent on the axis being ganged... but the 'If' logic in the T41U5XBB_ss_map.h file doesn't enforce that dependency - it allows Autosquare to be enabled when Ganging isn't and so pin assignments are made...

The #if logic enforces that dependency:

// Changed to use A pins rather than B pins
#if Y_GANGED || Y_AUTO_SQUARE // translates to if(Y_GANGED or Y_AUTO_SQUARE)
#define Y2_STEP_PIN      (8u)
...
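A minimal, self-contained sketch of what that condition does when auto-square is set but ganging is not. The two configuration values are assumptions chosen for illustration, and y2_step_pin_assigned is a hypothetical helper that is not in the map file:

```c
#include <assert.h>

/* Assumed build configuration for illustration: auto-square on, ganging off. */
#define Y_GANGED      0
#define Y_AUTO_SQUARE 1

/* Same condition as in the map file excerpt. */
#if Y_GANGED || Y_AUTO_SQUARE
#define Y2_STEP_PIN (8u)
#endif

/* Hypothetical helper: returns 1 when the condition above defined the Y2 step pin. */
int y2_step_pin_assigned(void)
{
#ifdef Y2_STEP_PIN
    return 1;
#else
    return 0;
#endif
}
```

With these values y2_step_pin_assigned() returns 1, so enabling auto-square on its own is enough for the second-motor pins to be assigned. In other words, auto-square is treated as implying a ganged axis.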

... and there may be software objects defined with no purpose and possible dead-ends?

In short - I currently suspect this might be a rogue object/process - not an electrical/noise issue - as I said, I could reproduce this on the bench without spindle/stepper currents flowing.

There are no running processes waiting on pin changes, they are either polled when required (probe input) or waited on via enabling interrupt handling for them. When waited on then no processor cycles are used until there is a change on the input. Input statuses are also read when a status report is generated (typ. every 200ms). They are read in limitsGetState() in driver.c and only take a couple of processor cycles per pin.

For an input to have any significant impact on performance it has to be flipped at such a high frequency that the interrupt handler consumes a sizeable chunk of the available cycles. Limit pin interrupts can be disabled in code by commenting out the lines of code that enable them, for the Y-axis like this in settings_changed (settings_t *settings) in driver.c:

                case Input_LimitY:
                case Input_LimitY_Max:
                    pullup = !settings->limits.disable_pullup.y;
//                    signal->irq_mode = limit_fei.y ? IRQ_Mode_Falling : IRQ_Mode_Rising;
                    break;

Limit pin interrupts can also be disabled by not enabling hard limits with the $21 setting.
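To put rough numbers on how fast an input would have to toggle before the handler eats a sizeable share of the processor, here is a back-of-envelope sketch. The per-interrupt cycle cost is an assumption to be replaced with a measured figure, and isr_load_fraction is a hypothetical helper, not a grblHAL function.

```c
#include <assert.h>

/* Hypothetical helper: fraction of CPU time (0.0 to 1.0, clamped) spent
 * servicing an interrupt that fires edges_per_second times per second and
 * costs cycles_per_interrupt cycles each time (entry, handler and exit). */
double isr_load_fraction(double edges_per_second,
                         double cycles_per_interrupt,
                         double cpu_hz)
{
    assert(cpu_hz > 0.0);
    double load = edges_per_second * cycles_per_interrupt / cpu_hz;
    return load > 1.0 ? 1.0 : load;
}
```

Assuming the Teensy 4.1's nominal 600 MHz clock and a guessed cost of 300 cycles per interrupt, isr_load_fraction(1e6, 300.0, 600e6) gives 0.5, so the input would need to toggle around a million times per second to take half of the processor. At the 200 ms status report rate (5 events per second) the same cost is a negligible fraction.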

In short - I currently suspect this might be a rogue object/process - not an electrical/noise issue - as I said, I could reproduce this on the bench without spindle/stepper currents flowing.

IMO to get to the bottom of this the best approach to start with is to disable relevant interrupts in driver.c to see if the issue goes away and decide next steps depending on the result from that.

If disabling the interrupt for Y limit inputs helps then there is something triggering the Y_Max, and that something should be isolated.

I should have mentioned that I homed-in on the possible issue when I investigated further & noticed that during lock-ups I was getting Pn values for Y (limit) in the status reports when I had the lock-up build but knew I had the Y limit connections on the break-out board safely NC/shorted. I guessed these were Y2 triggers which I hadn't expected I had defined :-)

Appearing randomly? Do you have hard limits enabled ($21)?


It can be mentioned that I once got a batch of PCBs from China where one had a hair thin short under the solder mask - took me a while to figure that out.

NeilMarley commented 3 years ago

Hi Guys,

So, In order to head-off my guilt at raising an ephemeral issue, some Marathon testing today...

  1. I've reverted to the downloaded SS build with the Y Autogang set & the mismatched AuxIn on #35 .... No Problem.
  2. Have then played every permutation of build with Y Autogang on/off & AuxIn changed/not changed.
  3. Have then built every variant of Autogang on/off. AuxIn 'Corrected' and physically connected/disconnected (False/Trued) on each Axis limit switch electrickerally connected/broken...

All with multiple run-throughs of the default Turning macro which is quite script intensive and other long scripts...

Of course (with Audience Effect).... no problems (the Gremlins have Us!).

There is some issue in-there but I have a build which looks great for me to move on & play with spindle-sync etc - I suggest we close this issue & note that something linked to Enumeration/Ganged Axes/Limit Switch handling may be lurking there for future consideration?

Neil

NeilMarley commented 3 years ago

Parked pending more data....