Reliable USB comm?

phil-barrett commented 3 years ago

Yes, I am having another wild hair moment...

One of the things I see a lot of in various CNC forums is people complaining about EMI/noise, and in many cases it comes in on the USB cable. Just today I was reminded of that while testing my new Pro breakout board: multiple crashes and alarms until I routed the USB cable away from the stepper motor, and none after that. What I saw were error messages like "$ not valid" or some such, signifying that garbage input was probably received.

Perhaps an edge to grblHAL would be a packetized com protocol with error checking and resend capability? There is no substitute for getting machine electronics laid out correctly with proper shielding and such, but if a hardened protocol reduced some of the errors, it could be a big selling point for grblHAL. Why go with X or Y when grblHAL has that? Obviously, it would be optional.

I am sure the benefit can be debated.

HuubBuis commented 3 years ago

A trouble-free connection is a must for safe CNC operation. WiFi, Bluetooth and USB use protocols that should take care of this. The serial-over-USB connection that grbl uses, however, is not that robust. Changing it would also need a change at the sender side. I think adding USB support using HID would be easier to implement.

if a hardened protocol reduced some of the errors, it could be a big selling point for grblHAL

I agree

phil-barrett commented 3 years ago

a change at the sender side

Fortunately we know a couple of them...

terjeio commented 3 years ago

Perhaps an edge to grblHAL would be a packetized com protocol with error checking and resend capability?

SLIP?

I recently saw a port of grbl with checksums added (to every block/line?). Not sure about resend capability; that is a tricky one to implement.

@HuubBuis Bluetooth is reliable in a noisy environment?

phil-barrett commented 3 years ago

Yeah, resend has its issues. A long time ago, I implemented an out-of-order resend scheme for streaming media (based on UDP). You need a sequence number, and you queue the packets up as they are received. The front-end process checks the validity of a packet and enqueues it if valid, tosses it if not. The queue manager checks the queued packets and, if one is missing, issues a resend request. Make the queue depth tunable (set it deep enough to handle a couple of resends) and delay the start of actual operations until the queue exceeds a defined minimum length. Signal an error if more than a certain number of bad packets are received within a window (related to the minimum queue length).
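A minimal sketch of the receive side of that scheme, assuming each packet has already been checked by the front end and carries a 32-bit sequence number. All names (reorder_queue_t, rq_accept, rq_first_missing, rq_pop) and both size constants are hypothetical, not anything that exists in grblHAL. Packets are parked in a fixed window indexed by sequence number; a hole followed by a later packet is what triggers a resend request.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RQ_WINDOW      8   /* hypothetical queue depth: room for a couple of resends */
#define RQ_MAX_PAYLOAD 64  /* hypothetical maximum payload per packet */

typedef struct {
    bool     used;
    uint16_t len;
    uint8_t  data[RQ_MAX_PAYLOAD];
} rq_slot_t;

typedef struct {
    uint32_t  next_seq;         /* next sequence number to hand to the core */
    rq_slot_t slot[RQ_WINDOW];  /* sequence number s lives in slot s % RQ_WINDOW */
} reorder_queue_t;

void rq_init(reorder_queue_t *q) {
    memset(q, 0, sizeof *q);
}

/* Front end: store a packet that already passed its validity check.
   Stale, too-far-ahead, duplicate or oversized packets are rejected.
   Sequence number wrap-around is ignored to keep the sketch short. */
bool rq_accept(reorder_queue_t *q, uint32_t seq, const uint8_t *data, uint16_t len) {
    if (seq < q->next_seq || seq >= q->next_seq + RQ_WINDOW || len > RQ_MAX_PAYLOAD)
        return false;
    rq_slot_t *s = &q->slot[seq % RQ_WINDOW];
    if (s->used)
        return false;
    s->used = true;
    s->len = len;
    memcpy(s->data, data, len);
    return true;
}

/* Queue manager: if a later packet is already queued while an earlier one is
   absent, report the first absent sequence number so a resend can be requested. */
bool rq_first_missing(const reorder_queue_t *q, uint32_t *missing) {
    bool hole = false;
    uint32_t first_hole = 0;
    for (uint32_t seq = q->next_seq; seq < q->next_seq + RQ_WINDOW; seq++) {
        bool present = q->slot[seq % RQ_WINDOW].used;
        if (!present && !hole) {
            hole = true;
            first_hole = seq;
        } else if (present && hole) {
            *missing = first_hole;
            return true;
        }
    }
    return false;
}

/* Hand the next packet, in sequence order, to the consumer if it has arrived. */
bool rq_pop(reorder_queue_t *q, uint8_t *out, uint16_t *len) {
    rq_slot_t *s = &q->slot[q->next_seq % RQ_WINDOW];
    if (!s->used)
        return false;
    memcpy(out, s->data, s->len);
    *len = s->len;
    s->used = false;
    q->next_seq++;
    return true;
}
```

The minimum fill level before motion starts and the bad-packet counter are left out; both would sit on top of rq_accept and rq_pop.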

SLIP?

Now that goes back a long way. I dunno, TCP/IP might be overkill here.
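For reference, SLIP (RFC 1055) on its own is only a byte-stuffing framing scheme, so it could delimit packets on the existing serial link without any TCP/IP stack on top. It has no error detection, so it would have to be paired with a checksum or CRC. A minimal encoder and byte-at-a-time decoder sketch follows; the constants are the RFC 1055 values, while the function names and the 128-byte packet limit are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SLIP_END     0xC0
#define SLIP_ESC     0xDB
#define SLIP_ESC_END 0xDC
#define SLIP_ESC_ESC 0xDD

/* Encode one packet into out, which needs room for 2 * len + 2 bytes.
   The leading END flushes any line noise received before the packet. */
size_t slip_encode(const uint8_t *in, size_t len, uint8_t *out) {
    size_t n = 0;
    out[n++] = SLIP_END;
    for (size_t i = 0; i < len; i++) {
        if (in[i] == SLIP_END) {
            out[n++] = SLIP_ESC;
            out[n++] = SLIP_ESC_END;
        } else if (in[i] == SLIP_ESC) {
            out[n++] = SLIP_ESC;
            out[n++] = SLIP_ESC_ESC;
        } else {
            out[n++] = in[i];
        }
    }
    out[n++] = SLIP_END;
    return n;
}

typedef struct {
    uint8_t buf[128];  /* hypothetical maximum packet size */
    size_t  len;
    bool    escaped;
    bool    overflow;
} slip_decoder_t;

/* Feed one received byte. When an END completes a non-empty packet the packet
   length is returned and the data is in d->buf until the next byte is fed;
   otherwise 0 is returned. Oversized packets are dropped. */
size_t slip_decode_byte(slip_decoder_t *d, uint8_t c) {
    if (c == SLIP_END) {
        size_t done = d->overflow ? 0 : d->len;
        d->len = 0;
        d->escaped = false;
        d->overflow = false;
        return done;
    }
    if (d->escaped) {
        d->escaped = false;
        if (c == SLIP_ESC_END)
            c = SLIP_END;
        else if (c == SLIP_ESC_ESC)
            c = SLIP_ESC;
        /* any other byte after ESC is a protocol violation; RFC 1055's sample
           code stores it unchanged, and so does this sketch */
    } else if (c == SLIP_ESC) {
        d->escaped = true;
        return 0;
    }
    if (d->len < sizeof d->buf)
        d->buf[d->len++] = c;
    else
        d->overflow = true;
    return 0;
}
```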

HuubBuis commented 3 years ago

@HuubBuis Bluetooth is reliable in a noisy environment?

My small shop isn't that noisy compared to a production site. I use WiFi (ESP8266) for the lathes, for galvanic isolation and to control (develop & debug) any lathe from any PC (tablet). My rotary table is connected using Bluetooth (HC-05) for the same reason. I have been doing this for several years now, and both work without any problem. The communication protocol behind these connections is responsible for a "reliable" connection. Once in a while I notice a very short delay; I am not sure, but I suspect that Windows updates on my low-power tablets (Atoms) are the cause. I think that serial over USB is, in general, "less" reliable than WiFi or Bluetooth. Most grbl users have a USB connection, so it can't be that bad to use, but if there are problems, they are hard for most users to solve.

WiFi doesn't require an internet connection, just a local WiFi network. You can use an ESP32 to set one up!

terjeio commented 3 years ago

IMO the ultimate connection is over cabled ethernet with a dedicated network card for the controller. I use that for my SmoothStepper based router.

USB problems can be cured with galvanic isolation? Adapters are available.

Another option is to add an FTDI chip to the board with isolation on the UART side; I did this for my CO2 laser board, using an ADuM1402 chip for isolation.

Writing a plugin for packet transfer over USB is possible. There is no need for the core to know anything at all about the transfer protocol - serial, SD card transfer, Telnet and WebSocket protocols are all transparent to the core. Who would pick up the challenge to "get it done"?

HuubBuis commented 3 years ago

IMO the ultimate connection is over cabled ethernet with a dedicated network card for the controller. I use that for my SmoothStepper based router.

I agree

Writing a plugin for packet transfer over USB is possible.

That also requires changes at the GUI (gcode sender) side.

terjeio commented 3 years ago

That also requires changes at the GUI (gcode sender) side.

Not necessary? TCP/IP over USB is possible:

https://www.avrfreaks.net/comment/1565496#comment-1565496

but then no plugin, since as I understand it this uses a distinct CDC class, not VCOM. If lwIP can be put on top of that, the existing Telnet and WebSocket protocol implementations can be used as-is.

Sender-side changes are possible, but they require broad acceptance of any custom protocol...

langwadt commented 3 years ago

I don't know how easy it is to do on Windows, but on Linux you could make a program that pretends to be a serial port, speaking the current protocol on one side and talking to the hardware via a different protocol on the other.

asteppke commented 3 years ago

No code can substitute for a physically reliable connection, but there is still something that needs only a small change: grblHAL could do a simple check to discard invalid commands and indicate/request a retransmission (even stopping all movement would be better than continuing with potentially wrong coordinates). That would also be a safety feature, since transmission errors could then not lead to erroneous movement.

There are many approaches to this, and there is already a somewhat established standard: G-code checksums. That would not entail inventing a completely new communication protocol or transplanting a heavyweight library.

In pseudo-code a simple XOR of all characters for a given G-code line before the checksum marker (*):

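#include <stdint.h>
/* XOR of every character before the '*' marker (stops at end of string if there is none) */
uint8_t checksum_of(const char *instruction) {
    uint8_t checksum = 0;
    for (const char *p = instruction; *p != '\0' && *p != '*'; p++)
        checksum ^= (uint8_t)*p;
    return checksum;
}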
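To show both ends of such a link, here is a sketch assuming the layout used by 3D-printer firmware such as Marlin, `N<line number> <command>*<checksum>`, where the checksum is the XOR of every byte before the `*`, line number included. The function names gcode_add_checksum and gcode_verify are hypothetical, and nothing like this exists in grbl/grblHAL today.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* XOR of all bytes before '*' (or up to the end of the string if there is none). */
static uint8_t xor_checksum(const char *s) {
    uint8_t cs = 0;
    for (; *s != '\0' && *s != '*'; s++)
        cs ^= (uint8_t)*s;
    return cs;
}

/* Sender side: write "N<n> <cmd>*<checksum>" into out.
   cmd must not contain '*'. Returns false if the result does not fit. */
bool gcode_add_checksum(char *out, size_t size, unsigned long n, const char *cmd) {
    int len = snprintf(out, size, "N%lu %s", n, cmd);
    if (len < 0 || (size_t)len >= size)
        return false;
    int extra = snprintf(out + len, size - (size_t)len, "*%u", (unsigned)xor_checksum(out));
    return extra > 0 && (size_t)len + (size_t)extra < size;
}

/* Controller side: true only if the line carries a '*' marker followed by a
   decimal value equal to the XOR of everything before the marker. */
bool gcode_verify(const char *line) {
    const char *star = strchr(line, '*');
    if (star == NULL)
        return false;
    char *end;
    unsigned long expected = strtoul(star + 1, &end, 10);
    if (end == star + 1 || expected > 255 || (*end != '\0' && *end != '\r' && *end != '\n'))
        return false;
    return xor_checksum(line) == (uint8_t)expected;
}
```

For example, the command `T0` sent as line 3 becomes `N3 T0*57`. Including the line number in the checksummed text is what later makes a resend request by line number possible.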

terjeio commented 3 years ago

G-code checksums could be easy to implement; if a check fails, just issue an error and stop execution. A potential checksum marker should not be a printable character; it would be better to either use a control character or use the last few bytes before the line terminator.
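The control-character variant might look like the sketch below. The marker byte 0x1E (ASCII RS) and the two-hex-digit suffix are purely hypothetical choices; a real marker would have to be checked against the real-time command bytes the firmware already reserves. Lines without a marker are passed through untouched, and a verified line has its suffix stripped so the core only ever sees plain G-code.

```c
#include <stdint.h>
#include <string.h>

#define CS_MARKER '\x1E'   /* hypothetical marker byte (ASCII RS), non-printable */

typedef enum { LINE_PLAIN, LINE_OK, LINE_BAD } line_status_t;

static int hex_digit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Check a received line (terminator already stripped) of the hypothetical form
   "<gcode><CS_MARKER><two hex digits>". A verified line has its suffix cut off;
   a line without a marker is passed through, so a serial terminal still works. */
line_status_t check_line(char *line) {
    char *mark = strchr(line, CS_MARKER);
    if (mark == NULL)
        return LINE_PLAIN;
    if (strlen(mark) != 3)
        return LINE_BAD;
    int hi = hex_digit(mark[1]), lo = hex_digit(mark[2]);
    if (hi < 0 || lo < 0)
        return LINE_BAD;
    uint8_t cs = 0;
    for (const char *p = line; p < mark; p++)
        cs ^= (uint8_t)*p;
    if (cs != (uint8_t)(hi * 16 + lo))
        return LINE_BAD;
    *mark = '\0';   /* strip the checksum suffix */
    return LINE_OK;
}
```

On LINE_BAD the driver would, as suggested, report an error and stop execution.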

Requesting retransmission is not, unless "aggressive buffering" is ditched at the cost of lower throughput. Without "aggressive buffering", the "ok"/"error" protocol could be replaced with "ok"/"nack"/"error", with "nack" being a request for retransmission. Such a protocol extension should be added at the driver level, as there should be no need(?) for it when a robust protocol such as TCP/IP is used for the connection.
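A sketch of the sender's side of such an "ok"/"nack"/"error" exchange, written as a small state machine so it does not depend on any particular serial API. The "nack" reply, the retry budget and all names here are hypothetical. With only one line in flight, a "nack" can refer to nothing but that line, which is exactly why aggressive buffering would have to go.

```c
#include <string.h>

typedef enum { REPLY_OK, REPLY_NACK, REPLY_ERROR, REPLY_OTHER } reply_t;
typedef enum { ACTION_SEND_NEXT, ACTION_RESEND, ACTION_ABORT, ACTION_WAIT } action_t;

typedef struct {
    unsigned resends;      /* resends used for the line currently in flight */
    unsigned max_resends;  /* hypothetical retry budget per line */
} sender_t;

/* Classify one reply line from the controller. "nack" is the proposed
   extension; status reports and other unsolicited messages are REPLY_OTHER. */
reply_t parse_reply(const char *text) {
    if (strcmp(text, "ok") == 0)
        return REPLY_OK;
    if (strcmp(text, "nack") == 0)
        return REPLY_NACK;
    if (strncmp(text, "error", 5) == 0)
        return REPLY_ERROR;
    return REPLY_OTHER;
}

/* Stop-and-wait: decide what to do with the line in flight after a reply. */
action_t sender_on_reply(sender_t *s, reply_t reply) {
    switch (reply) {
    case REPLY_OK:
        s->resends = 0;
        return ACTION_SEND_NEXT;
    case REPLY_NACK:
        if (s->resends < s->max_resends) {
            s->resends++;
            return ACTION_RESEND;
        }
        return ACTION_ABORT;  /* link too noisy: stop and tell the user */
    case REPLY_ERROR:
        return ACTION_ABORT;  /* line arrived intact but was rejected */
    case REPLY_OTHER:
    default:
        return ACTION_WAIT;   /* keep waiting for ok/nack/error */
    }
}
```

Timeouts are deliberately not turned into resends here: if a line arrived and only its "ok" was lost, resending it would execute it twice (harmful for an incremental G91 move), so that case would need sequence numbers to detect duplicates.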

Many 32-bit processors have a CRC peripheral that could be utilized; I plan to do that for two-byte NVS storage checksums, since the single-byte checksum used today is not very good.
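For processors without such a peripheral, a bitwise software CRC is small. The sketch below is CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF, no reflection, no final XOR); choosing this variant is an assumption, since which polynomials a given CRC peripheral supports varies between MCUs. It also illustrates why a one-byte XOR "is not very good": swapping two characters leaves the XOR unchanged, while the CRC catches it.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16/CCITT-FALSE: poly 0x1021, init 0xFFFF, MSB first, no final XOR. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len) {
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021) : (uint16_t)(crc << 1);
    }
    return crc;
}
```

The standard check value for this variant is 0x29B1 over the ASCII string "123456789".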

phil-barrett commented 3 years ago

Totally agree that a hardware solution (Ethernet, for example) is the best way to go. The problem with the simple ack/nak-retransmit model is that it inserts an extra action into every block send (block == 1 line of G-code, unfortunate terminology in this context) and slows down the process. A buffering model where larger amounts of data are sent per ack/nak cycle would help, but that is a bigger change. In that case, better to do some form of TCP/IP over USB.
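Rough arithmetic shows the size of that cost on a UART link. The numbers are hypothetical: 30-byte lines at 115200 baud with 8N1 framing (10 bits per byte), and an assumed extra round trip of $t_{\text{rtt}} = 2$ ms per acknowledgement. $\eta_k$ is the fraction of link time spent sending data when one ack covers $k$ lines.

```latex
\begin{aligned}
t_{\text{line}} &= \frac{30 \times 10\ \text{bit}}{115200\ \text{bit/s}} \approx 2.60\ \text{ms} \\
\eta_{1} &= \frac{t_{\text{line}}}{t_{\text{line}} + t_{\text{rtt}}} = \frac{2.60}{2.60 + 2} \approx 0.57 \\
\eta_{8} &= \frac{8\,t_{\text{line}}}{8\,t_{\text{line}} + t_{\text{rtt}}} = \frac{20.8}{20.8 + 2} \approx 0.91
\end{aligned}
```

Under these assumptions, acknowledging every line leaves the link idle over 40% of the time, while batching eight lines per ack recovers most of it. Over USB CDC the nominal baud rate does not set the real transfer speed, so only the shape of the trade-off carries over.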

Another approach is to borrow from streaming media: out-of-sequence retransmit. grblHAL would only send a nak for missing packets, the sender retransmits them, and grblHAL assembles them in the correct order. But that is a bigger change yet.

Yet another approach would be to use the proposed simple ack/nak/retransmit model and buffer much larger amounts of data to ensure the queue never runs dry. [edit] A variant of this is to send the whole G-code file and execute out of that. [/edit]

In any of these cases, the simple direct serial input model still needs to be supported, since talking to a grbl board via a serial terminal program is a useful diagnostic tool.

asteppke commented 3 years ago

@terjeio, @phil-barrett: Fully agree on the technical points. The approach of using a CRC over a larger buffer (several lines of G-code) with a single "ok/nack/error" is more in line with TCP/IP and similar protocols.

The G-code checksum and halt-on-error would probably cover all cases where the transmission channel corrupts or loses a transmission but works fine almost all the time. Like an emergency switch this is not something for any regular occurrence but might save the machine or part.

If transmission errors occur regularly and high throughput is required it becomes more involved and all the lessons from other protocols over lossy channels apply.

phil-barrett commented 3 years ago

The G-code checksum and halt-on-error would probably cover all cases where the transmission channel corrupts or loses a transmission but works fine almost all the time. Like an emergency switch this is not something for any regular occurrence but might save the machine or part.

In this case, a halt is only slightly better than doing nothing; the job is likely lost. Much better would be to allow the job to continue, with a notification of the issue. This would make grblHAL a preferred solution.

If transmission errors occur regularly and high throughput is required it becomes more involved and all the lessons from other protocols over lossy channels apply.

In that case, my advice would be to fix the underlying problem.

terjeio commented 3 years ago

If more than one block/line is to be covered by a checksum, then a checksum marker has to be added. For both single- and multiple-block checksums, either a second input buffer has to be added or the core has to handle the protocol. Letting the core handle it does not appeal to me; in principle it does not need to know anything about how data arrives.

Handling a checksum protocol in the core may interfere with the new tool change protocol too: it suspends the current input buffer on an M6 and redirects further input to a second one until the tool change is complete. I do not want the core to handle that either...

A plugin that inserts itself between the serial/USB input buffer and the core would be the ideal solution, as it could then be shared by all drivers (capable of handling this).
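A rough sketch of such an in-between layer. It is written push-style so it does not depend on any real grblHAL stream API: every name (cs_filter_t, filter_push, filter_read) and the line length limit are hypothetical, and hooking it into a driver's actual read functions is left out. Bytes from the input buffer are staged per line; only verified or plain lines are released to the core, so the core never needs to know a checksum was involved.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FILTER_LINE_MAX 128   /* hypothetical maximum line length */

typedef struct {
    char     staging[FILTER_LINE_MAX];    /* line being assembled from the input buffer */
    size_t   staged;
    bool     overflow;
    char     ready[FILTER_LINE_MAX + 1];  /* last verified line plus '\n', for the core */
    size_t   ready_len, ready_pos;
    unsigned rejected;                    /* lines dropped because of a bad checksum */
} cs_filter_t;

/* "<body>*<decimal checksum>" is verified and stripped to <body>;
   a line without '*' is passed through as plain input. */
static bool verify_line(const char *line, size_t len, size_t *body_len) {
    const char *star = memchr(line, '*', len);
    if (star == NULL) {
        *body_len = len;
        return true;
    }
    unsigned value = 0;
    size_t digits = 0;
    for (const char *p = star + 1; p < line + len; p++, digits++) {
        if (*p < '0' || *p > '9' || digits == 3)
            return false;
        value = value * 10 + (unsigned)(*p - '0');
    }
    uint8_t cs = 0;
    for (const char *p = line; p < star; p++)
        cs ^= (uint8_t)*p;
    *body_len = (size_t)(star - line);
    return digits > 0 && value <= 255 && (unsigned)cs == value;
}

/* Feed one byte taken from the serial/USB input buffer. Returns false while the
   previous verified line has not been fully read by the core (retry later). */
bool filter_push(cs_filter_t *f, char c) {
    if (f->ready_pos < f->ready_len)
        return false;
    if (c == '\r')
        return true;
    if (c != '\n') {
        if (f->staged < FILTER_LINE_MAX)
            f->staging[f->staged++] = c;
        else
            f->overflow = true;
        return true;
    }
    size_t body;
    if (!f->overflow && verify_line(f->staging, f->staged, &body)) {
        memcpy(f->ready, f->staging, body);
        f->ready[body] = '\n';
        f->ready_len = body + 1;
        f->ready_pos = 0;
    } else {
        f->rejected++;   /* here a driver could answer "nack" or raise an error */
    }
    f->staged = 0;
    f->overflow = false;
    return true;
}

/* What the core would read instead of the raw buffer: next verified byte or -1. */
int filter_read(cs_filter_t *f) {
    if (f->ready_pos >= f->ready_len)
        return -1;
    return (unsigned char)f->ready[f->ready_pos++];
}
```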

HuubBuis commented 3 years ago

If more than one block/line is to be covered by a checksum

If multiple lines are covered by one checksum, these lines can't be sent to the grbl buffer until the checksum has been received and approved; otherwise a bad line could still get into the buffer.
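One way to hold lines back is a staging area that is committed only when the block trailer checks out, sketched below. The trailer format (a hypothetical 0x1E marker followed by two hex digits), the 512-byte capacity and all names are assumptions. The running XOR keeps the sketch short, but it is weak over a whole block (two identical corruptions cancel), so a CRC would be the better choice in practice.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_MAX    512      /* hypothetical staging capacity in bytes */
#define BLOCK_MARKER '\x1E'   /* hypothetical non-printable trailer marker */

typedef enum { STAGE_MORE, STAGE_COMMIT, STAGE_REJECT } stage_result_t;

typedef struct {
    char    buf[BLOCK_MAX];   /* lines held back from the core's input buffer */
    size_t  len;
    uint8_t cs;               /* running XOR over every staged byte, '\n' included */
    bool    overflow;
} block_stage_t;

void stage_reset(block_stage_t *s) {
    s->len = 0;
    s->cs = 0;
    s->overflow = false;
}

static int hex_digit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Feed one complete line without its terminator. Ordinary lines are only
   staged; a trailer line "<BLOCK_MARKER><two hex digits>" closes the block.
   On STAGE_COMMIT the caller copies buf[0..len) to the core's input buffer;
   on STAGE_REJECT nothing reaches the core and a resend of the whole block
   should be requested. Call stage_reset() after either outcome. */
stage_result_t stage_line(block_stage_t *s, const char *line) {
    if (line[0] != BLOCK_MARKER) {
        size_t n = strlen(line);
        if (s->len + n + 1 > BLOCK_MAX) {
            s->overflow = true;   /* block is lost; the trailer will reject it */
            return STAGE_MORE;
        }
        memcpy(s->buf + s->len, line, n);
        s->buf[s->len + n] = '\n';
        for (size_t i = 0; i <= n; i++)
            s->cs ^= (uint8_t)s->buf[s->len + i];
        s->len += n + 1;
        return STAGE_MORE;
    }
    int hi = hex_digit(line[1]);
    int lo = hi < 0 ? -1 : hex_digit(line[2]);
    if (hi < 0 || lo < 0 || line[3] != '\0' || s->overflow)
        return STAGE_REJECT;
    return s->cs == (uint8_t)(hi * 16 + lo) ? STAGE_COMMIT : STAGE_REJECT;
}
```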

terjeio / grblHAL

Reliable USB comm? #118