[FR] - Improve Klipper’s Queue System for CNC and Pick-and-Place Machines

naikymen / klipper-for-cnc

Fork of the Klipper 3D-printer firmware, plus features for more general CNC.

https://klipper.discourse.group/t/klipper-for-cnc-initiatives-and-projects-list/5698

GNU General Public License v3.0


[FR] - Improve Klipper’s Queue System for CNC and Pick-and-Place Machines #27

Closed JordyDH closed 3 weeks ago

JordyDH commented 2 months ago

Description:

I would like to propose enhancements to Klipper’s queue system to better support CNC machines, particularly pick-and-place (PnP) machines. These systems require high-speed command execution and precise timing, but the current queue system can introduce delays that affect performance.

Key Issues:

- Queue Latency: In PnP machines, delays in command execution can reduce efficiency. Minimizing this latency is essential for high-speed operations.
- Real-Time Adjustments: PnP systems often need real-time feedback from vision or sensor systems to adjust commands mid-operation. The current queue system does not efficiently handle real-time feedback, leading to delays or missed adjustments.

I'm not fully sure what the ideal solution would be, so if it is interesting we can brainstorm about some ideas.

naikymen commented 2 months ago

Hi Jordi, this is an interesting point.

Klipper is architected in a way that introduces (possibly large) latencies between a GCODE command and its execution. This is because the motion planner runs on a "host", which then transmits simple commands (step times) to the MCUs. The result is increased stepping speed, brought about by relieving the MCUs of all motion planning duties.

Such is the architectural compromise in Klipper: latency in GCODE execution vs greater stepping speed and timing precision. There is currently no way around this AFAIK.

I'm not sure what you require exactly, but if it is executing GCODE immediately, then Klipper is probably not the best firmware. The same goes for real-time decision making: it is not meant to be fast about it.

I would guess that working around that would involve major refactoring and redesigning of Klipper's code, from host code to MCU code, rivalling the effort required to write your own firmware for your application.

Fast feedback from cameras and decision-making is not a standard CNC thing. I think that your case is in the general robotics domain, which I know little about.

These are just guesses though. You may have better luck posting on the Klipper forum or their Discord channels.

Perhaps with more details on your setup and requirements I could make additional suggestions.

Best! N.

JordyDH commented 2 months ago

Hi @naikymen

You are definitely correct that Klipper is made to process everything on the host and then send manageable commands to the MCUs.

By now I have read through most of the code (this was a fun activity...) and I think I have found the biggest bottlenecks.

In my opinion queues are not the problem. They can be useful if used correctly.

Right now I'm testing a patch to reduce the input buffer of the GCODE stream. Standard Klipper queues 20 commands, so the host thinks the machine is already at some location when it isn't. Klipper only holds back its response once the queue is full.

This is a problem when you want to sync other actors to the motion of klipper.

The patch introduces a new setting in [printer] that can reconfigure this queue length.
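To illustrate the behaviour described above, here is a toy model (not Klipper's actual code) of an input buffer that stops accepting G-code once a configurable number of commands is pending. The class name `BoundedGcodeInput` and the `max_pending` parameter are hypothetical; the default of 20 is simply the figure mentioned in this comment.

```python
from collections import deque


class BoundedGcodeInput:
    """Toy model of a G-code input buffer with a configurable limit.

    Hypothetical names; this only mimics the "pause reading when full" idea.
    """

    def __init__(self, max_pending=20):
        self.max_pending = max_pending
        self.pending = deque()
        self.reading = True  # False means the sender is being held back

    def feed(self, line):
        """Accept one line if reading is enabled; return True if accepted."""
        if not self.reading:
            return False
        self.pending.append(line)
        if len(self.pending) >= self.max_pending:
            # Buffer is full: stop reading until a command is consumed
            self.reading = False
        return True

    def execute_next(self):
        """Consume the oldest pending command and resume reading."""
        if not self.pending:
            return None
        line = self.pending.popleft()
        if len(self.pending) < self.max_pending:
            self.reading = True
        return line


buf = BoundedGcodeInput(max_pending=2)
accepted = [buf.feed(cmd) for cmd in ("G1 X1", "G1 X2", "G1 X3")]
print(accepted)       # the third command is refused while the buffer is full
print(buf.execute_next(), buf.reading)
```

With a small `max_pending`, the sender can never be more than a couple of commands ahead of what the machine is actually doing, at the cost of giving the planner less to look ahead on.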

This is just a beginning to make the synchronization of different systems better.

On top of that, standard Klipper has a built-in wait of around 100 ms before it executes the first command of a move. So the next step is to make this available in the configuration file as well, so you can really tune your setup.

What do you think of this approach?

I have discussed this on the Klipper Discord and everyone tells me that I need to move away from Klipper. But the more I test, the more I'm sold that Klipper can be the next big thing in CNC and even robotics due to its architecture.

But then we need to move away from the recommended raspberry pi and go to something with more processing power.

JordyDH commented 2 months ago

Yesterday we tested the first movements on the Neoden4.

If you are a member of the openpnp discord you can see a video about it

https://discord.com/channels/693329785769689098/693566401461354596/1283095507371294764

We want to make SMT fabrication available to as many people as possible. Okay, the guys at opulo.io are also giving it a shot, but that is all hyped up and still a flawed design.

I also have other changes planned for Klipper, like:

- Analog input that is converted on the host side to anything you want.
- Outputs: toggle the state of a pin or PWM.

Just little things to make it more logical for robotics and CNC purposes.

JordyDH commented 2 months ago

I have also found that you can use M400 to let the queue run out and only then get the response.

But this is cheating in my opinion to have it work.

What you can do is give Klipper one set of motions up until the point where synchronization is needed, and then send an M400.

But I'm not a fan of that approach; it is just a band-aid.
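For reference, the M400 workaround amounts to something like the sketch below. `FakePrinter` is a hypothetical stand-in for the real link to Klipper (it is not any real API); it only models the behaviour described here, where ordinary moves are acknowledged as soon as they are queued and M400 is acknowledged only after the queue has drained.

```python
class FakePrinter:
    """Hypothetical stand-in for the connection to the firmware."""

    def __init__(self):
        self.queued = []    # moves acknowledged but not yet finished
        self.executed = []  # moves that have physically completed

    def send(self, line):
        if line == "M400":
            # Model of M400: drain the queue before acknowledging
            self.executed.extend(self.queued)
            self.queued.clear()
        else:
            # Ordinary move: acknowledged immediately, executed later
            self.queued.append(line)
        return "ok"


def run_synchronized(printer, moves):
    """Send a batch of moves, then block until all of them are done."""
    for move in moves:
        if printer.send(move) != "ok":
            raise RuntimeError("no ack for %r" % (move,))
    if printer.send("M400") != "ok":
        raise RuntimeError("no ack for M400")


printer = FakePrinter()
printer.send("G1 X10")
print(printer.queued, printer.executed)   # acked, but not yet executed
run_synchronized(printer, ["G1 X20", "G1 Y5"])
print(printer.queued, printer.executed)   # everything finished after M400
```

The cost is that the queue runs empty at every synchronization point, so motion stops there.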

naikymen commented 2 months ago

> If you are a member of the openpnp discord you can see a video about it

Marvellous! There's always one more exciting open hardware project I don't know about. Thanks for sharing :)

> Right now I have read through most of the code (this was a fun activity...) and I think I have found the biggest bottlenecks.

That's massive work. I really struggle to understand most of it still.

I also got excited about Klipper, and I do agree with the potential of the architecture. My opinion is that one should ditch Klipper as soon as there is a better alternative, be it another firmware or a re-engineered fork to serve a more general purpose.

If you feel capable of accomplishing the latter, then yes, by all means go for it! It would be grand.

However I would still be careful: I have always found one more surprisingly limiting thing about its design when making modifications.

I would ask for a favor: would you consider helping write some docs about your modifications? I really struggled to understand what I could, and was thinking that some documentation will surely help others too.

> Analog input that is converted in the host side to anything you want

Have you had a look at the latest bulk sensor code? It's from upstream.

> outputs, toggle the state of a pin or PWM.

Do you mean toggling pins as part of the "motion" queue?

I'm not sure about this but there is a "laser mode" which might be inspiring: https://www.klipper3d.org/Using_PWM_Tools.html

Cirromulus worked on that intensively.

naikymen commented 2 months ago

Klipper is always full of hype, but all things that go up must come down. A typical Gartner hype cycle.

There are other relevant projects around that may be of interest.

I really liked the modular things project: https://github.com/modular-things/modular-things

It has also been used to build a CNC machine: https://clank.tools/tools/

Then again, if you manage to remake Klipper to a more general case, then I'd switch over instantly.

JordyDH commented 2 months ago

> That's massive work. I really struggle to understand most of it still.

I still have a lot to discover, but I'm getting there step by step. It is definitely not the best-documented open source project for developers.

> I also got excited about Klipper, and I do agree with the potential of the architecture. My opinion is that one should ditch Klipper as soon as there is a better alternative, be it another firmware or a re-engineered fork to serve a more general purpose.

I totally agree, but right now I think Klipper is one of the fastest-growing projects (or there is something else that I don't know about). We did some tests on our Neoden4, and even with the wrong steppers we got a decently high speed without shaking the machine off the table, something the stock system did.

> However I would still be careful: I have always found one more surprisingly limiting thing about its design when making modifications.

Completely true. I thought that I knew where the main GCODE queue was until I tested it. I have now discovered that Klipper reads GCODE commands in bulk from a virtual tty-like interface, and when this bulk exceeds 20 commands it pauses reading from the file.

I have now traced the workings of gcode.py and gcode_move.py and found some interesting ways to change the behaviour of the ACK and the way Klipper executes the buffered commands.

Every time Klipper processes a GCODE command, by default it sends an acknowledgement to the host. This is done to buffer commands and make fast, smooth motions. The problem is that the host doesn't know when Klipper has actually finished the movements, unless the host sends an M400.

This is what I found about how RepRap works, and it also has some problems: https://reprap.org/wiki/GCODE_buffer_multiline_proposal

I think both systems are flawed for fast-acting, synchronous multi-axis machines, but what if we combine the two?

- Make the length of the execute queue configurable from the config file.
- The last command in the queue doesn't send an ACK until it has become the x-th command to be executed; it is best not to send the ACK while it is still the (x+1)-th command in the queue. Make x available in the config as well.
- When Klipper gets a command, don't wait for the upcoming ones, just execute it immediately.
- Introduce a G1 alternative that doesn't send an ACK, but adds an M400 after it in the queue. This can be used for small movements that need to be synchronized with the host. Larger moves that only involve moving the head can be done with the new queue system.
- Don't use serial ports; use a socket or network connection to stream commands (can be done with socat).
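The delayed-ACK idea in the second bullet could be modelled roughly as follows. This is a standalone sketch, not a patch against gcode.py; the class `DelayedAckQueue` and the parameter `ack_window` (the "x" above) are hypothetical names.

```python
from collections import deque


class DelayedAckQueue:
    """Toy model: a command is only acknowledged once it is among the
    first `ack_window` commands waiting to be executed."""

    def __init__(self, ack_window):
        self.ack_window = ack_window
        self.queue = deque()  # entries are [command, already_acked]
        self.acks = []        # commands acknowledged so far, in order

    def _release_acks(self):
        # Acknowledge every un-acked command inside the window
        for position, entry in enumerate(self.queue):
            if position >= self.ack_window:
                break
            if not entry[1]:
                entry[1] = True
                self.acks.append(entry[0])

    def push(self, command):
        """Queue a command; its ACK may be withheld for now."""
        self.queue.append([command, False])
        self._release_acks()

    def pop_executed(self):
        """Mark the oldest command as executed and release pending ACKs."""
        command = self.queue.popleft()[0]
        self._release_acks()
        return command


q = DelayedAckQueue(ack_window=2)
for cmd in ("A", "B", "C"):
    q.push(cmd)
print(q.acks)        # "C" is outside the window, so its ACK is held back
q.pop_executed()
print(q.acks)        # "C" moved into the window and is acknowledged
```

A sender that waits for each ACK before sending the next command would then never run more than `ack_window` commands ahead of the machine.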

These are in my opinion the biggest changes that need to happen to make it usable for robotic or multi axis machines. I hope that my ideas are clear to understand, and I'm curious what you are thinking of it.

amken3d commented 2 months ago

> Klipper is always full of hype, but all things that go up must come down. A typical Gartner hype cycle.

That is so true. Here are my observations as a long-term Klipper user and contributor:

1. Klipper's original objective, i.e. running the path planning on an application processor to offload that task from the microcontroller, was an interesting concept back when the average microcontroller was much less powerful. The idea was to generate precise, fast step pulses using 8-bit or less powerful 32-bit microcontrollers while the Linux system did the path planning much faster. However, this premise no longer holds, and the use cases we are talking about, such as pick-and-place and CNC, do not need this approach. There are blazingly fast microcontrollers that can do both motion planning and step generation quickly (e.g. STM32H7, ESP32P4, IMXRT106x, etc.). There is no real use for offloaded path planning, IMO. If you want to test this hypothesis, all you have to do is run Marlin on a really powerful board such as the BTT Octopus Max EZ and you will see what I am talking about.
2. Klipper's use of serial communications between the application processor and the MCU is a bottleneck. Klipper overcomes it somewhat by having a custom protocol in place, which is both good and bad. As your need for speed grows, you will run into this limitation.
3. Klipper is not unique in this approach of combining an application processor with an MCU. Take a look at LinuxCNC and the Remora firmware: https://remora-ocean.readthedocs.io/en/latest/GettingStarted.html This is the same approach, but better (LinuxCNC is the equivalent of Klippy, and Remora's Programmable Realtime Unit is the equivalent of the Klipper MCU code). They use SPI, which is much faster than UART or USB (and less error prone). The reason I did not go that route was LinuxCNC's lack of support for TMC drivers, plus a few other considerations. But with this setup you can equally well make changes through configuration without having to reflash. Also, LinuxCNC uses the PREEMPT_RT Linux kernel, which gives much more control over timing on your application processor. Some higher-end CNCs go a step further and use an FPGA-based approach, where LinuxCNC talks to the FPGA, which does all the step pulse generation. Just look for the MESA system.
4. Klipper's real power is the flash-less configuration management, the great support for TMC drivers, and the ability to do scripting. These are worth keeping around.
5. Klipper's use of GPL 3.0 licensing. Need I say more?

If I were building a new system from the ground up, which I am not saying I am, or I am not :), here is how I would approach it. BTW, in pick-and-place and CNC use cases there will always be a G-code sender system such as OpenPNP, LinuxCNC or some other CAM system, so keeping that in mind:

- Option a. An application processor running PREEMPT_RT, if you HAVE to do path planning on the APU.
- Option b. Else, use a faster microprocessor to do both.
- Option c. Else, use a hardware motion planning system like the TMC4361, and use the microcontroller to do the G-code parsing between the G-code sending system and the hardware motion planner.

JordyDH commented 2 months ago

Hi @amken3d, definitely some good points, but I don't agree with the first one.

I run my own embedded engineering firm, and we like to use the STM32G03. This is an M0+ without much to offer, but with clever work we can achieve a lot. If you want a complex kinematic system with multiple MCUs, you need one master in the system, and here I prefer to use an AMD or Intel processor instead of an MCU; otherwise you need to move to an MPU.

Also, UART can be fast if done right; we have systems with 1Mb/s links using the STM32G03. You can also use CAN, and then you can go way higher with the correct hardware.

I think if we take a look at commercial PNP machines, you would find that their architecture at the hardware level is more in line, I think, with Klipper's. There is no need to place an H7 IC in a PNP head; with Klipper you can use an F103, which is way smaller and more efficient for that use.

And yes, Klipper's real power comes from the ecosystem of supported hardware and from being flexible. You can tinker with your machine, and with one file switch you have your stable config back when you need it. And this is what I like.

This is more valuable than any other point, I think, because it can lower the entry barrier for beginners and motivate others to tinker with their machines. And this is something we are going to try with the Myriad project, where this fork will be one of the building blocks.

I'm an ex-Essemtec user (never bought, just rented), and believe me, there is a big gap between the Chinese and DIY machines on one side and the professional devices on the other. And I love to make it easy for others to tinker.

So that is why I'm sticking with Klipper until something similar comes along that provides as much, including all the changes made, and more (with a better community).

The GPL3.0 license is not a problem. If you are looking to make something commercial, you can still find a way around it or use a different business model.

And please stay on topic in the [FR]; it's about brainstorming how we can improve the queue problem while not having the problems that RepRap and Marlin have.

And I'm definitely curious about your or not your own ground up system ;)

amken3d commented 2 months ago

I am absolutely on topic here, @JordyDH. The things I listed are about brainstorming. I just listed several reasons why Klipper is not the right system for what you are trying to do. Or rather, for what I originally did and you are building on top of. I couldn't care less if you disagree with my point. But please try not to be dismissive of my opinions.

JordyDH commented 2 months ago

@amken3d I don't want to be dismissive, but everyone has their opinion and yours is definitely valid. Internally, we have discussed many times to drop klipper and switch to RepRap.

But I'm just too stubborn to give it a try, so maybe I'm driving straight to a concrete wall, or maybe with luck we have something that can work.

But if there was a system designed for this purpose with the flexibility of klipper I would change immediately.

naikymen commented 2 months ago

> Here are my observations as a long term klipper user and contributor

@amken3d nice list! I sympathise and agree.

I am currently enjoying my 6-axis machine running off a pair of Arduino UNOs, which I think is still great because it's cheap and ubiquitous. And other people are enjoying the same machine run by a smooth Duet2 board. Need more tools or axes? No problem, just plug in another UNO and you're done. I just love it.

@JordyDH I confess that I'm a bit lost on the requirements. If I understood correctly, you want klippy (the python program, not klipper) to:

1. Report on the "status" of the GCODE commands it received (from some external program), for example: move in GCODE queue, move in motion planning queue, move sent to MCU through serial, move executing, move executed.
2. Greatly reduce the latency between a GCODE command being received, and it being executed by the MCU.
3. Do this by adapting the python-side only (i.e. klippy).
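To make the first point more concrete, the "status" of a command could be expressed as a small set of stages. This is only a sketch of the idea: the enum `CommandStage`, the helper `format_status` and the message format are all hypothetical, and nothing like this is known to exist in klippy.

```python
from enum import Enum


class CommandStage(Enum):
    """Hypothetical stages a GCODE command passes through."""
    GCODE_QUEUED = 1   # waiting in the GCODE input queue
    PLANNING = 2       # sitting in the motion planning queue
    SENT_TO_MCU = 3    # step data sent to the MCU through serial
    EXECUTING = 4      # the MCU is stepping this move
    EXECUTED = 5       # the move has finished


def format_status(line_number, stage):
    """Build a status report line for an external program (made-up format)."""
    return "status N%d %s" % (line_number, stage.name.lower())


print(format_status(12, CommandStage.EXECUTING))
```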

All of this is important to coordinate the tool-head with a camera taking pictures of an electronic component, adjusting offsets, and then resuming motion. All of it as fast as possible.

Correct?

JordyDH commented 2 months ago

I don't understand the first one. I want to limit the number of commands that are processed while moving to a value set in printer.cfg; when this limit is reached, klippy delays its ACK ('ok') until the last command is in the x-th place.

There are some delays placed in the code, and these can be removed where needed. This will also result in higher CPU usage, but it is a trade-off we can accept (or we need to find another way to move faster through the chain).

I want to do this on the klippy side.

> All of this is important to coordinate the tool-head with a camera taking pictures of an electronic component, adjusting offsets, and then resuming motion. All of it as fast as possible.

Yes, and that is why we can introduce a modified "G1"-like command that disables the need_ack argument and adds an M400 at the end of it. The host will then receive the ACK when the single move is done.

This is just a first approach that I'm thinking of, where small changes can solve part of the problem.
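One way such a command might be prototyped without editing gcode.py is a small Klipper "extras" module, sketched below. The command name `G1_SYNC`, the class name `SyncMove` and the default feedrate are made up for illustration. The calls `register_command`, `get_float`, `manual_move` and `wait_moves` are assumed to behave as in upstream Klipper; this fork may differ, so treat it as an untested sketch. It also takes a shortcut: X/Y/Z are treated as absolute toolhead coordinates, ignoring G90/G91 and any G-code offsets.

```python
class SyncMove:
    """Hypothetical extras module: a move that blocks until it is done."""

    def __init__(self, config):
        self.printer = config.get_printer()
        gcode = self.printer.lookup_object('gcode')
        gcode.register_command(
            'G1_SYNC', self.cmd_G1_SYNC,
            desc="Move, then wait until the move has finished")

    def cmd_G1_SYNC(self, gcmd):
        toolhead = self.printer.lookup_object('toolhead')
        # Axes that are not given stay at None, i.e. keep their position
        coord = [gcmd.get_float(axis, None) for axis in 'XYZ']
        # F is in mm/min as in G1; the toolhead expects mm/s.
        # The 3000 mm/min default is an arbitrary assumption.
        speed = gcmd.get_float('F', 3000., above=0.) / 60.
        toolhead.manual_move(coord, speed)
        # Equivalent of appending an M400: block until the queue is empty
        toolhead.wait_moves()


def load_config(config):
    # Entry point Klipper calls for a matching config section
    return SyncMove(config)
```

Because the handler only returns after `wait_moves()`, the reply for `G1_SYNC X10 F6000` should reach the host only once the move has finished, assuming the ACK is sent after the handler returns.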

JordyDH commented 2 months ago

The main difference between klipper and reprap and marlin is that klipper gives an ACK when the command is processed, but movement is not started, and the others give an ACK when the move is (“almost”) done.

amken3d commented 2 months ago

Bottom line, my opinion is:

1. It is not 100% worth trying to improve "Klipper" to solve the queue planning problem for non-3D-printer use cases. There is a lot of baggage in Klipper that limits true improvement. By changing Klippy (the host) completely, you are deviating entirely from the core spirit of Klipper. At that point you might as well start fresh. Use the lessons learned, but start fresh.
2. As a learning exercise, maybe it is a good idea to solve the queuing problem; you can then apply what you learn elsewhere.
3. Before you hit the wall, I hit it for several months before I could figure out how to make it functional. So I understand the attitude and the frustration. While it is the right thing to do, keep in mind that Klipper mainline does not really benefit from any work that we do here. If Klipper would benefit immensely from improving the queuing problem, several others from the main Klipper community would gladly get behind the effort with all their resources. But this use case is not one the Klipper community will get behind. The people on this thread, and maybe a few others, might jump in, primarily because the OpenPNP (and CNC) community will take a long time to adopt Klipper, if they ever do. Klipper has a reputation, both good and bad. This is the last time I will mention these points. Use them as you see fit.

Now let's get to brainstorming on actually solving our queuing problem without changing the hardware architecture, as that seems to be your intent. You have already identified several things. Here is what I am thinking:

1. You could start by creating some of your own commands as macros before you start changing Klippy's behavior. See my Machine.xml and my config file on my repo; you will see that I have some custom commands. I don't use G-codes in OpenPNP; instead, I tell OpenPNP to send the custom commands.
2. Some of the changes you want will need changes in Klipper (the microcontroller side) as well. I am thinking of pulse_counter.c, sched.c and stepper.c.
3. Split the process into two threads: one that runs the super loop, and one that runs commands that interrupt the super loop.

Due to the limitations that I have already described, which you considered as opinion ;), you will never have a true real-time system with Klipper. All you can do is try to reduce the latency with these tweaks and be ready for extensive testing. I am sure you have realized that as well by now.

naikymen commented 2 months ago

> Yes, and that is why we can introduce a modified "G1" like command that disables the need_ack argument and adds an M400 at the end of it. The host will then receive the ACK when the single move is done.

Ohh I see now! You're not using the socket (UDS) connection to klippy at all. I was quite lost.

> This is just a first approach that I'm thinking of with small changes that we can solve a part of the problem.

I imagine that it could work.

Would it help if the M400 command sent an ack message? In that way you could know exactly when the move queue has ended. Perhaps it already does.

Edit: ok now I see what you meant by bandaid. Are you sure that using M400 is not enough for your application?

I'm not sure how much I can help here; I don't know anything about OpenPnP. :/

naikymen commented 2 months ago

As far as I recall, GRBL does not send an "ok" message when the machine stops moving either.

It sends an "ok" when the message enters the motion planner, and the recommended solution is using G4 P0 and waiting for the message (which is equivalent to M400 AFAIK).

See: https://github.com/grbl/grbl/issues/975

naikymen commented 2 months ago

What I was thinking is similar to the suggestions in that thread.

Because Klipper is deterministic with its timing, it would be possible to set up timers that trigger when the time comes for the "end" of the motion associated with a particular GCODE command. The triggers can then run callback functions, which send "done" messages (possibly with some identifier, like the GCODE line, or something more elaborate).

This would require messing a bit with the move queue, but only to add reports, and keep track of the boundaries between GCODE commands. I imagine that this solution requires in depth knowledge of how the trapezoids are stitched in the queue (see Move and LookAheadQueue in toolhead.py).
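A rough sketch of the bookkeeping involved is shown below. It assumes each GCODE command maps to one move with a known duration, which sidesteps the hard part: real moves are split and stitched by the look-ahead queue. The class `MoveEndNotifier` and its methods are hypothetical, and the clock is simulated rather than tied to Klipper's reactor.

```python
import heapq


class MoveEndNotifier:
    """Toy model: track when each GCODE command's motion will end and
    fire a callback once that time has passed."""

    def __init__(self, start_time=0.0):
        self.next_start = start_time  # when the next queued move begins
        self.timers = []              # min-heap of (end_time, gcode_id)

    def add_move(self, gcode_id, duration):
        """Queue a move right after the previous one; return its end time."""
        end_time = self.next_start + duration
        heapq.heappush(self.timers, (end_time, gcode_id))
        self.next_start = end_time
        return end_time

    def advance(self, now, on_done):
        """Fire on_done(gcode_id, end_time) for every move ended by `now`."""
        while self.timers and self.timers[0][0] <= now:
            end_time, gcode_id = heapq.heappop(self.timers)
            on_done(gcode_id, end_time)


notifier = MoveEndNotifier()
notifier.add_move("N1", 0.5)
notifier.add_move("N2", 0.25)
notifier.advance(0.6, lambda gid, t: print("done", gid, "at", t))
notifier.advance(1.0, lambda gid, t: print("done", gid, "at", t))
```

In a real implementation the end times would come from the planner's print time, and the callback would write the "done" message back to whoever sent the command.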

naikymen commented 1 month ago

Hi again! How is this coming along?

naikymen commented 3 weeks ago

I'll close this for now until it is active again. Feel free to comment/reopen/etc. :)