pantor / ruckig

Motion Generation for Robots and Machines. Real-time. Jerk-constrained. Time-optimal.
https://ruckig.com
MIT License

Benchmark #61

Closed stefanbesler closed 3 years ago

stefanbesler commented 3 years ago

Hey @pantor

Sorry to bother you with that … I am just implementing a benchmark for my port of ruckig to TwinCAT / Codesys. At the moment the port is slower by a factor of 4, which is the order of magnitude of “worse performance” I expected (actually I thought it would be about a factor of 10, haha). Since this is before doing any performance optimization in the port, I am pretty happy with it … I am using the stack way too much right now, which is usually pretty bad in TwinCAT.

Anyway, at the moment I am benchmarking on a virtual machine, which of course is great neither for your implementation nor for the port. However, since I have core isolation in the VM, the jitter I get for the port is still okayish (worst duration for 7 DoF is around 600 ± 200 µs). Because of my VM setup, the C++ implementation is doing much, much worse with regard to the mean worst-case duration right now (talking about 15 ms ± 10 ms).

In your opinion, is it enough to compare the implementations on a physical PC? Did you run the benchmark on a standard PC with vanilla Linux/Windows, or did you use any kind of RT patch, process prioritization, core isolation (…) to reduce jitter?

pantor commented 3 years ago

Hi @stefanbesler, by the way, great to see so much progress on struckig!

These are very good questions, and I'm not sure if I'm doing everything correctly regarding benchmarking. The benchmarks in the paper and in the Readme were measured on Linux with the RT patch, but without process prioritization or core isolation. Also, these were cold-start benchmarks, so usually the first calculation was the worst / slowest one. This was done on a standard PC; the exact CPU is given in the benchmark figure.
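For illustration, such a cold-start vs. warm-up measurement loop could look roughly like the sketch below. This is a minimal, hypothetical harness (not the actual otg-benchmark code), assuming the public Ruckig&lt;DOFs&gt; / InputParameter / OutputParameter API; the limits and targets are placeholder values.

```cpp
// Minimal sketch of a cold-start vs. warm-up timing loop for ruckig.
// Assumes the public API (Ruckig<DOFs>, InputParameter, OutputParameter, update());
// limits and targets are placeholders, not the values used in otg-benchmark.
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <ruckig/ruckig.hpp>

int main() {
    constexpr std::size_t DOFs {7};
    ruckig::Ruckig<DOFs> otg {0.001};  // 1 ms control cycle
    ruckig::InputParameter<DOFs> input;
    ruckig::OutputParameter<DOFs> output;

    input.current_position = {0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0};
    input.target_position = {1.0, -1.0, 0.5, 0.7, -0.3, 0.2, 1.2};
    input.max_velocity = {1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0};
    input.max_acceleration = {2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0};
    input.max_jerk = {4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0};

    // Single warm-up calculation: uncomment to remove the cold-start outlier
    // (first-call effects such as instruction cache misses) from the statistics.
    // otg.update(input, output);

    constexpr std::size_t trials {1024};
    double sum {0.0}, worst {0.0};
    for (std::size_t i = 0; i < trials; ++i) {
        // Change the input slightly so that every call triggers a full
        // trajectory calculation instead of just stepping the old trajectory.
        input.target_position[0] = 1.0 + 1e-3 * static_cast<double>(i);

        const auto start = std::chrono::steady_clock::now();
        otg.update(input, output);
        const auto stop = std::chrono::steady_clock::now();

        const double us = std::chrono::duration<double, std::micro>(stop - start).count();
        sum += us;
        worst = std::max(worst, us);
    }

    std::cout << "Average calculation duration " << sum / trials << " [µs]\n";
    std::cout << "Worst calculation duration " << worst << " [µs]\n";
}
```

Whether the single warm-up call is included is what distinguishes the cold-start numbers from the warmed-up numbers below.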

Btw, I've done quite a few optimizations with the latest commits. With the recent master branch, the 7 DoF benchmark typically outputs the following on my hardware:

# Cold start calculation
./otg-benchmark
# Average Calculation Duration 11.5343 pm 0.270497 [µs]
# Worst Calculation Duration 110.556 pm 79.7809 [µs]

nice -n -20 ./otg-benchmark
# Average Calculation Duration 11.3142 pm 0.0460994 [µs]
# Worst Calculation Duration 80.0998 pm 38.353 [µs]

# With single warm-up calculation
./otg-benchmark
# Average Calculation Duration 11.2291 pm 0.0219179 [µs]
# Worst Calculation Duration 82.315 pm 34.3213 [µs]

nice -n -20 ./otg-benchmark
# Average Calculation Duration 11.4594 pm 0.061064 [µs]
# Worst Calculation Duration 59.8226 pm 10.0279 [µs]

So the worst calculation duration decreases as expected. Anyway, I think the first golden rule of benchmarking is to always test on the system of interest itself. A VM sounds very bad for worst-case timing, so that doesn't surprise me.
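For reference, if one did want to add process prioritization and core isolation on Linux (which, as noted above, the published numbers do not use), it can also be done from inside the benchmark process. Below is a hypothetical sketch, not part of otg-benchmark, using thread affinity and the SCHED_FIFO policy; the core index and priority are placeholders, and it typically requires root or CAP_SYS_NICE.

```cpp
// Hypothetical helper (not part of otg-benchmark): pin the calling thread to
// one core and request the SCHED_FIFO real-time policy to reduce scheduling
// jitter. Needs glibc with _GNU_SOURCE (defined by default when compiling
// with g++); core index and priority below are placeholder values.
#include <pthread.h>
#include <sched.h>
#include <cstdio>
#include <cstring>

bool pin_and_prioritize(int core, int priority) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    if (int err = pthread_setaffinity_np(pthread_self(), sizeof(set), &set); err != 0) {
        std::fprintf(stderr, "pthread_setaffinity_np: %s\n", std::strerror(err));
        return false;
    }

    sched_param param {};
    param.sched_priority = priority;
    if (int err = pthread_setschedparam(pthread_self(), SCHED_FIFO, &param); err != 0) {
        std::fprintf(stderr, "pthread_setschedparam: %s\n", std::strerror(err));
        return false;
    }
    return true;
}

// Example: call pin_and_prioritize(3, 80) before the measurement loop,
// ideally on a core that has been isolated from the general scheduler.
```

Combined with isolating the chosen core (e.g. via the isolcpus kernel parameter), this mainly affects the worst-case rather than the average duration, consistent with the nice -n -20 numbers above.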