unitreerobotics / unitree_legged_sdk

SDK tools for control robots.
BSD 3-Clause "New" or "Revised" License

Pthread set sched policy failed (Nvidia TX2) #13

Closed itaouil closed 3 years ago

itaouil commented 3 years ago

Hi,

I was trying to run the SDK on an Nvidia TX2. After compiling our code and linking it against the appropriate arm64 version of the SDK, I got the following error:

[Image: Screenshot 2021-10-27 11:48:10]

I launched our node with sudo as well.

itaouil commented 3 years ago

It turns out the error above has nothing to do with the SDK itself. It comes from a kernel limitation that prevents you from setting the scheduling priorities you ask for.

If you hit a similar issue, the fix is to run the following command before running your code:

sudo sysctl -w kernel.sched_rt_runtime_us=-1

The answer comes from this link: https://forums.developer.nvidia.com/t/pthread-setschedparam-sched-fifo-fails/64394/4
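Before changing the setting, it may be worth recording its current value so it can be restored later. The following is a minimal C++ sketch of that check; the helper names `parse_rt_runtime_us` and `read_rt_runtime_us` are made up for illustration and are not part of unitree_legged_sdk.

```cpp
#include <exception>
#include <fstream>
#include <string>

// Hypothetical helper: parse the text stored in the sysctl file,
// e.g. "950000\n" or "-1\n". Throws if the text is not a number.
long parse_rt_runtime_us(const std::string& text) {
    return std::stol(text);
}

// Hypothetical helper: read the current value from the given path.
// Returns false if the file cannot be opened or does not hold a number.
bool read_rt_runtime_us(const std::string& path, long& value_out) {
    std::ifstream file(path);
    std::string line;
    if (!file || !std::getline(file, line)) {
        return false;
    }
    try {
        value_out = parse_rt_runtime_us(line);
    } catch (const std::exception&) {
        return false;
    }
    return true;
}
```

Calling `read_rt_runtime_us("/proc/sys/kernel/sched_rt_runtime_us", value)` before the `sysctl -w` command gives you the number to pass back to `sudo sysctl -w kernel.sched_rt_runtime_us=...` when you want to undo the change.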

itaouil commented 3 years ago

I also just noticed that, after running the command above, even just echoing ROS topics takes a long time.

Doing some research I found the following:

In short, setting sched_rt_runtime_us to -1 can be extremely dangerous. A value of -1 means no limit. In other words, a "run-away" real-time task will be permitted to monopolise a CPU which could (potentially) lock up a system. The default value for sched_rt_runtime_us is 950000 or 0.95 seconds. Broadly speaking, this allows 0.05 seconds to be used by other non-real-time tasks (i.e. SCHED_OTHER) - a small window at which to recover a system.

The kernel parameter namely /proc/sys/kernel/sched_rt_runtime_us can be set between -1 and (INT_MAX - 1).

So the above solution might not be the best...
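The budget described in the quote can be worked through with a small calculation. This sketch assumes the runtime is measured against the period in `kernel.sched_rt_period_us`, whose usual default is 1000000 microseconds (1 second); the function name `rt_cpu_share` is hypothetical.

```cpp
// Hypothetical helper: fraction of each scheduling period that
// real-time tasks are allowed to use.
// runtime_us == -1 means throttling is disabled, i.e. no limit.
double rt_cpu_share(long runtime_us, long period_us) {
    if (runtime_us < 0) {
        return 1.0;  // real-time tasks may take the whole period
    }
    return static_cast<double>(runtime_us) / static_cast<double>(period_us);
}
```

With the defaults, `rt_cpu_share(950000, 1000000)` is about 0.95, which leaves roughly 5% of each second for non-real-time tasks. With `-1` the share is 1.0 and nothing is reserved, which would be consistent with the slow `rostopic echo` observed above.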

@Unitree @Zhaiweiwei0 @TrivasZhang have you run into similar problems before when running the SDK on the NVIDIA TX2?

itaouil commented 3 years ago

Ok.

I ended up resetting the kernel scheduling runtime to its default value, because removing the limit let the high-priority process monopolize the CPU.

In the end I commented out/removed the call to:

InitEnvironment();

I still had it in my code, although it is commented out in the example_walk.cpp file. Removing it seems to have fixed the initial scheduling problem for good.
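If real-time scheduling is still wanted on boards where it may be refused, one option is to request it on a best-effort basis and carry on when the kernel says no. The sketch below is an assumption-laden illustration, not the SDK's implementation: the thread does not show what `InitEnvironment()` does internally, the helpers `try_enable_fifo` and `describe_sched_result` are hypothetical, and it uses the Linux `sched_setscheduler` call on the caller rather than the pthread call named in the issue title.

```cpp
#include <cerrno>
#include <cstring>
#include <sched.h>
#include <string>

// Hypothetical helper (not part of unitree_legged_sdk): ask the kernel
// for SCHED_FIFO for the caller. Returns 0 on success, otherwise the
// errno value reported by sched_setscheduler.
int try_enable_fifo(int priority) {
    sched_param param{};
    param.sched_priority = priority;
    if (sched_setscheduler(0, SCHED_FIFO, &param) == 0) {
        return 0;
    }
    return errno;
}

// Hypothetical helper: turn the result into a log message so the
// caller can keep running instead of aborting.
std::string describe_sched_result(int err) {
    if (err == 0) {
        return "SCHED_FIFO enabled";
    }
    return std::string("SCHED_FIFO unavailable (") + std::strerror(err) +
           "), continuing with the default policy";
}
```

A node could log `describe_sched_result(try_enable_fifo(1))` at startup and continue either way, which avoids both the hard failure and the need to lift the kernel's real-time limit.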