raisimTech / raisimLib

Visit www.raisim.com

Deadlock in "World" class #355

Open Luocheng-Zheng opened 2 years ago

Luocheng-Zheng commented 2 years ago

Hello. Recently I have been using raisimGym to train an RL policy. However, after several iterations it gets stuck. Since the program runs in parallel, the issue looks like a deadlock: it stops making progress indefinitely, cannot be terminated with Ctrl+C, and CPU usage is low. I found that it was stuck at the line `world_->integrate();`.

        mano_r_->setPdTarget(pTarget_clipped_r, vTarget_r_);
        /// Apply N control steps
        for (int i = 0; i < int(control_dt_ / simulation_dt_ + 1e-10); i++){
            if(server_) server_->lockVisualizationServerMutex();
            world_->integrate();
            if(server_) server_->unlockVisualizationServerMutex();
        }

Therefore, I wonder whether there is a mutex or any other kind of lock inside the `World` class. Thank you in advance!

jhwangbo commented 2 years ago

Please do not post an image of code. Other people cannot search it.

Are you sure it's not stuck at the next line? `integrate()` has no locking mechanism. Can you comment out the lines above and below it and check whether it still gets stuck?

Luocheng-Zheng commented 2 years ago

Already tried. I am pretty sure that it was stuck in world_->integrate();

jhwangbo commented 2 years ago

Can you create a minimal example that reproduces the issue? There is no locking function anywhere in the code, so I don't know how else I can help.

Luocheng-Zheng commented 2 years ago

Thank you. Now at least I know that World does not produce that issue itself. I'll try to reproduce it.

Luocheng-Zheng commented 2 years ago

It turns out that my URDF model had an inconsistent mass and inertia matrix, which may cause problems in `integrate()`.

fanshi14 commented 2 years ago

> cannot be terminated by ctrl+c and CPU usage is low

I have had a very similar problem for a long time but have not found the cause. What do you mean by your URDF model having an inconsistent mass and inertia matrix? Could you share your URDF or a minimal example of such a URDF? @Luocheng-Zheng

Luocheng-Zheng commented 2 years ago

@fanshi14 It is an articulated object with two bodies: a lid and a bottle. The faulty URDF accidentally set the bottle's inertia to that of the lid while keeping the bottle's mass. It also had a wrong rpy value in an <origin> tag. If you run into similar problems with world->integrate(), double-check your URDF files.

fanshi14 commented 2 years ago

Thanks for the information. @Luocheng-Zheng

My URDF had virtual links without any mass or inertia; after I added a tiny, negligible mass/inertia value, the deadlock became less likely (I'm not 100% sure it is solved). Do you think this is a potential problem in RaiSim? @jhwangbo

jhwangbo commented 2 years ago

A zero-mass link is OK as long as it does not result in infinite acceleration. If it does, it will segfault or crash in a similar way. But I am not sure how it could get stuck. If you can provide runnable code and the URDF, I can try to figure out the issue.

ruanruan0313 commented 10 months ago

Hello, I would like to ask what to do if the replacement URDF file does not display the full model. Thank you.