robotology / gz-sim-yarp-plugins

YARP plugins for Modern Gazebo (gz-sim).
BSD 3-Clause "New" or "Revised" License
8 stars 0 forks source link

Modify sensor plugins to avoid use of non-deterministic functionality to access the data #71

Open traversaro opened 5 months ago

traversaro commented 5 months ago

We are experiencing many flaky errors in gz-sim-yarp-plugins tests:

To be honest, I did not looked a lot into those back in time when I opened the issue. However, as @xela-95 started working in the repo, I noticed that most sensor plugins are actually using the gz's gz-transport (i.e. ZeroMQ + ProtoBuf) functionality to actually access sensors. To give an example, let's look at the forcetorque plugin. Before the first step of simulation, a connection via gz-transport is established in the main physics thread of gz-sim by the following snippet of code (from https://github.com/robotology/gz-sim-yarp-plugins/blob/25216f1c90ed524a30b664e7cc327b368eb2d649/plugins/forcetorque/ForceTorque.cc#L139):

        if (!this->ftInitialized && _ecm.ComponentData<components::SensorTopic>(sensor).has_value())
        {
            this->ftInitialized = true;
            auto ftTopicName = _ecm.ComponentData<components::SensorTopic>(sensor).value();
            this->node.Subscribe(ftTopicName, &ForceTorque::ftCb, this);
        }

This will call the ForceTorque::ftCb callback once a message is received, that will actually just do:

    void ftCb(const gz::msgs::Wrench& _msg)
    {
        std::lock_guard<std::mutex> lock(this->ftMsgMutex);
        ftMsg = _msg;
    }

However, this will happen (from what I understand) in a thread different from the main physics thread.

Then, after the update of the physics, the following code again in the main physics thread reads the ft values (that are just ~6 doubles) from the buffer that was populated by ftCb :

        gz::msgs::Wrench ftMsg;
        {
            std::lock_guard<std::mutex> lock(this->ftMsgMutex);
            ftMsg = this->ftMsg;
        }

        std::lock_guard<std::mutex> lock(forceTorqueData.m_mutex);
        forceTorqueData.m_data[0] = (ftMsg.force().x() != 0) ? ftMsg.force().x() : 0;
        forceTorqueData.m_data[1] = (ftMsg.force().y() != 0) ? ftMsg.force().y() : 0;
        forceTorqueData.m_data[2] = (ftMsg.force().z() != 0) ? ftMsg.force().z() : 0;
        forceTorqueData.m_data[3] = (ftMsg.torque().x() != 0) ? ftMsg.torque().x() : 0;
        forceTorqueData.m_data[4] = (ftMsg.torque().y() != 0) ? ftMsg.torque().y() : 0;
        forceTorqueData.m_data[5] = (ftMsg.torque().z() != 0) ? ftMsg.torque().z() : 0;
        forceTorqueData.simTime = _info.simTime.count() / 1e9;

What is the problem here? The main problem for our use cases that after we after we run a given step N of simulation, we do not have any idea if forceTorqueData.m_data actually contains the FT readings of step N, or of some other step N-m . The actual value contained in forceTorqueData.m_data depends on the operating system scheduled, and weather it actually scheduled the thread that is calling ftCb, so in general it is something non deterministic.

This is effectively a regression w.r.t. to gazebo-yarp-plugins (the yarp plugins for Gazebo Classic), so ideally we would like to try to fix this somehow, at least for sensors whose simulations is lightweight (for example forcetorque and imu, for camera and laser the situation may be more complex as the rendering itself could occur in a thread different from the main physics thread.

First of all, we should understand if this is possible at all in Gazebo Harmonic, and if it is not we should explore how to modify Gazebo libraries upstream to permit this.

traversaro commented 5 months ago

Interestingly, the non-deterministic reading of sensors described in this issue is actually documented in https://gazebosim.org/api/gazebo/7.0/ardupilot.html , so it would be interesting to understand if it can be avoided.

Other related issues:

traversaro commented 5 months ago

Indeed, I guess that https://github.com/gazebosim/gz-sim/issues/2268 is a prerequisite for this.