Modify sensor plugins to avoid use of non-deterministic functionality to access the data

traversaro commented 10 months ago

We are experiencing many flaky errors in gz-sim-yarp-plugins tests:

To be honest, I did not looked a lot into those back in time when I opened the issue. However, as @xela-95 started working in the repo, I noticed that most sensor plugins are actually using the gz's gz-transport (i.e. ZeroMQ + ProtoBuf) functionality to actually access sensors. To give an example, let's look at the forcetorque plugin. Before the first step of simulation, a connection via gz-transport is established in the main physics thread of gz-sim by the following snippet of code (from https://github.com/robotology/gz-sim-yarp-plugins/blob/25216f1c90ed524a30b664e7cc327b368eb2d649/plugins/forcetorque/ForceTorque.cc#L139):

        if (!this->ftInitialized && _ecm.ComponentData<components::SensorTopic>(sensor).has_value())
        {
            this->ftInitialized = true;
            auto ftTopicName = _ecm.ComponentData<components::SensorTopic>(sensor).value();
            this->node.Subscribe(ftTopicName, &ForceTorque::ftCb, this);
        }

This will call the ForceTorque::ftCb callback once a message is received, that will actually just do:

    void ftCb(const gz::msgs::Wrench& _msg)
    {
        std::lock_guard<std::mutex> lock(this->ftMsgMutex);
        ftMsg = _msg;
    }

However, this will happen (from what I understand) in a thread different from the main physics thread.

Then, after the update of the physics, the following code again in the main physics thread reads the ft values (that are just ~6 doubles) from the buffer that was populated by ftCb :

        gz::msgs::Wrench ftMsg;
        {
            std::lock_guard<std::mutex> lock(this->ftMsgMutex);
            ftMsg = this->ftMsg;
        }

        std::lock_guard<std::mutex> lock(forceTorqueData.m_mutex);
        forceTorqueData.m_data[0] = (ftMsg.force().x() != 0) ? ftMsg.force().x() : 0;
        forceTorqueData.m_data[1] = (ftMsg.force().y() != 0) ? ftMsg.force().y() : 0;
        forceTorqueData.m_data[2] = (ftMsg.force().z() != 0) ? ftMsg.force().z() : 0;
        forceTorqueData.m_data[3] = (ftMsg.torque().x() != 0) ? ftMsg.torque().x() : 0;
        forceTorqueData.m_data[4] = (ftMsg.torque().y() != 0) ? ftMsg.torque().y() : 0;
        forceTorqueData.m_data[5] = (ftMsg.torque().z() != 0) ? ftMsg.torque().z() : 0;
        forceTorqueData.simTime = _info.simTime.count() / 1e9;

What is the problem here? The main problem for our use cases that after we after we run a given step N of simulation, we do not have any idea if forceTorqueData.m_data actually contains the FT readings of step N, or of some other step N-m . The actual value contained in forceTorqueData.m_data depends on the operating system scheduled, and weather it actually scheduled the thread that is calling ftCb, so in general it is something non deterministic.

This is effectively a regression w.r.t. to gazebo-yarp-plugins (the yarp plugins for Gazebo Classic), so ideally we would like to try to fix this somehow, at least for sensors whose simulations is lightweight (for example forcetorque and imu, for camera and laser the situation may be more complex as the rendering itself could occur in a thread different from the main physics thread.

First of all, we should understand if this is possible at all in Gazebo Harmonic, and if it is not we should explore how to modify Gazebo libraries upstream to permit this.

traversaro commented 10 months ago

Interestingly, the non-deterministic reading of sensors described in this issue is actually documented in https://gazebosim.org/api/gazebo/7.0/ardupilot.html , so it would be interesting to understand if it can be avoided.

Other related issues:

Interesting related issues in gz-sim: https://github.com/gazebosim/gz-sim/issues/2268
In gazebo-yarp-plugins, we had a similar issue: we were using a non-deterministic way to control joints, and then we switched to a deterministic way, see https://github.com/robotology/gazebo-yarp-plugins/issues/165 for the issue and https://github.com/robotology/gazebo-yarp-plugins/pull/324 for how we solved the problem.

traversaro commented 10 months ago

Indeed, I guess that https://github.com/gazebosim/gz-sim/issues/2268 is a prerequisite for this.

traversaro commented 4 months ago

For what regards FT sensors, we could use https://github.com/gazebosim/gz-sensors/pull/449 on the gz-sensors/gz-sim versions that will support it.

robotology / gz-sim-yarp-plugins

Modify sensor plugins to avoid use of non-deterministic functionality to access the data #71