siemens / ros-sharp

ROS# is a set of open source software libraries and tools in C# for communicating with ROS from .NET applications, in particular Unity3D
Apache License 2.0
958 stars 366 forks source link

ROS# silently drops sensor_msgs/JointState messages including NaN values #334

Closed shoeffner closed 4 years ago

shoeffner commented 4 years ago

I found a bug!

Here is my bug description: sensor_msgs/JointState messages are dropped/ignored silently when they contain NaN values. Possibly happens to other messages as well.

For the JointState this is a low-priority issue, as I built a helper node to drop nan-values before they get to Unity.

Perform the following steps reproduce the bug:

If you have a NAO available:

If you don't have a NAO available:

Observed results:

Only the odom messages are transferred correctly (the NAO changes its base_link properly), but other joints are not correct.

Expected results:

The NAO should move properly, or there should be a warning that the messages containing "nan" values are dropped.

I found out that the issue that some values reported from the naoqi_driver appear to be NaN, especially in the position and velocity arrays. If I send a custom message via rostopic pub, the robot behaves correctly. As a fix, I built a ROS node which drops NaN parts of JointState messages and republishes them. This fixes it and the NAO behaves correctly. Also, the dropping must already occur somewhere in the RosBridgeClient, as the JointStateSubscriber never receives those message (ReceiveMessage is never called).

Relevant Code:

This node subscribes to the /joint_states from naoqi, drops all nan-values and publishes the resulting messages on /joint_states/no_nan:

#!/usr/bin/env python
from functools import partial
import numpy as np

import rospy
from sensor_msgs.msg import JointState

def gather(values, mask):
    for value, valid in zip(values, mask):
        if valid:
            yield value

def bridge(publisher, message):
    vals = np.vstack((message.position,
                      message.velocity,
                      message.effort))
    # find NaN-columns
    nan_mask = np.isnan(vals)
    # collapse to one row which are not nan
    valid = ~np.any(nan_mask, axis=0)

    message.name = list(gather(message.name, valid))
    message.position = list(gather(message.position, valid))
    message.velocity = list(gather(message.velocity, valid))
    message.effort = list(gather(message.effort, valid))

    rospy.loginfo("%s", message)
    publisher.publish(message)

def main():
    publisher = rospy.Publisher('joint_states/no_nan', JointState, queue_size=10)
    rospy.Subscriber('joint_states', JointState, partial(bridge, publisher))
    rospy.init_node('fix_joint_states', anonymous=True)
    rospy.spin()

if __name__ == '__main__':
    try:
        main()
    except rospy.ROSInterruptException:
        pass

I am running the following stack -- if you need more details, you can check out my repository (https://github.com/shoeffner/ease_naoqi_dlu). In commit https://github.com/shoeffner/ease_naoqi_dlu/commit/57b62e46447e1ae939617d32b5de4dc954a98f68 the issue exists, by commit https://github.com/shoeffner/ease_naoqi_dlu/commit/d3a2ada4c868ed88df50c123d6634d4312f15425 it's fixed – the following commit updates the version of ros-sharp, which has the same issue.

version: "3.7"

services:
    roscore:
        image: osrf/ros:melodic-desktop-full
        command: roscore

    naoqi_driver:
        build:
            context: .
            dockerfile: docker/Dockerfile.naoqi
        image: ease:naoqi
        command: /start_after_rosmaster.sh roslaunch naoqi_driver naoqi_driver.launch nao_ip:=${NAO_IP} roscore_ip:=roscore network_interface:=eth0
        depends_on:
            - roscore

    naoqi:
        build: .
        image: ease:naoqi_dlu
        command: /start_after_rosmaster.sh rosrun ease_dlu_util fix_joint_states.py
        depends_on:
            - naoqi_driver

    rosbridge:
        build: .
        image: ease:naoqi_dlu
        command: /start_after_rosmaster.sh roslaunch file_server ros_sharp_communication.launch
        ports:
            - "9090:9090"
        depends_on:
            - roscore
berkayalpcakal commented 4 years ago

Hello @shoeffner,

Please go over the issue #78, where the limitation of de-/serialization of NaN values has already been discussed. You can particularly try this suggestion regarding the use of nullable class members to be able to bring messages with null values into Unity (because rosbridge convertes NaN values to null values while serializing the message since this commit which are then not deserialized to non-nullable class members causing the callback function of the subscriber to be never called).

I hope this helps. Any improvement suggestions are welcomed regarding the limitation of de-/serialization of NaN values.

shoeffner commented 4 years ago

Thanks, I didn't find #78. Indeed, a warning would have been nice (maybe even a one off thing) explaining that it is application dependent how to handle nan values and that the values are dropped silently. The problem I had was basically that I didn't see the issue nor is there a way to trivially solve it on the unity side.

If course, it could be some setting somewhere how to handle nan values, but I think the warning might be the simplest solution.

On Tue, Aug 11, 2020, 18:04 Berkay Alp Cakal notifications@github.com wrote:

Hello @shoeffner https://github.com/shoeffner,

Please go over the issue #78 https://github.com/siemens/ros-sharp/issues/78, where the limitation of de-/serialization of NaN values has already been discussed. You can particularly try this suggestion https://github.com/siemens/ros-sharp/issues/78#issuecomment-457105744 regarding the use of nullable class members to be able to bring messages with null values into Unity (because rosbridge convertes NaN values to null values while serializing the message since this commit https://github.com/RobotWebTools/rosbridge_suite/commit/44495be0956dd6c6202b959bc97935715b4a0917#diff-8844e002e306884d690c34b324d23326 which are then not deserialized to non-nullable class members causing the callback function of the subscriber to be never called).

I hope this helps. Any improvement suggestions are welcomed regarding the limitation of de-/serialization of NaN values.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/siemens/ros-sharp/issues/334#issuecomment-672052278, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOAODY5RRIJWJX77NMXIPLSAFTZRANCNFSM4P3BH5MA .

MartinBischoff commented 4 years ago

Hi @shoeffner .

We thought about how to handle NaN and Null values several times. Up to now I did not find a nice general answer:

Serializers can handle Null values differently. You could for example also implement an ISerializer that fills Null values with default values.

For the ´MicrosoftSerializer` for example you can adjust Null value handling in this Method.

shoeffner commented 4 years ago

Thanks @MartinBischoff, I also handle NaN in ROS at the moment, but implementing a custom serializer might also be a good idea in the future. Thank you!