v-kiniv / rws

WebSocket gateway for ROS2 topics and services
Apache License 2.0

Humble service calls don't return data #10

Closed MoffKalast closed 7 months ago

MoffKalast commented 7 months ago

Alright so I'm back at this and I've finally figured out what's going wrong with service calls on humble.

Here's a simple rclpy service I've tested with that prints that it's been called and returns a test string.

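    import rclpy
    from rclpy.node import Node
    from std_srvs.srv import Trigger

    class ServiceHandlerNode(Node):
        def __init__(self):
            super().__init__('service_handler_node')
            self.create_service(Trigger, '/test', self.test)

        def test(self, req, res):
            self.get_logger().info("Test service called!")
            res.success = True
            res.message = "Service handler online."
            return res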

Called from the terminal it returns all as it should:

ros2 service call /test std_srvs/srv/Trigger {}
requester: making request: std_srvs.srv.Trigger_Request()

response:
std_srvs.srv.Trigger_Response(success=True, message='Service handler online.')

And the service reports being called:

[service_handler_node-4] [INFO] [1705673112.498633772] [service_handler_node]: Test service called!

Calling it from Foxglove however, yields a {}, which I suspect is a client side fix since calling it manually from roslibjs returns undefined.

What's interesting though is that the services genuinely get called on the ROS side:

[service_handler_node-4] [INFO] [1705673112.498633772] [service_handler_node]: Test service called!
[service_handler_node-4] [INFO] [1705673133.305536556] [service_handler_node]: Test service called!
[service_handler_node-4] [INFO] [1705673333.699421237] [service_handler_node]: Test service called!
[service_handler_node-4] [INFO] [1705673338.410078049] [service_handler_node]: Test service called!

The response just doesn't make it back. So there's some issue in parsing that part. In case it helps tracking the issue down, the errors I'm getting are:

rmw_serialize: invalid data size, at ./src/rmw_node.cpp:1727

and

'Handle's typesupport identifier (rosidl_typesupport_cpp) is not supported by this library, at ./src/type_support_dispatch.hpp:111'

I'm seeing the same behaviour on both cyclone and dynamic fastrtps.

v-kiniv commented 7 months ago

I set up the example service as you described and indeed I get {} in response from Foxglove, but then I tried calling directly using raw web sockets client and got a valid response:

* Websocket connected
↓ Sending text to websocket:
{"id": "call_service:/test/trigger:11","op": "call_service","service": "/test","type": "std_srvs/Trigger"}
↓ Received text from websocket:
{"id":"call_service:/test/trigger:11","op":"service_response","result":true,"values":{"message":"Service handler online.","success":true}}

I'm not sure why Foxglove does not parse the result.

Btw, service called successfully and I'm not seeing any errors on rws or service side on both cyclone and fastrtps.

The only time I got errors about invalid data size, etc. was when I forgot to prefix ros2 run <service_package_and_node> with RMW_IMPLEMENTATION=rmw_fastrtps_dynamic_cpp (or the cyclone equivalent), so Humble was running the service on the default, non-dynamic RMW. Make sure you run both rws and the service node with the same RMW implementation.

Please try with raw web sockets, or if you'd like me to test with roslibjs, please send me a gist or repo with the roslibjs code you are using, as I have no experience with roslibjs and it would be a hassle for me to get it set up.

MoffKalast commented 7 months ago

Hmm, well every time I've tested with a different DDS it was a global .bashrc change, all nodes killed, terminals relaunched, daemon restarted. I doubt that's the problem.

Perhaps there's something nonstandard in the JSON the websocket receives: maybe the values need to be a level higher or set under a different key? The easiest thing would be to check what rosbridge sends to a raw websocket and see what the difference is, I suppose. After all, if it were identical, there probably wouldn't be any issues.

I'll set up a smaller example with roslibjs if need be, but the easiest would be to just test the same code I'm working with, it's relatively lightweight anyway:

sudo apt install python3-flask

cd your_workspace_dir/src
git clone https://github.com/MoffKalast/vizanti.git -b ros2-rws
cd your_workspace_dir
colcon build

# should launch the whole stack, including rws
ros2 launch vizanti_server vizanti_rws.launch.py

#open a browser and load localhost:5000

I've set up that same service here, and it gets called on page load here on the test branch. It should just alert() the result as parsed JSON. Roslibjs is included from here.

v-kiniv commented 7 months ago

Thanks for the repo. I narrowed the problem down to a duplicate message with the service_response op. I always return an immediate "ack" response to the incoming request, and for a service call I also send a second message (since the call is async) with the actual service response payload. Since the "ack" was incorrectly given the service_response op, roslibjs treated it as the response to the service call. I renamed the "ack" op to service_call instead of service_response, and that solved the problem. Please check https://github.com/v-kiniv/rws/pull/11.

One last thing: I'm still seeing the 'rmw_serialize: invalid data size' error message from rws running with your package. At first glance it seems unrelated to the service; I think it may be related to other ROS topics. I will look into that later. Let me know if it's causing any problems or if you find out more about the issue.
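
To illustrate why the misnamed "ack" swallowed the result, here is a rough Python sketch of a client that resolves a call on the first service_response carrying its id. This is hypothetical client logic written for the example, not roslibjs's actual implementation, and the shape of the ack message (same id, no values) is an assumption.

```python
def first_matching_response(messages, call_id):
    """Mimic a client that resolves a call on the first service_response it sees.

    Hypothetical logic for illustration only.
    """
    for msg in messages:
        if msg.get("op") == "service_response" and msg.get("id") == call_id:
            return msg.get("values")
    return None


CALL_ID = "call_service:/test/trigger:11"
PAYLOAD = {"message": "Service handler online.", "success": True}


def server_messages(ack_op):
    """The two messages sent per service call: an immediate ack, then the result."""
    return [
        {"id": CALL_ID, "op": ack_op},  # assumed ack shape: no values attached
        {"id": CALL_ID, "op": "service_response", "result": True, "values": PAYLOAD},
    ]


if __name__ == "__main__":
    # Ack named service_response: the ack is taken as the answer, values are missing.
    print(first_matching_response(server_messages("service_response"), CALL_ID))
    # Ack named service_call: the ack is skipped and the real payload comes back.
    print(first_matching_response(server_messages("service_call"), CALL_ID))
```

With the ack named service_response the function returns nothing useful, which lines up with the undefined seen from roslibjs; with the renamed ack it returns the payload.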

MoffKalast commented 7 months ago

Ah fantastic, it seems to fix up most of the service calls. 👍

I suspect the invalid data size error comes from the calls to /rosapi/topics and /rosapi/nodes which don't seem to return when called. At least my roslibjs callbacks don't ever get invoked. The two services are set up here in that repo, they tend to get called once or twice at load time by a few modules.

v-kiniv commented 7 months ago

It was the service call causing the error after all. It turns out this is only a problem for service requests with 0 members, like Trigger. Please check https://github.com/v-kiniv/rws/pull/12

MoffKalast commented 7 months ago

Well hey, that does indeed remove all error messages on my end. Still not getting anything from rosapi, but I'll need to set up my own services for that part anyway since the param get/set part doesn't work half the time on rosbridge either.

v-kiniv commented 7 months ago

My bad, I fixed one thing and broke another (rosapi). The project had faded from my memory a bit, and I forgot that rosapi calls are handled as a separate case and return immediately. Here's the regression fix: https://github.com/v-kiniv/rws/pull/13

p.s. Just a reminder regarding rosapi and rws:

Unlike Rosbridge, RWS does not expose /rosapi node, all rosapi related API requests are handled internally in rws_server node.
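
As a rough picture of what "handled internally" means, here is a small Python sketch of a dispatcher that answers rosapi calls itself and forwards everything else. All names in it (`INTERNAL_HANDLERS`, `handle_call_service`, the `forward` callback) and the placeholder data are hypothetical; this is not the rws source, just one way the separate rosapi case could look.

```python
def rosapi_topics():
    # Placeholder data; a real server would query the ROS graph here.
    return {"topics": ["/chatter"], "types": ["std_msgs/msg/String"]}


def rosapi_nodes():
    # Placeholder data as well.
    return {"nodes": ["/rws_server"]}


# Hypothetical table of calls answered inside the server itself.
INTERNAL_HANDLERS = {
    "/rosapi/topics": rosapi_topics,
    "/rosapi/nodes": rosapi_nodes,
}


def handle_call_service(request, forward):
    """Answer rosapi calls locally; hand everything else to `forward`."""
    handler = INTERNAL_HANDLERS.get(request["service"])
    if handler is None:
        return forward(request)  # normal path: call the real ROS service
    return {
        "id": request.get("id"),
        "op": "service_response",
        "result": True,
        "values": handler(),
    }


if __name__ == "__main__":
    def forward(request):
        return {"forwarded": request["service"]}

    print(handle_call_service({"id": "1", "service": "/rosapi/nodes"}, forward))
    print(handle_call_service({"id": "2", "service": "/test"}, forward))
```

Because the rosapi branch returns right away, any change to the regular service path has to leave that branch alone, which is the kind of regression described above.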

MoffKalast commented 7 months ago

#13 seems to fix rosapi. I think everything works as expected now from what I can tell, though I'll have to run more tests and fix a few things on my end as well. Feel free to close this thread for now though, and thanks for all the help. 😄

Unlike Rosbridge, RWS does not expose /rosapi node, all rosapi related API requests are handled internally in rws_server node.

Yeah I'm aware, I don't really have anything that uses it on the ROS side, every native ROS node already has access to all of that through rclpy/rclcpp anyway and this should in theory reduce latency slightly so it's a better approach imo.

MoffKalast commented 7 months ago

Ok there is one last thing that I've found that's a bit odd. I have these two services offered by a node on the ROS side:

    name: "/vizanti/bag/status",
    serviceType: "std_srvs/srv/Trigger",

    name: "/vizanti/roswtf",
    serviceType: "std_srvs/srv/Trigger",

And when the second one is called from roslibjs, the first one is invoked instead and the result sent. Calling the first one works fine though. It's also the first of the two to be declared in the node, though if I swap that it doesn't have any effect, so it's not registration time. Maybe it's whichever gets called first?

Is it possible there's some kind of name matching that only checks for type and node name? I've also tried changing the service name and it doesn't seem to affect it.

v-kiniv commented 7 months ago

I added another Trigger service to the previous example and it worked as expected, even though both are of the same type, so I thought maybe you had accidentally assigned the same callback. But then I decided to double check by calling /vizanti/bag/status and /vizanti/roswtf from Foxglove and, what do you know, if I call /vizanti/bag/status first, then no matter which one I call next, the first one gets called every time. So that's definitely a caching issue, and when I checked the source code it was pretty obvious that I was caching by service type instead of by name. Please check https://github.com/v-kiniv/rws/pull/14. Now I wonder if this is the last bug 😄
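
The caching mix-up is easy to reproduce in isolation. Below is a minimal Python sketch with a made-up `ClientCache` and `FakeClient` (neither exists in rws); the only point is the difference between keying the cache by service type and keying it by service name.

```python
class FakeClient:
    """Stand-in for a ROS service client; records which service it targets."""

    def __init__(self, name, srv_type):
        self.name = name
        self.srv_type = srv_type


class ClientCache:
    """Cache of service clients with a configurable key function."""

    def __init__(self, key_fn):
        self._key_fn = key_fn
        self._clients = {}

    def get(self, name, srv_type):
        key = self._key_fn(name, srv_type)
        if key not in self._clients:
            self._clients[key] = FakeClient(name, srv_type)
        return self._clients[key]


def by_type(name, srv_type):
    return srv_type  # buggy key: all Trigger services share one client


def by_name(name, srv_type):
    return name  # fixed key: one client per service name


if __name__ == "__main__":
    for key_fn in (by_type, by_name):
        cache = ClientCache(key_fn)
        cache.get("/vizanti/bag/status", "std_srvs/srv/Trigger")
        second = cache.get("/vizanti/roswtf", "std_srvs/srv/Trigger")
        print(key_fn.__name__, "->", second.name)
```

Keyed by type, the second lookup hands back the client created for /vizanti/bag/status, so whichever Trigger service is called first wins; keyed by name, each service gets its own client.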

MoffKalast commented 7 months ago

Ah I'm glad it was an easy fix, it checks out on my end 👍

Now I wonder if this is the last bug 😄

v-kiniv commented 7 months ago

Ah I'm glad it was an easy fix, it checks out on my end 👍

Ok, then I'll merge the PR and close this issue. Thanks again for your help in finding and fixing all these bugs.