pf-robotics / kachaka-api

スマートファニチャープラットフォーム「カチャカ」API
https://kachaka.zendesk.com/hc/ja/articles/7660222791183-%E3%82%AB%E3%83%81%E3%83%A3%E3%82%ABAPI
Apache License 2.0
85 stars 24 forks source link

Set timeout for some APIs useful for checking endpoint liveness and connectivity #73

Open belltailjp opened 8 months ago

belltailjp commented 8 months ago

A practical applications using kachaka API would need to check whether the target Kachaka host (API endpoint) is alive or not. Although there is no API exactly for that purpose right now, I suppose one can simply call get_robot_serial_number or get_robot_version and see if they succeed. One problem I have had is that when the API endpoint is not reachable the Python wrapper interface blocks for 20 seconds (MacOS host case) which makes my app frozen, but we usually won't have to wait that long since the API expects the client to be in the same local network. In this PR I'd like to propose the timeout option for those API wrappers functions which is directly passed to gRPC core timeout, so that I can simply let these API wait only for ~0.5s to check API endpoint liveness. For other API functions I think setting timeout is irrelevant, so I only focused on get_robot_serial_number or get_robot_version in this PR.

belltailjp commented 8 months ago

I think we need the timeout argument not only the two methods but also the rest of the methods.

I tried this by adding timeout to all the self.stub.XXX(request) as in the latest commit.

I noticed there are some blocking API calls like GetLastCommandResult, but adding timeout to them is not appropriate, so I didn't pass timeout to this function. (The function GetLastCommandResult is called to wait for completion after starting a command, but setting a timeout will make it fail, although the preceding command itself is already queued at that moment. This results in a strange situation where the command is executed but the API wrapper call (like speak or move_shelf) fails.)

The same applies to other blocking APIs, may I know what they are?

hidai-pfr commented 8 months ago

@belltailjp Basically, I think adding a timeout is a good improvement

To answer your question, please refer to the kachaka API long polling documentation. https://github.com/pf-robotics/baku-kachaka-api?tab=readme-ov-file#cursor-%E3%81%AE%E6%A6%82%E5%BF%B5

For example, GetRobotSerialNumber() returns the value immediately after the request, and then the value is never changed again, so there is no need to call the API a second time. GetRobotPose(), on the other hand, will continually call the API in a typical application. In this case, by setting the cursor and calling it, it can block properly when there is no value change, saving resources and reducing latency. GetLastCommandResult(), which you asked about, is similarly designed to be called with cursor set to retrieve the value when the event occurs. The usage is to always keep the request in the state of being sent out, and then re-request it as soon as the response comes back.

Therefore, it would be better if the initial value of the timeout is None, and the value can be set optionally.

belltailjp commented 7 months ago

Therefore, it would be better if the initial value of the timeout is None, and the value can be set optionally.

I'm sorry but I don't really understand the suggestion very well. My implementation makes timeout optional and sets None as its default value, is it related?

However related to timeout in blocking APIs there would be some sort of discussions:

If my understanding is correct and you agree with it, I guess I probably need to eliminate timeout from GetRobotPose for example. One issue I have here is that I actually cannot learn what exact APIs may block from the provided document, proto definition nor base.py (except start_command) because their interface all look similar (ie receive and return Metadata). I guess I can assume all the GetXXX stuff that get sensor data etc are blocking, but not sure for setting or history related ones.