Open kkewwei opened 3 months ago
cc : @aasom143 who was looking into holistic API cancellation across different cat and cluster APIs
[Triage - attendees 1 2 3 4] - @kkewwei Thanks for creating the issue. Please check whether this can leverage the cancellation framework (#13908).
@rajiv-kv The cancellation framework can be used here to solve the first problem, but it doesn't solve the second one.
I would like to solve both problems within the cancellation framework, cc @aasom143.
Hi @kkewwei, thanks for following up. With the new cancellation framework, we have added a new timeout (`cancel_after_time_interval`) that can be used to address the first problem. For the second issue, we already have a timeout that can be configured for each node's transport call. By setting this timeout, we can prevent requests from being blocked by a faulty node. I hope this clarifies how to resolve the second problem.
To address the first issue, could you please add cancellation support for the cat/nodes API? You can refer to the recent PR regarding cancellation support for the cat/shards API.
Of course, thank you.
@aasom143, it's ready now; please have a look when you are free: #14853
Is your feature request related to a problem? Please describe
Now the method is as follows:
It seems to have two problems:
1. `cluster().nodesInfo()` and `cluster().nodesStats()` use separate timeouts. In that case, if `timeout` from the client is `30s`, the overall time can be `60s` even without counting `cluster().state()`, which is 2x the expected time.
2. Only after `cluster().nodesInfo()` returns can the next call, `cluster().nodesStats()`, be made. It's normal to have a slow node (for example, one in a full GC) in large clusters, and the API then becomes unresponsive: if some nodes are blocked in `cluster().nodesInfo()`, the overall API is blocked.
Describe the solution you'd like
1. If `timeout` is `30s` in `_cat/nodes`, the overall time should be around 30s.
2. Call `cluster().nodesInfo()` and `cluster().nodesStats()` concurrently.
The code can be like this:
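As a rough illustration only (this is not the OpenSearch implementation), the sketch below uses plain Java `CompletableFuture`s to show the two ideas: both calls start at the same time, and a single timeout covers the pair, so the caller never waits longer than the requested timeout. `SharedDeadlineSketch`, `fakeCall`, and `fetchBoth` are hypothetical names, and the delays are scaled down from seconds to milliseconds so the example runs quickly.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

class SharedDeadlineSketch {

    // Hypothetical stand-in for a fan-out call such as nodesInfo() or nodesStats():
    // it completes with `name` after `delayMillis`.
    static CompletableFuture<String> fakeCall(String name, long delayMillis) {
        return CompletableFuture.supplyAsync(
            () -> name,
            CompletableFuture.delayedExecutor(delayMillis, TimeUnit.MILLISECONDS));
    }

    // Starts both calls immediately and applies ONE timeout to the combined result.
    // Returns Optional.empty() if the shared deadline passes first.
    static Optional<String> fetchBoth(Supplier<CompletableFuture<String>> infoCall,
                                      Supplier<CompletableFuture<String>> statsCall,
                                      Duration timeout) {
        CompletableFuture<String> info = infoCall.get();
        CompletableFuture<String> stats = statsCall.get();
        CompletableFuture<String> combined = info
            .thenCombine(stats, (i, s) -> i + "+" + s)
            .orTimeout(timeout.toMillis(), TimeUnit.MILLISECONDS);
        try {
            return Optional.of(combined.join());
        } catch (CompletionException e) {
            // Mark both futures as cancelled so nothing keeps waiting on them.
            // Note: this does not interrupt work that is already running.
            info.cancel(true);
            stats.cancel(true);
            if (e.getCause() instanceof TimeoutException) {
                return Optional.empty();
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        Duration timeout = Duration.ofMillis(300);

        // Each call takes 200 ms. Run back to back they would need 400 ms,
        // but started together they fit inside the 300 ms budget.
        System.out.println(fetchBoth(
            () -> fakeCall("info", 200),
            () -> fakeCall("stats", 200),
            timeout));

        // A slow node delays one call; the caller still gets control back
        // once the shared 300 ms budget is used up.
        System.out.println(fetchBoth(
            () -> fakeCall("info", 5000),
            () -> fakeCall("stats", 200),
            timeout));
    }
}
```

The design choice here is to attach the timeout to the combined future rather than to each call, which is what keeps the total wait close to the client's `timeout` instead of a multiple of it.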
Related component
Cluster Manager
Describe alternatives you've considered
No response
Additional context
No response