trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.42k stars 3k forks source link

The number of workers is not perfectly right if more than one coordinators exist in the same cluster #641

Open PennyAndWang opened 5 years ago

PennyAndWang commented 5 years ago
  In our  company,in order to restart the coordinator gracefully,  we need launch another new coordinator in the same cluster at some point. The coordinator is always set  “node-scheduler.include-coordinator=false”。I find the number of workers is not perfectly right  if more than one coordinator exist in the same cluster。For example ,if a cluster has 5 nodes and there are two coordinators  in this 5 nodes, the the number of workers  is 4 instead of 3。

I am not sure whether  it is a bug or not in this situation. I just want to know the community's opinion about this. If it is a bug, i want to fix it.
Praveen2112 commented 5 years ago

Hi @PennyAndWang , That's a nice catch. I am not sure if it is the right approach to run multiple coordinators in the same cluster as it might make the workers to switch between multiple coordinators and it might lead to some racing issue for memory allocations and other stuffs.

Apart from that the reason why it shows 4 workers is that in ClusterStatsResource we get the active node count and decrement it by 1 (assuming that there is one active coordiantor) while we can replace it with something like this 'Sets.difference(allNodes.getActiveNodes(), allNodes.getActiveCoordinators()).size();' which gives the effective active worker count.

PennyAndWang commented 5 years ago

@Praveen2112 ,really thank you very much for your reply and your advice。 I know the reason according the source code ,that is "if (!isIncludeCoordinator) {activeNodes -= 1;}" 。 I think the real number of workers should be more accurate , so i will submit the code 。

sopel39 commented 5 years ago

@PennyAndWang in case of multiple coordinators, coordinator should point to localhost as discovery server. Workers should connect to only single discovery server. For example this could be achieved via coordinator dns binding or attaching dedicated coordinator network interface (e.g: ENI in AWS).

PennyAndWang commented 5 years ago

@sopel39 , really thank you very much for your reply. I will explain the real reason. In our company, we deployed a discovery server. The coordinator and the workers both registered with the discovery server. The coordinator is not a discovery service node and is set to "discovery-server.enabled=false". Moreover, there is a layer of proxy on the client and server to implement the highly available function of the coordinator. When we need to restart the coordinator, we need to use the proxy to switch user traffic to the new coordinator, so that it does not affect the normal use of the user. So there will be a situation that there are multiple coordinators in a cluster.

PennyAndWang commented 5 years ago

@Praveen2112 ,I want to change the code to "long activeNodes = Sets.difference(nodeManager.getNodes(NodeState.ACTIVE),nodeManager.getAllNodes().getActiveCoordinators()).size();" , but the new problem comes. If the coordinator is set " node-scheduler.include-coordinator=true", ie the coordinator is also a compute node, then the above code is still not very accurate. I want to solve the above problem by the information from the discover server and find that the discovery server information does not provide whether the coordinator is a compute node or not. So now I am a bit confused. Maybe this question is not a big problem for the community, but it still has a little impact on us. By the way, I apologize if this issue affects or bothers you.

sopel39 commented 5 years ago

The coordinator is not a discovery service node and is set to "discovery-server.enabled=false". Moreover, there is a layer of proxy on the client and server to implement the highly available function of the coordinator.

I don't think you can implement coordinator HA with just single discovery server instance. It could be that dispatcher (https://github.com/prestosql/presto/pull/95) could help here, but its still in review.

tooptoop4 commented 4 years ago

is this fixed @PennyAndWang ?