ytsaurus / ytsaurus-spyt

YTsaurus SPYT provides an integration with Apache Spark
Apache License 2.0

Adaptive port binding for the shuffle service #15

Closed zlobober closed 2 weeks ago

zlobober commented 1 month ago

https://github.com/ytsaurus/ytsaurus-spyt/blob/e872c34b7a06afb82753c919fe48e281492a6b17/spyt-package/src/main/spark-extra/conf/spark-defaults.conf#L4

2024-08-13 11:52:04 INFO  WorkerLauncher$ [Thread-3]:197 - 2024-08-13 11:52:04 WARN  Utils [main]:69 - Service 'sparkWorker' could not bind on port 27001. Attempting port 27002.
2024-08-13 11:52:04 INFO  WorkerLauncher$ [Thread-3]:197 - 2024-08-13 11:52:04 WARN  Utils [main]:69 - Service 'sparkWorker' could not bind on port 27002. Attempting port 27003.
2024-08-13 11:52:04 INFO  WorkerLauncher$ [Thread-3]:197 - 2024-08-13 11:52:04 INFO  Utils [main]:57 - Successfully started service 'sparkWorker' on port 27003.
2024-08-13 11:52:04 INFO  WorkerLauncher$ [Thread-3]:197 - 2024-08-13 11:52:04 INFO  Worker [main]:57 - Worker decommissioning not enabled.
2024-08-13 11:52:04 INFO  WorkerLauncher$ [Thread-3]:197 - 2024-08-13 11:52:04 WARN  ExternalShuffleService [main]:69 - 'spark.local.dir' should be set first when we use db in ExternalShuffleService. Note that this only affects standalone mode.
2024-08-13 11:52:04 INFO  WorkerLauncher$ [Thread-3]:197 - 2024-08-13 11:52:04 INFO  Worker [dispatcher-event-loop-1]:57 - Starting Spark worker eu-north1<...>.net:27003 with 23 cores, 256.0 GiB RAM
2024-08-13 11:52:04 INFO  WorkerLauncher$ [Thread-3]:197 - 2024-08-13 11:52:04 INFO  Worker [dispatcher-event-loop-1]:57 - Running Spark version 3.2.2
2024-08-13 11:52:04 INFO  WorkerLauncher$ [Thread-3]:197 - 2024-08-13 11:52:04 INFO  Worker [dispatcher-event-loop-1]:57 - Spark home: /slot/sandbox/./tmpfs/spark
2024-08-13 11:52:04 INFO  WorkerLauncher$ [Thread-3]:197 - 2024-08-13 11:52:04 INFO  ExternalShuffleService [dispatcher-event-loop-1]:57 - Starting shuffle service on port 27000 (auth enabled = false)
2024-08-13 11:52:04 INFO  WorkerLauncher$ [Thread-3]:197 - 2024-08-13 11:52:04 ERROR Worker [dispatcher-event-loop-1]:94 - Failed to start external shuffle service
2024-08-13 11:52:04 INFO  WorkerLauncher$ [Thread-3]:197 - java.net.BindException: Address already in use

While some of the ports are chosen adaptively, the port for the shuffle service seems to be chosen statically. We'd better make it adaptive as well to prevent excessive port conflicts on multi-tenant YT clusters.

It is also a good idea to check that the rest of the required ports are chosen adaptively.
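For reference, a minimal sketch (plain Scala, names illustrative, not the actual Spark internals) of the retry-on-bind pattern visible in the 'sparkWorker' log above, i.e. what "chosen adaptively" means here: the service walks up from a start port until it finds a free one.

    import java.net.{BindException, ServerSocket}

    // Try startPort, then startPort + 1, and so on, until a bind succeeds
    // or maxRetries attempts have failed.
    def bindAdaptively(startPort: Int, maxRetries: Int = 16): ServerSocket = {
      var attempt = 0
      var bound: Option[ServerSocket] = None
      while (bound.isEmpty) {
        val port = startPort + attempt
        try {
          bound = Some(new ServerSocket(port))
        } catch {
          case _: BindException if attempt < maxRetries =>
            println(s"Could not bind on port $port. Attempting port ${port + 1}.")
            attempt += 1
        }
      }
      bound.get
    }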

zlobober commented 1 month ago

BTW, do you remember why we are not using the user-port functionality provided by YT? I.e. the environment variables YT_PORT_0, YT_PORT_1, ..., for which YT takes responsibility for ensuring that they are not bound by anyone else?
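For illustration, a hypothetical sketch of what using that mechanism could look like. The YT_PORT_* variables are the ones mentioned above, but the mapping onto Spark settings is an assumption, not existing SPYT behavior:

    // Hypothetical sketch, not existing SPYT behavior: YT reserves ports for a job
    // and publishes them as YT_PORT_0, YT_PORT_1, ... A launcher could read one of
    // them and hand it to Spark as spark.shuffle.service.port instead of probing
    // for a free port itself.
    def ytPort(index: Int): Option[Int] =
      sys.env.get(s"YT_PORT_$index").map(_.toInt)

    // e.g. extra arguments for spark-submit / worker startup
    val shuffleServicePortArgs: Seq[String] =
      ytPort(0).toSeq.flatMap(p => Seq("--conf", s"spark.shuffle.service.port=$p"))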

Alexvsalexvsalex commented 1 month ago

First of all, the shuffle service port must be fixed: an executor concatenates worker_host and shuffle.service.port to obtain the shuffle service address of another worker. So we need to select the port before cluster startup, when the Spark configuration is shared between nodes. We agreed that it is a good idea to randomize the shuffle service port inside spark-launch-yt; then clusters will have enough port diversity. This will be solved in the next release.
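A simplified illustration of that coupling (the names below are illustrative, not the actual Spark internals): the executor only knows the remote worker's host and takes the port from its own local spark.shuffle.service.port value, so every node must agree on that value.

    // Why the port must be identical on every node: the host varies per worker,
    // but the port is read from the executor's own local configuration.
    case class ShuffleServiceAddress(host: String, port: Int)

    def remoteShuffleAddress(workerHost: String, conf: Map[String, String]): ShuffleServiceAddress = {
      // 7337 is the stock Spark default for spark.shuffle.service.port
      val port = conf.getOrElse("spark.shuffle.service.port", "7337").toInt
      ShuffleServiceAddress(workerHost, port)
    }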

zlobober commented 1 month ago

First of all, the shuffle service port must be fixed: an executor concatenates worker_host and shuffle.service.port to obtain the shuffle service address of another worker.

Can't this be changed? We do not have the same issue with other kinds of ports.

How will the set of ports to choose from be configured? You do understand that the range should be configurable (ideally via a cluster-wide configuration), right?

Alexvsalexvsalex commented 1 month ago

As we can see, Spark's developers report that the shuffle service port is fixed. Also, this function is used for every external shuffle service interaction, so every node must use the same port and it cannot be selected dynamically. In my opinion this is because the shuffle service is the only channel for direct worker-to-worker interaction; other ports (REST/UI/etc.) are reached through the master, which knows about all workers.

zlobober commented 1 month ago

Ok, I see. And what's the answer to these questions?

How will the set of ports to choose from be configured? You do understand that the range should be configurable (ideally via a cluster-wide configuration), right?

Alexvsalexvsalex commented 3 weeks ago

We decided to do:

  1. Remove the default spark.shuffle.service.port from the config.
  2. Introduce two configs, spark.shuffle.service.port.interval.start (default=27050) and spark.shuffle.service.port.interval.size (default=50), describing the interval of available ports. These configs can be overridden globally in //home/spark/conf.
  3. On every SPYT cluster startup, a random port will be chosen from the specified range (a sketch is given below).
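A minimal sketch of item 3, assuming the two properties named above (the helper and its placement in the launcher are illustrative):

    import scala.util.Random

    // Pick one shuffle service port per cluster startup from
    // [interval.start, interval.start + interval.size); an explicitly configured
    // spark.shuffle.service.port still takes precedence.
    def chooseShuffleServicePort(conf: Map[String, String]): Int = {
      val start = conf.getOrElse("spark.shuffle.service.port.interval.start", "27050").toInt
      val size  = conf.getOrElse("spark.shuffle.service.port.interval.size", "50").toInt
      conf.get("spark.shuffle.service.port").map(_.toInt)
        .getOrElse(start + Random.nextInt(size))
    }

Because the choice happens once per cluster startup, all workers of that cluster still share a single port, while different clusters on the same YT nodes are likely to land on different ports.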

When you start a SPYT cluster, you can also specify a fixed port or change the interval parameters:

spark-launch-yt ... --params {'spark_conf'={'spark.shuffle.service.port'="27123"}}

spark-launch-yt ... --params {'spark_conf'={'spark.shuffle.service.port.interval.start'="19400"}}

Alexvsalexvsalex commented 2 weeks ago

It's merged. The new port binding will be in the next release.