Open · d4l3k opened 3 months ago
🐛 Describe the bug

With small input arguments (<64 KB), start_processes runs quickly since the processes are launched asynchronously. When the arguments are large, we end up blocking in https://github.com/python/cpython/blob/main/Lib/multiprocessing/popen_spawn_posix.py#L62 while writing to the pipe. The default pipe buffer size is 64 KB, so anything larger than that requires the child process to fully start before the write can complete.

https://unix.stackexchange.com/questions/11946/how-big-is-the-pipe-buffer

One solution for this would be to launch each process from a thread pool, which would let us avoid blocking as much on mp.Process.start. Using threads may have other negative side effects since there's more overhead, but it should generally be fine. subprocess supports specifying the pipe size, but there doesn't seem to be any way to do this with mp.Process. Rough sketches of both ideas follow the repro below.

Repro:

```python
import os
import time

from torch.multiprocessing import spawn


def trainer(rank):
    print(f"Running trainer in process {rank}")
    time.sleep(1)


if __name__ == "__main__":
    world_size = 10
    start = time.perf_counter()

    # No need for large arguments, just pass simple arguments
    args = []

    # Spawn the processes
    spawn(
        fn=trainer,
        args=args,          # Empty args passed to each process
        nprocs=world_size,  # Number of processes
        join=True,          # Wait for processes to complete
    )

    print(f"Time taken: {time.perf_counter() - start}")
```
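For reference, here's a hypothetical variant of the repro (the 1 MB payload and the use of os.urandom are just illustrative) that passes an argument larger than the 64 KB pipe buffer, which is enough to hit the blocking path described above:

```python
import os
import time

from torch.multiprocessing import spawn


def trainer(rank, payload):
    print(f"Running trainer in process {rank}, got {len(payload)} bytes")
    time.sleep(1)


if __name__ == "__main__":
    world_size = 10
    # Anything comfortably above the 64 KB pipe buffer; 1 MB is arbitrary.
    payload = os.urandom(1024 * 1024)

    start = time.perf_counter()
    spawn(
        fn=trainer,
        args=(payload,),    # Large args are pickled and written to each child's pipe
        nprocs=world_size,
        join=True,
    )
    # Expect this to take noticeably longer than the empty-args repro above,
    # since each mp.Process.start() now blocks on the pipe write.
    print(f"Time taken: {time.perf_counter() - start}")
```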
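A minimal sketch of the thread-pool idea, assuming plain multiprocessing with the spawn context rather than torch's start_processes (the pool size and the start_one helper are made up): each mp.Process.start() runs on a worker thread, so slow pipe writes overlap instead of serializing the launch loop.

```python
import multiprocessing as mp
from concurrent.futures import ThreadPoolExecutor


def trainer(rank, payload):
    print(f"Running trainer in process {rank}, got {len(payload)} bytes")


def start_one(ctx, rank, payload):
    # mp.Process.start() is where the (potentially blocking) pipe write
    # happens, so run it on a worker thread instead of the main thread.
    p = ctx.Process(target=trainer, args=(rank, payload))
    p.start()
    return p


if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    world_size = 10
    payload = b"\x00" * (1024 * 1024)  # larger than the 64 KB pipe buffer

    with ThreadPoolExecutor(max_workers=world_size) as pool:
        procs = list(pool.map(lambda rank: start_one(ctx, rank, payload),
                              range(world_size)))

    for p in procs:
        p.join()
```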
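On the pipe-size point: subprocess.Popen has accepted a pipesize argument since Python 3.10 (it's only honored where F_SETPIPE_SZ is available, i.e. Linux), but mp.Process exposes nothing equivalent. A toy example, unrelated to torch, just to show the parameter:

```python
import subprocess
import sys

# pipesize requires Python 3.10+ and is only honored on platforms with
# F_SETPIPE_SZ (Linux); the 1 MB value is an arbitrary example.
proc = subprocess.Popen(
    [sys.executable, "-c", "import sys; print(len(sys.stdin.buffer.read()))"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    pipesize=1024 * 1024,
)
out, _ = proc.communicate(input=b"\x00" * (256 * 1024))
print(out.decode().strip())  # 262144
```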
Versions
cc @VitalyFedyunin