Closed injet-zhou closed 6 days ago
Associate the exit status of the parent process with that of its subprocesses. When training across multiple GPUs using the CLI, the parent process always exits with code 0, regardless of whether subprocesses encounter errors.
What does this PR do?
Associate the exit status of the parent process with that of its subprocesses. When training across multiple GPUs using the CLI, the parent process always exits with code 0, regardless of whether subprocesses encounter errors.
Before submitting