Filter out stdout from all ranks except the first?

manopapad commented 2 hours ago

@tylerjereddy is trying out multi-rank Legate runs, and was surprised to see duplicated output. This is inherent to the control replication execution model of Legion[^1], whereby every rank executes the top-level program and "masks out" all work except the portion that corresponds to it.
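As a rough illustration of why the output is duplicated, here is a toy model in plain Python. It does not use Legion or Legate; `top_level_program` and `simulate` are hypothetical names, and the "ranks" are simulated in a single process. Every simulated rank runs the same program, including its `print`, but only sums its own slice of the data.

```python
import contextlib
import io


def top_level_program(rank: int, world_size: int, data: list[int]) -> int:
    """Toy stand-in for a replicated top-level program (not a Legion API)."""
    # Every rank executes this print, so it appears once per rank.
    print("starting computation")
    # Each rank keeps only its own share of the work (a strided slice here).
    my_items = data[rank::world_size]
    return sum(my_items)


def simulate(world_size: int, data: list[int]) -> tuple[int, str]:
    """Run the toy program once per simulated rank and capture all stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        partials = [
            top_level_program(rank, world_size, data)
            for rank in range(world_size)
        ]
    return sum(partials), buffer.getvalue()
```

Calling `simulate(6, list(range(100)))` yields the correct total once, while the captured output contains the same line six times, which mirrors the six repeated pytest headers reported below.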

When we were using the Legion Python bindings (prior to Legate 24.09), we were inheriting this code from Legion, which filters out the stdout from all ranks except the first one.
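For reference, a minimal sketch of what such filtering could look like on the Python side. This is not the Legion code; it assumes the rank can be read from environment variables set by common launchers (Open MPI, MPICH-style PMI, PMIx, MVAPICH2, Slurm), and `detect_rank` and `silence_stdout_on_nonzero_ranks` are hypothetical helper names.

```python
import os
import sys

# Assumption: the launcher exports one of these rank variables.
RANK_ENV_VARS = (
    "OMPI_COMM_WORLD_RANK",  # Open MPI
    "PMI_RANK",              # MPICH / Hydra
    "PMIX_RANK",             # PMIx
    "MV2_COMM_WORLD_RANK",   # MVAPICH2
    "SLURM_PROCID",          # Slurm (srun)
)


def detect_rank(environ=None) -> int:
    """Return the first parseable rank found in the environment, else 0."""
    env = os.environ if environ is None else environ
    for name in RANK_ENV_VARS:
        value = env.get(name)
        if value is None:
            continue
        try:
            return int(value)
        except ValueError:
            continue  # ignore malformed values and keep looking
    return 0


def silence_stdout_on_nonzero_ranks(rank=None) -> bool:
    """Point sys.stdout at os.devnull on every rank except rank 0.

    Returns True if stdout was silenced. stderr is left untouched so that
    errors from any rank remain visible.
    """
    if rank is None:
        rank = detect_rank()
    if rank == 0:
        return False
    sys.stdout = open(os.devnull, "w")
    return True
```

Note that replacing `sys.stdout` only affects Python-level writes; output written directly to file descriptor 1 by native code would still appear, and handling that would need a descriptor-level redirect instead.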

The questions then are:

(a) do we want to replicate this behavior in post-Legion-bindings Legate? (b) do we make it the default? (c) do we do the same for C++?

I would ask @tylerjereddy to comment more on why this mode is desirable, and also @lightsighter @elliottslaughter, who originally enabled this mode in Legion.

[^1]: Unless we introduce a client-server model on top of a control-replicated Legion run, which is a suggestion for making multi-node Jupyter runs work.

tylerjereddy commented 2 hours ago

> I would ask @tylerjereddy to comment more on why this mode is desirable

I'm currently seeing duplicated pytest output on the terminal with multiple ranks, and to make matters worse it hangs for possibly-related or unrelated reasons. Either way, I don't think it should ever be considered acceptable to show the below duplication on the command line when running a test suite to verify the integrity of a project. It would be an absolute nightmare to debug/make sense of for any large project that decomposes its calculations with legate/cunumeric backing--stacking the task of unwinding/deduplicating the test output on top of whatever the original problem is. And if the original problem is related to concurrency, then the user's mental model is getting polluted by the additional duplicated output as well.

I'm not particularly concerned by whatever changed under the hood, but I am concerned by the information pollution on the command line, and by the departure from the "drop-in replacement" philosophy that this represents for the average Python user who just wants to naively scale up and possibly print/debug/test their results as they always have.

legate --launcher srun --launcher-extra="-n 6" run_tests.py

============================= test session starts ==============================
platform linux -- Python 3.12.6, pytest-8.3.3, pluggy-1.5.0 -- /users/treddy/miniforge3/envs/nvidia_cunumeric_4/bin/python
cachedir: .pytest_cache
rootdir: /lustre/vescratch1/treddy/gitlab_projects/<snip>
configfile: pyproject.toml
============================= test session starts ==============================
platform linux -- Python 3.12.6, pytest-8.3.3, pluggy-1.5.0 -- /users/treddy/miniforge3/envs/nvidia_cunumeric_4/bin/python
cachedir: .pytest_cache
rootdir: /lustre/vescratch1/treddy/gitlab_projects/<snip>
configfile: pyproject.toml
============================= test session starts ==============================
platform linux -- Python 3.12.6, pytest-8.3.3, pluggy-1.5.0 -- /users/treddy/miniforge3/envs/nvidia_cunumeric_4/bin/python
cachedir: .pytest_cache
rootdir: /lustre/vescratch1/treddy/gitlab_projects/<snip>
configfile: pyproject.toml
============================= test session starts ==============================
platform linux -- Python 3.12.6, pytest-8.3.3, pluggy-1.5.0 -- /users/treddy/miniforge3/envs/nvidia_cunumeric_4/bin/python
cachedir: .pytest_cache
rootdir: /lustre/vescratch1/treddy/gitlab_projects/<snip>
configfile: pyproject.toml
============================= test session starts ==============================
platform linux -- Python 3.12.6, pytest-8.3.3, pluggy-1.5.0 -- /users/treddy/miniforge3/envs/nvidia_cunumeric_4/bin/python
cachedir: .pytest_cache
rootdir: /lustre/vescratch1/treddy/gitlab_projects/<snip>
configfile: pyproject.toml
============================= test session starts ==============================
platform linux -- Python 3.12.6, pytest-8.3.3, pluggy-1.5.0 -- /users/treddy/miniforge3/envs/nvidia_cunumeric_4/bin/python
cachedir: .pytest_cache
rootdir: /lustre/vescratch1/treddy/gitlab_projects/<snip>
configfile: pyproject.toml
xpmem_attach error: : Invalid argument
xpmem_attach error: : Invalid argument
xpmem_attach error: : Invalid argument
xpmem_attach error: : Invalid argument
xpmem_attach error: : Invalid argument

tylerjereddy commented 1 hour ago

Another thing I'm curious about is deprecation policy and behavior changes. The change in print behavior caught me by surprise and seems to have happened fairly recently. It causes debug print behavior to differ between my legate deployments, which prevents me from making an apples-to-apples comparison between one that works and one that does not.

Also, launching an incorrect number of program copies can point to slurm or mpi issues, further muddying the waters.

nv-legate / legate.core

Filter out stdout from all ranks except the first? #958