intel / intel-xpu-backend-for-triton

OpenAI Triton backend for Intel® GPUs
MIT License

pytest's `rootdir` is not detected correctly when using the `--warnings-json-output-file` option #1750

Closed anmyachev closed 3 months ago

anmyachev commented 3 months ago

Example of the problem:

WORKSPACE=$(pwd)
echo ${WORKSPACE}

# install test deps
pip install pytest pytest-xdist pytest-rerunfailures pytest-select pytest-timeout expecttest
pip install git+https://github.com/kwasd/pytest-capturewarnings-ng.git@v1.2.0

# Create directory for test reports
if [ ! -d "${WORKSPACE}/reports" ]; then
  mkdir "${WORKSPACE}/reports"
fi
export TRITON_TEST_REPORTS=true
export TRITON_TEST_WARNING_REPORTS=true
export TRITON_TEST_REPORTS_DIR=${WORKSPACE}/reports
export TEST_UNSKIP=false

# Set a default skip list
export TRITON_TEST_SKIPLIST_DIR=${WORKSPACE}/scripts/skiplist/default

# Run core tests
source ${WORKSPACE}/scripts/pytest-utils.sh

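# Number of pytest-xdist workers (the log below was produced with 4)
NUM_PROCESSES=${NUM_PROCESSES:-4}
cd python/test/unit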
TRITON_TEST_SUITE=language \
  pytest -n ${NUM_PROCESSES} --device xpu language/ --ignore=language/test_line_info.py --ignore=language/test_subprocess.py
cd -

Log:

platform linux -- Python 3.10.14, pytest-8.3.2, pluggy-1.5.0
select: deselecting tests from './intel-xpu-backend-for-triton/scripts/skiplist/current/language.txt', failing on missing selection items
rootdir: ./intel-xpu-backend-for-triton
configfile: pyproject.toml
plugins: select-0.1.2, capturewarnings-ng-1.2.0, rerunfailures-14.0, timeout-2.3.1, xdist-3.6.1
4 workers [11321 items]
INTERNALERROR> Traceback (most recent call last):
INTERNALERROR>   File "./python3.10/site-packages/_pytest/main.py", line 283, in wrap_session
INTERNALERROR>     session.exitstatus = doit(config, session) or 0
INTERNALERROR>   File "./python3.10/site-packages/_pytest/main.py", line 337, in _main
INTERNALERROR>     config.hook.pytest_runtestloop(session=session)
INTERNALERROR>   File "./python3.10/site-packages/pluggy/_hooks.py", line 513, in __call__
INTERNALERROR>     return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
INTERNALERROR>   File "./python3.10/site-packages/pluggy/_manager.py", line 120, in _hookexec
INTERNALERROR>     return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
INTERNALERROR>   File "./python3.10/site-packages/pluggy/_callers.py", line 139, in _multicall
INTERNALERROR>     raise exception.with_traceback(exception.__traceback__)
INTERNALERROR>   File "./python3.10/site-packages/pluggy/_callers.py", line 122, in _multicall
INTERNALERROR>     teardown.throw(exception)  # type: ignore[union-attr]
INTERNALERROR>   File "./python3.10/site-packages/_pytest/logging.py", line 805, in pytest_runtestloop
INTERNALERROR>     return (yield)  # Run all the tests.
INTERNALERROR>   File "./python3.10/site-packages/pluggy/_callers.py", line 122, in _multicall
INTERNALERROR>     teardown.throw(exception)  # type: ignore[union-attr]
INTERNALERROR>   File "./python3.10/site-packages/_pytest/terminal.py", line 673, in pytest_runtestloop
INTERNALERROR>     result = yield
INTERNALERROR>   File "./python3.10/site-packages/pluggy/_callers.py", line 103, in _multicall
INTERNALERROR>     res = hook_impl.function(*args)
INTERNALERROR>   File "./python3.10/site-packages/xdist/dsession.py", line 138, in pytest_runtestloop
INTERNALERROR>     self.loop_once()
INTERNALERROR>   File "./python3.10/site-packages/xdist/dsession.py", line 163, in loop_once
INTERNALERROR>     call(**kwargs)
INTERNALERROR>   File "./python3.10/site-packages/xdist/dsession.py", line 306, in worker_collectionfinish
INTERNALERROR>     self.sched.schedule()
INTERNALERROR>   File "./python3.10/site-packages/xdist/scheduler/load.py", line 295, in schedule
INTERNALERROR>     self._send_tests(node, node_chunksize)
INTERNALERROR>   File "./python3.10/site-packages/xdist/scheduler/load.py", line 307, in _send_tests
INTERNALERROR>     node.send_runtest_some(tests_per_node)
INTERNALERROR>   File "./python3.10/site-packages/xdist/workermanage.py", line 355, in send_runtest_some
INTERNALERROR>     self.sendcommand("runtests", indices=indices)
INTERNALERROR>   File "./python3.10/site-packages/xdist/workermanage.py", line 374, in sendcommand
INTERNALERROR>     self.channel.send((name, kwargs))
INTERNALERROR>   File "./python3.10/site-packages/execnet/gateway_base.py", line 911, in send
INTERNALERROR>     raise OSError(f"cannot send to {self!r}")
INTERNALERROR> OSError: cannot send to <Channel id=3 closed>

~Please note that tests are still skipped, but this only happens due to a coincidence of test names (but not of test IDs).~ Maybe it depends on the Python version: the problem does not reproduce with 3.9 (I use 3.10).

See pytest-select code for details: https://github.com/ulope/pytest-select/blob/31e39c5a9b08d4509520941d47fe69cbd2d34dbc/pytest_select/plugin.py#L76
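For context, the log line above says pytest-select is "failing on missing selection items": with `--select-fail-on-missing`, an entry in the deselect file that matches no collected test aborts the run. The following is a simplified sketch of that check, not pytest-select's actual code. It assumes exact matching against full node IDs (the real plugin may also accept other forms; see the linked source) and that skip-list entries are written relative to `python/`. The helpers `find_missing` and `apply_deselect` and the `fail_on_missing` parameter are hypothetical.

```python
def find_missing(deselect_ids, collected_ids):
    """Skip-list entries that match no collected test ID (hypothetical helper)."""
    return sorted(set(deselect_ids) - set(collected_ids))


def apply_deselect(deselect_ids, collected_ids, fail_on_missing=True):
    """Simplified model of --deselect-from-file plus --select-fail-on-missing."""
    missing = find_missing(deselect_ids, collected_ids)
    if fail_on_missing and missing:
        # Each skip-list entry must match a collected test ID exactly.
        raise ValueError(f"deselect entries not collected: {missing}")
    deselect = set(deselect_ids)
    # Keep the remaining tests in collection order.
    return [nid for nid in collected_ids if nid not in deselect]


# Skip-list entry written relative to python/ (assumed format).
skiplist = ["test/unit/language/test_annotations.py::test_int_annotation[False-8]"]
# The same test, collected with the repository root as rootdir.
collected = ["python/test/unit/language/test_annotations.py::test_int_annotation[False-8]"]

print(find_missing(skiplist, collected))  # the entry is reported as missing
```

Under these assumptions, the same test counts as "missing" as soon as `rootdir` moves and the collected IDs gain a `python/` prefix.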

pbchekin commented 3 months ago

Note that the method for collecting test names recommended by pytest-select produces IDs without the `python/` prefix:

$ cd python/test/unit/
$ pytest --device xpu language/ --ignore=language/test_line_info.py --ignore=language/test_subprocess.py --collect-only --quiet
test/unit/language/test_annotations.py::test_int_annotation[False-8]
test/unit/language/test_annotations.py::test_int_annotation[False-16]
test/unit/language/test_annotations.py::test_int_annotation[False-32]
test/unit/language/test_annotations.py::test_int_annotation[False-64]
test/unit/language/test_annotations.py::test_int_annotation[True-8]
...
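pytest derives a test's node ID from the test file's path relative to `rootdir`, so the same test gets a different ID when `rootdir` moves up one level. A minimal sketch; the `node_id` helper and the `/repo` checkout path are hypothetical:

```python
from pathlib import PurePosixPath


def node_id(rootdir, test_file, test_name):
    """Pytest-style node ID: file path relative to rootdir, '::', test name."""
    rel = PurePosixPath(test_file).relative_to(rootdir)
    return f"{rel}::{test_name}"


# Hypothetical checkout location; only the relative layout matters.
test_file = "/repo/python/test/unit/language/test_annotations.py"
name = "test_int_annotation[False-8]"

print(node_id("/repo/python", test_file, name))
# test/unit/language/test_annotations.py::test_int_annotation[False-8]
print(node_id("/repo", test_file, name))
# python/test/unit/language/test_annotations.py::test_int_annotation[False-8]
```

The first form matches the `--collect-only` output above; the second is what the IDs would look like with the repository root as `rootdir`.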

pbchekin commented 3 months ago

I cannot reproduce this in my local environment (a JupyterHub session) with Python 3.10. We also run daily tests with Python 3.9 to 3.12, so we would have detected this issue a long time ago.

Try running without pytest-xdist (do not pass `-n ${NUM_PROCESSES}`); there may be another issue.

anmyachev commented 3 months ago

The only difference seems to be how pytest determines `rootdir`: `/intel-xpu-backend-for-triton/python` in CI (https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/10194903852/job/28202467254#step:17:53) vs `/intel-xpu-backend-for-triton` in my environment.

But I still don't understand why this is so.

anmyachev commented 3 months ago

The problem was that, when the option's value is passed as a separate argument instead of together with the option (`--option=value`), the root directory detection treats that value as a test path and takes it into account. This is incorrect.

Debug info from pytest's `determine_setup` function:

inifile = None
args = ['./intel-xpu-backend-for-triton/reports/language-warnings.json', '--warnings-json-output-file', '--deselect-from-file=./intel-xpu-backend-for-triton/scripts/skiplist/current/language.txt', '--select-fail-on-missing', '--device', 'xpu', 'language/']
rootdir_cmd_arg = None
invocation_dir = PosixPath('./intel-xpu-backend-for-triton/python/test/unit')

With the following change, the root directory is detected correctly:

Before:
    "--warnings-json-output-file"
    "$TRITON_TEST_REPORTS_DIR/${TRITON_TEST_SUITE}-warnings.json"
After:
    "--warnings-json-output-file=$TRITON_TEST_REPORTS_DIR/${TRITON_TEST_SUITE}-warnings.json"
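Why the two spellings behave differently can be sketched with plain `argparse`. As far as we can tell, pytest parses the command line once before plugins and conftest files are loaded, and passes `file_or_dir + unknown_args` from that early parse to `determine_setup`. At that point `--warnings-json-output-file` is not registered, so with the split spelling the option is set aside as unknown while its value is taken as a positional test path; this reproduces the ordering of `args` in the debug info above. The sketch models only that early parse plus the "does not start with `-`" filter; the `early_parse` and `path_candidates` helpers and the `/repo` path are hypothetical.

```python
import argparse


def early_parse(argv):
    """Model of an early parse that only knows the positional file_or_dir."""
    # Plugin options (--warnings-json-output-file) and conftest options
    # (--device) are not registered yet, so argparse treats them as unknown.
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("file_or_dir", nargs="*")
    ns, unknown = parser.parse_known_args(argv)
    return ns.file_or_dir + unknown


def path_candidates(args):
    """Tokens that rootdir detection would consider as paths (not options)."""
    return [a for a in args if not a.startswith("-")]


json_path = "/repo/reports/language-warnings.json"  # hypothetical location
split_form = ["--warnings-json-output-file", json_path, "--device", "xpu", "language/"]
joined_form = [f"--warnings-json-output-file={json_path}", "--device", "xpu", "language/"]

print(path_candidates(early_parse(split_form)))
# ['/repo/reports/language-warnings.json', 'xpu', 'language/']
print(path_candidates(early_parse(joined_form)))
# ['xpu', 'language/']
```

With the joined spelling the whole token starts with `-`, so the report path never reaches rootdir detection.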

It is still unclear why the problem does not reproduce in CI.
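One possible explanation, not verified in this thread: as far as we can tell from pytest 8.x (`get_dirs_from_args` in `_pytest/config/findpaths.py`), rootdir detection ignores candidate paths that do not exist on disk. On a fresh CI checkout the warnings JSON has not been written yet when pytest starts, so the stray path would be dropped; a `reports/` directory left over from an earlier local run would keep it and pull the common ancestor up to the repository root. The sketch models only the existence filter and the common-ancestor step with a hypothetical temporary layout; it is not pytest's code, and `existing_dirs`/`common_ancestor` are hypothetical helpers.

```python
import os
import tempfile
from pathlib import Path


def existing_dirs(args, cwd):
    """Simplified model: non-option args that exist on disk, mapped to directories."""
    dirs = []
    for arg in args:
        if arg.startswith("-"):
            continue
        path = (cwd / arg.split("::")[0]).resolve()
        if path.exists():
            dirs.append(path if path.is_dir() else path.parent)
    return dirs


def common_ancestor(cwd, dirs):
    """Deepest directory shared by all candidates; the invocation dir if none."""
    return Path(os.path.commonpath(dirs)) if dirs else cwd


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp).resolve()  # hypothetical checkout layout
    cwd = repo / "python" / "test" / "unit"
    (cwd / "language").mkdir(parents=True)
    (repo / "reports").mkdir()
    report = repo / "reports" / "language-warnings.json"
    args = [str(report), "--warnings-json-output-file", "--device", "xpu", "language/"]

    # Fresh CI checkout: the report file does not exist yet, so it is ignored.
    print(common_ancestor(cwd, existing_dirs(args, cwd)).relative_to(repo))
    # python/test/unit/language

    # Local rerun: a report from an earlier run is still on disk.
    report.write_text("[]")
    print(common_ancestor(cwd, existing_dirs(args, cwd)).relative_to(repo))
    # .
```

pytest then searches upward from this ancestor for configuration files, which would plausibly yield `python/` in one case and the repository root in the other.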