Closed: adi611 closed this pull request 9 months ago.
Patch coverage: 21.87% and project coverage change: -1.61% :warning:

Comparison is base (`0245cdc`) 83.42% compared to head (`28ec2fd`) 81.82%.

:exclamation: Current head `28ec2fd` differs from pull request most recent head `2c695d5`. Consider uploading reports for the commit `2c695d5` to get more accurate results.
i'm trying to understand why the slurm test fails now (last run I cancelled, but it took 5h...), since you didn't add any new tests here yet. any idea?
@adi611 - perhaps you can merge current master and try the new `actions/checkout` that was introduced. I've run `test_shelltask.py` with the docker image on my laptop and it was fine.
> @adi611 - perhaps you can merge current master and try the new `actions/checkout` that was introduced. I've run `test_shelltask.py` with the docker image on my laptop and it was fine.

Ok, I will try this out.
> i'm trying to understand why the slurm test fails now (last run I cancelled, but it took 5h...), since you didn't add any new tests here yet. any idea?

I couldn't find any reason for the workflow not to work here when it works elsewhere. I will try the new `actions/checkout` and check if the problem persists.
I have updated the worker. Currently I am using the `slurm` instance of `psij.JobExecutor` to run the Slurm tests.

Prerequisites:

- pydra available at `/pydra`, as done in the Slurm GA workflow
- `psij-python` installed in the docker container
- tests run with the `psij` plugin, or changes made in the conftest file as discussed in #691

Known issues:

- `pydra/pydra/engine/tests/test_shelltask.py::test_shell_cmd_4` passes when run on its own, but `test_shell_cmd_4` fails when running all the tests in `test_shelltask.py`.

I am working on fixing these issues.
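For reference, using the `slurm` instance of `psij.JobExecutor` follows the standard psij-python pattern; here is a minimal sketch of the public API (the echo command is just a placeholder, not anything from this PR):

```python
from psij import Job, JobExecutor, JobSpec

# "slurm" selects the Slurm backend; "local" would run jobs as local processes
executor = JobExecutor.get_instance("slurm")

# describe a job, submit it, and block until the scheduler reports completion
job = Job(JobSpec(executable="/bin/echo", arguments=["hello from psij"]))
executor.submit(job)
job.wait()
```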
@adi611 - I've had problems with my MIT account during the weekend, so I was not able to test it, but hopefully will be able to solve my issues tomorrow
@djarecka - I think the issue arises when running the tests with the `psij` plugin and the `-n auto` mode of `pytest`. I tried running all the tests in `test_shelltask.py`:

- `local` instance of the `psij` job executor, with the command `pytest --color=yes -v pydra/engine/tests/test_shelltask.py`: 2 tests fail, the others run and pass without any issue. Here is the result:
```
========================================== short test summary info ==========================================
FAILED pydra/engine/tests/test_shelltask.py::test_shell_cmd_outputspec_1b_exception[psij] - Failed: DID NOT RAISE <class 'Exception'>
FAILED pydra/engine/tests/test_shelltask.py::test_shell_cmd_outputspec_2a_exception[psij] - Failed: DID NOT RAISE <class 'Exception'>
====================== 2 failed, 147 passed, 1 skipped, 3 xfailed in 388.45s (0:06:28) ======================
```
- `slurm` instance of the `psij` job executor (without `-n auto`): the tests run and pass, but at a very slow pace
- `slurm` instance of the `psij` job executor (with `-n auto`): many tests fail

I have fixed the issue with pytest's `-n auto` mode; it can now run multiple tests at once.
@adi611 - I've tried to run the tests with the new worker on our cluster, but tests are failing; I'm not able to get results when running with `Submitter`. Will try to debug it in the next few days.
> @adi611 - I've tried to run the tests with the new worker on our cluster, but tests are failing; I'm not able to get results when running with `Submitter`. Will try to debug it in the next few days.
I think the issue might be with the paths to files and functions (like `/pydra/pydra/engine/run_pickled_function.py`), and/or the `python3.9` command. I used the previous Slurm container as a reference when writing this, and I should make it more generalized now. Could you please check if the issue still persists after correcting the paths and the python command? Meanwhile I will update the PR.
yes, it should be more generalized, you can always use the task `output_dir`
I think it should work now
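That suggestion amounts to deriving per-task paths from the task itself rather than hard-coding container locations; a hypothetical one-line illustration (`task` and the script name here are assumptions, not the PR's actual code):

```python
# build the script path from the task's output directory (a pathlib.Path)
script_path = task.output_dir / "run_pickled_function.py"
```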
Added the option to switch between the different schedulers provided by `psij`. For example, `plugin='psij-local'` and `plugin='psij-slurm'` will use the `local` and `slurm` instances of `psij`'s job executor, respectively. For now, the list contains only the `local` and `slurm` options.
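A minimal sketch of how such a plugin name could map onto a PSI/J executor (the function name and validation are assumptions for illustration; `JobExecutor.get_instance` is the public psij-python API):

```python
from psij import JobExecutor

def executor_from_plugin(plugin: str) -> JobExecutor:
    # "psij-local" -> "local", "psij-slurm" -> "slurm"
    prefix, _, subtype = plugin.partition("-")
    if prefix != "psij" or subtype not in ("local", "slurm"):
        raise ValueError(f"unsupported plugin: {plugin!r}")
    return JobExecutor.get_instance(subtype)
```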
could you please modify `conftest` so we can start testing the new workers in GA.

I've started testing `psij-local` on my osx and 5 tests are failing. Could you try to debug them?

I also ran a subset of tests on slurm on the MIT cluster and it looks like they are fixed! :) but I will test more tomorrow!
> could you please modify `conftest` so we can start testing the new workers in GA.

I'm currently working on it.

> I've started testing `psij-local` on my osx and 5 tests are failing. Could you try to debug them?

Can you please tell me which tests are failing?

> I also ran a subset of tests on slurm on the MIT cluster and it looks like they are fixed! :) but I will test more tomorrow!

Great!
The `psij-local` and `psij-slurm` plugins pass all the required tests, though `psij-slurm` takes much longer than the existing `slurm` plugin. I will update the `conftest.py` file to resolve the conflicts with other tests. Please let me know if I need to make other changes. I will also try to improve the performance of `psij-slurm`.
works on my osx! will check the new version of slurm tomorrow morning.
for the conftest, you could just expand the list of workers: add `psij-local` to `cf` and `psij-slurm` to `slurm`, and see how long it takes on GA
> for the conftest, you could just expand the list of workers: add `psij-local` to `cf` and `psij-slurm` to `slurm`, and see how long it takes on GA
I have updated `conftest.py` to add the `psij` plugin options. The command-line options `--psij=local` and `--psij=slurm` correspond to the `psij-local` and `psij-slurm` plugins, respectively. Also, I have created dedicated GA workflows for the `psij` plugins: `PSI/J-Local` and `PSI/J-SLURM`.
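A minimal sketch of how a `--psij` option like this can be wired into `conftest.py` (an assumed shape, not the PR's exact code; `pytest_addoption` and `pytest_generate_tests` are standard pytest hooks):

```python
# conftest.py (illustrative sketch)

def pytest_addoption(parser):
    # expose e.g. `pytest --psij=slurm` on the command line
    parser.addoption(
        "--psij",
        action="store",
        choices=["local", "slurm"],
        help="run tests with the corresponding psij-<executor> worker",
    )

def pytest_generate_tests(metafunc):
    # add the selected psij worker to any test parametrized over `plugin`
    if "plugin" in metafunc.fixturenames:
        plugins = ["cf"]
        psij = metafunc.config.getoption("--psij")
        if psij:
            plugins.append(f"psij-{psij}")
        metafunc.parametrize("plugin", plugins)
```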
@djarecka - I checked the `test_dockertask` tests in `PSI/J-Local / test`, and all the tests which pass for `cf` also pass for `psij-local`; for example, `test_docker_1` passes while `test_docker_3` gets skipped, for both `cf` and `psij-local`.
**Types of changes**

**Summary**

Add a new worker called `PsijWorker` to `workers.py`, which uses PSI/J to run tasks.

**Checklist**
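A rough sketch of the shape such a worker could take (class internals, method names, and the run-a-command simplification are assumptions for illustration, not the PR's actual `PsijWorker` code):

```python
from psij import Job, JobExecutor, JobSpec

class PsijWorker:
    """Toy sketch: hand shell commands to a PSI/J executor."""

    def __init__(self, subtype: str = "local"):
        # subtype is "local" or "slurm", e.g. selected via plugin="psij-<subtype>"
        self.executor = JobExecutor.get_instance(subtype)

    def run(self, executable: str, args: list[str]) -> None:
        # build a PSI/J job from the command, submit it, and wait for completion
        job = Job(JobSpec(executable=executable, arguments=args))
        self.executor.submit(job)
        job.wait()
```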