Closed giovannipizzi closed 9 months ago
Also, at least in my case, following the example
metadata={
'options': {
'prepend_text': 'conda activate some-conda-env'
}
starts asking also for resources; just putting resources={}
works, but adding the submit_script_filename
then requires resources': {'tot_num_mpiprocs': 1}
.
Not sure if this is intended (then one should explain in various points in the docs), or it's a bug (and an issue should be opened?)
Also, at least in my case, following the example
metadata={ 'options': { 'prepend_text': 'conda activate some-conda-env' }
starts asking also for resources; just putting
resources={}
works, but adding thesubmit_script_filename
then requiresresources': {'tot_num_mpiprocs': 1}
. Not sure if this is intended (then one should explain in various points in the docs), or it's a bug (and an issue should be opened?)
That is weird. Are you sure the problem only arises when you specify metadata
? Are you using launch_shell_job
or launching a ShellJob
directly?
What computer
is being used? What scheduler does that use and does it define the default_mpiprocs_per_machine
value? My guess is that the latter is not defined, and so the resources are underfined for most scheduler types. So it may just be an edge case where you set the localhost
computer up before you started using aiida-shell
and didn't define the default MPI procs per machine
If the analysis is correct, I could add a check in launch_shell_job
to make sure that default_mpiprocs_per_machine
is defined, and if not either raise an exception/warning, or simply automatically set it to 1.
Thanks for the PR @giovannipizzi
Yes, I am trying to run with sqlite and no rmq, and the computer was setup with
verdi computer setup -L localhost -H localhost -T core.local -S core.direct -w `echo $PWD/work` -n
verdi computer configure core.local localhost --safe-interval 1 -n
I would set it internally to 1 if with_mpi is not set/is false
verdi computer setup -L localhost -H localhost -T core.local -S core.direct -w
echo $PWD/work
-n verdi computer configure core.local localhost --safe-interval 1 -n
There is the problem. The default for --default-mpiprocs-per-machine
in verdi computer setup
is not set. So when you don't specify it here, it will be None
on the computer. In prepare_computer
, I automatically call computer.set_default_mpiprocs_per_machine(1)
to prevent this problem. So new users won't run into this problem in any case.
Nevertheless, I agree, it would be good to find a solution for existing localhost
computers that don't define it.
I would set it internally to 1 if with_mpi is not set/is false
Do you mean if default_mpiprocs_per_machine
?
I am not sure exactly how or where to apply the fix though. There is a reason that verdi computer setup
does not set a default for --default-mpiprocs-per-machine
because it only makes sense for certain schedulers. For example the SGE scheduler doesn't support this option and setting it will raise a validation error.
I guess this would have to be put in launch_shell_job
. Since the resources are validated by the ShellJob
inputs, it cannot be put in the ShellJob.prepare_for_submission
. So in launch_shell_job
I would have to check that the provided computer
(either through the code
input or the metadata.options.computer
input) has default_mpiprocs_per_machine
defined if the scheduler type uses the NodeNumberJobResource
class as _job_resource_class
. Does that sound right?
There were, I think, a few typos.