Fix option->options - Githubissues

giovannipizzi commented 9 months ago

There were, I think, a few typos.

giovannipizzi commented 9 months ago

Also, at least in my case, following the example

metadata={
            'options': {
                'prepend_text': 'conda activate some-conda-env'
            }

starts asking also for resources; just putting resources={} works, but adding the submit_script_filename then requires resources': {'tot_num_mpiprocs': 1}. Not sure if this is intended (then one should explain in various points in the docs), or it's a bug (and an issue should be opened?)

sphuber commented 9 months ago

Also, at least in my case, following the example
metadata={
            'options': {
                'prepend_text': 'conda activate some-conda-env'
            }
starts asking also for resources; just putting resources={} works, but adding the submit_script_filename then requires resources': {'tot_num_mpiprocs': 1}. Not sure if this is intended (then one should explain in various points in the docs), or it's a bug (and an issue should be opened?)

That is weird. Are you sure the problem only arises when you specify metadata? Are you using launch_shell_job or launching a ShellJob directly?

What computer is being used? What scheduler does that use and does it define the default_mpiprocs_per_machine value? My guess is that the latter is not defined, and so the resources are underfined for most scheduler types. So it may just be an edge case where you set the localhost computer up before you started using aiida-shell and didn't define the default MPI procs per machine

sphuber commented 9 months ago

If the analysis is correct, I could add a check in launch_shell_job to make sure that default_mpiprocs_per_machine is defined, and if not either raise an exception/warning, or simply automatically set it to 1.

sphuber commented 9 months ago

Thanks for the PR @giovannipizzi

giovannipizzi commented 9 months ago

Yes, I am trying to run with sqlite and no rmq, and the computer was setup with

verdi computer setup -L localhost -H localhost -T core.local -S core.direct -w `echo $PWD/work` -n
verdi computer configure core.local localhost --safe-interval 1 -n

giovannipizzi commented 9 months ago

I would set it internally to 1 if with_mpi is not set/is false

sphuber commented 9 months ago

verdi computer setup -L localhost -H localhost -T core.local -S core.direct -w echo $PWD/work -n verdi computer configure core.local localhost --safe-interval 1 -n

There is the problem. The default for --default-mpiprocs-per-machine in verdi computer setup is not set. So when you don't specify it here, it will be None on the computer. In prepare_computer, I automatically call computer.set_default_mpiprocs_per_machine(1) to prevent this problem. So new users won't run into this problem in any case.

Nevertheless, I agree, it would be good to find a solution for existing localhost computers that don't define it.

I would set it internally to 1 if with_mpi is not set/is false

Do you mean if default_mpiprocs_per_machine?

I am not sure exactly how or where to apply the fix though. There is a reason that verdi computer setup does not set a default for --default-mpiprocs-per-machine because it only makes sense for certain schedulers. For example the SGE scheduler doesn't support this option and setting it will raise a validation error.

I guess this would have to be put in launch_shell_job. Since the resources are validated by the ShellJob inputs, it cannot be put in the ShellJob.prepare_for_submission. So in launch_shell_job I would have to check that the provided computer (either through the code input or the metadata.options.computer input) has default_mpiprocs_per_machine defined if the scheduler type uses the NodeNumberJobResource class as _job_resource_class. Does that sound right?

sphuber / aiida-shell

Fix option->options #75