capsul.process.Process.check_requirements() return

servoz commented 1 year ago

To take an example, let's say we use a pipeline with two processes:

one from SPM (mia_processes.bricks.preprocess.spm.spatial_preprocessing.Smooth())
and one coming from FSL (mia_processes.bricks.preprocess.fsl.processes.Smooth())

If we use the Process.check_requirements() method for each of these bricks, this is what is returned:

The configuration for SPM and FSL are well set: For the SPM process:

{'capsul_engine':
     {'uses':
         {'capsul.engine.module.nipype': 'any',
          'capsul.engine.module.spm': 'any',
          'capsul.engine.module.matlab': 'any'
         }
     },
'capsul.engine.module.nipype':
     {'config_id': 'nipype',
      'config_environment': 'global'
     },
'capsul.engine.module.spm':
     {'config_id': 'spm-standalone',
      'config_environment': 'global',
      'directory': '/home/econdami/SPM/spm12',
      'standalone': False
     },
'capsul.engine.module.matlab':
     {'config_id': 'matlab',
      'config_environment': 'global',
      'executable': '/home/econdami/MATLAB/R2018b/bin/matlab',
      'mcr_directory': '/home/econdami/MATLAB/MATLAB_Runtime/v95'
     }
}

For the FSL process:

{'capsul_engine':
     {'uses':
         {'capsul.engine.module.nipype': 'any',
          'capsul.engine.module.fsl': 'any'
         }
     },
'capsul.engine.module.nipype':
     {'config_id': 'nipype',
      'config_environment': 'global'
     },
'capsul.engine.module.fsl':
     {'config_id': 'fsl',
      'config_environment': 'global',
      'directory': '/data_2/fsl',
      'config': '/data_2/fsl/etc/fslconf/fsl.sh'
     }
}

This result is expected.

The configuration for SPM and FSL are not set: For the SPM process: None For the FSL process:

{'capsul_engine':
     {'uses':
         {'capsul.engine.module.nipype': 'any',
          'capsul.engine.module.fsl': 'any'
         }
     },
'capsul.engine.module.nipype':
     {'config_id': 'nipype',
      'config_environment': 'global'
     },
'capsul.engine.module.fsl':
     {'config_id': 'fsl',
      'config_environment': 'global'
     }
}

For FLS, I can understand the result.

For SPM is it possible to get the uses value (actually the requirement requested by the process) ? and no module because the requirement is not met. Something like this:

{'capsul_engine':
         {'uses':
             {'capsul.engine.module.nipype': 'any',
              'capsul.engine.module.spm': 'any',
              'capsul.engine.module.matlab': 'any'
             }
         }
}

This would allow us to use the result of Process.check_requirements() directly in Mia.

denisri commented 1 year ago

Yes, there is a discrepency between FSL and SPM, and check_requirements() doesn't fully do its job.

But I think you are not expecting the right thing from check_requirements(): its job is not to provide which config options should be filled, but to say if the config values are OK to run the job. Here the answer is NO. So it's not SPM which is wrong, but FSL. Here you don't have filled the config: it is not valid, so check_requirements() should return None.

For SPM is it possible to get the uses value (actually the requirement requested by the process) ? and no module because the requirement is not met.

I don't understand what you are expecting: is this not process.requirements() ? Or you mean requirements with dependencies ? Yes I think I understand. OK then it's:

engine.settings.select_configurations('global', uses=process.requirements())['capsul_engine']['uses']

What I propose is to add a function in each config module to check if the selected config is obviously invalid (for instance if a mandatory path is left to Undefined). Settings.select_configurations() would (maybe optionally) check the configs it selects using this function, and discard them if the check is not passed. This way, process.check_requirements() would return None in both cases (SPM and FSL), and this would be right. Note that more information is given by check_requirements() if an optional list of messages is provided:

msg = []
res = process.check_requirements('global', msg)
if res is None:
    print('requirements failed because:')
    print(msg)

currently there is a bug in those messages, they print the full requirements dict instead of just the unmet required modules, I'm fixing it. Would this be OK ?

servoz commented 1 year ago

In fact I'm trying to determine before the execution of a pipeline if the configuration values are correct to run the job (in order to stop the calculation in Mia properly and to inform the user as precisely as possible about the problem encountered - for example such a process can't run because the configuration is not correct).

This goes beyond the fact whether a mandatory path is left at Undefined, because the user could enter an invalid path. Currently in Mia the configuration is tested when the user defines it. This solves the problem of the erroneous path. I am aware that this is only a partial solution because in case of remote calculation it does not work. I think that it would be better to have a configuration test at runtime, but in order to simplify the work for the moment I leave that aside (although the whole thing is linked!).

So for the moment I'm looking for the initialisation to know that the process has a configuration issue (actually if we exclude for the moment the test to know if the path is valid, at least know if it is Undefined. Although this is once again partial, because if the user has not entered a path, it is possible that he has set the corresponding environment variable that allows the process to run anyway: it would be necessary to check that the path is not Undefined and if it is the case to check if an environment variable exists, if so to use it - once again with the local limitations, if the test is not done only at runtime) ...

In short, to do well the question goes far beyond the initial question, but let's come back to this one ...

For the moment, I would like to be able to determine the process that doesn't have a path to execute it and if possible in this case that doesn't have an environment variable to do it. Let's leave aside for the moment the test to know if this path is valid and the environment variable ...

I can do this work directly in Mia, but I would prefer to use a Capsul method in order to avoid writing quite similar things in capsul and populse_mia (which we had decided to stop doing).

So, if in all cases (SPM as FSL in the example) we retrieve None that's fine for me (I know how to easily retrieve requirements for processes in mia_processes, so I should be able to tell the user: this process needs this requirement and it is not satisfied; for other processes maybe it would be more complicated ? that's why I was interested in the uses key, but finally I think I could do without it).

servoz commented 1 year ago

I just see your c545423 commit. I will see what can I do with it! Thanks!

denisri commented 1 year ago

Paths validity is complex in a client/server environment, so yes we should postpone this topic, and focus on the existence of values in paths. I'm writing a bit of code to check that. Each module can provide a check function.

denisri commented 1 year ago

OK I've done the basics checks. I hope it's enough for now, I have to finish another job before the week-end ;)

servoz commented 1 year ago

For me it's OK.

I managed to do what needed to be done in populse_mia with the requirements (or rather lack of them).

It is obvious that this is only a first superficial layer (but necessary), because one day it will be necessary to test the config at runtime (for remote calculations) and the use of environment variables ( if the config is not filled in and the environment variable has been defined by the user).

I opened a PR in connection with this ticket (in order to avoid producing side effects which I would not have thought of). This PR is necessary for things to work properly.

In nutshell: . we always add matlab when we need spm (not only for spm standalone) . In spm validity check the version can be undefined . A small minor modification to space out the printing of the configurations if there are several.

populse / capsul

capsul.process.Process.check_requirements() return #280