[general need] The self object in the list_outputs() and run_process_mia() methods of a mia_processes class are not the same

I don't know where to put this ticket, it comes from observations while working in the mia_processes package, it certainly has involvement in capsul, populse_mia, soma-workflow...... I put it in populse_mia by default.

The self object in the list_outputs() and run_process_mia() methods of a mia_processes class, (e.g. the mia_processes.bricks.preprocess.spm.Smooth()), are not the same (not the same memory address).

However, in the self object of the run_process_mia() method we find the essential elements to start the run: self.process if it has been defined in the constructor and the different parameters of the plug nodes (process parameters) when the traits have also been defined in the constructor.

But :

The self object in the list_outputs() method contains the project object, which allows interaction with the database when this method is crossed. The self object in the run_process_mia() method does not contain the project object, so it is not possible to interact with the database when running with this method.

One might think that there is no point in interacting with the database at runtime as the indexing is currently done before runtime at initialisation time (however, discussions have already pointed that this is not a good idea and that it would be better to do the database indexing at runtime to avoid data in the database that doesn't exist yet! ), One solution would be to set some parameters for runtime, in the list_outputs() method. This would solve the problem of accessing the database, for example.

The attributes of the self object, defined in list_outputs() are not accessible in run_process_mia(). For example self.foo = 'foo' instantiation in list_outputs() then in run_process_mia() self.foo is not existing.

I think there would be a way to predefine traits, at constructor time, for "attributes for passing from list_outputs() to run_process_mia()", but this is quite cumbersome and I have not yet tested whether, without making any code changes, these attributes would not be indexed in the history.

In short, although the run_process_mia() self is not initially intended to hold the project object, it would simplify things if it were present. We would need to find in run_process_mia() the attributes defined in list_outputs().

I'm starting to work on the automatic pdf report generation bricks (so that at the end of the run the user can consult a pdf file with a summary of the results rather than having to look for the data in the database or on the computer) and the current features described here make it quite difficult to code these bricks ...

I think I understand. run_process_mia() is called at run time, which happens in a separate process (in the OS/Unix meaning of "process"), which means it does not share memory with the process which has been setup on client side by the user in the MIA GUI. The process is transformed into a soma-workflow job, then it is re-created at runtime (unserialized using its parameters) for execution in the execution process. This explains why self is different. The un-serialization is based on 2 dicts:

the parameters dict, that is traits defined for the Process
the configuration options needed by the process requirements.

Attributes which are not parameters (traits) are thus not passed, and there will be no simple way to do so: this is the role for parameters, thus we should use parameters for that.

To allow database access during runtime, we would need to implement 2 things:

declare a database access module in the config, and require it in this particular Process class, so that the config will carry database information
setup databasing when initializing the config on runtime side, using this config information.

This is totally possible, but just not implemented up to now. With a restriction however: it is possible as long as the database is "visible" from the execution process: remote computing can take place on a remote machine (a remote cluster for instance) which will not necessary see the database which is managed on client side. This is why (one of the reasons why, actually) we discourage the use of the database during runtime, if possible.

OK, thank you @denisri for this very clear explanation, which confirms what I had the feeling to have understood by reading the codes of capsul and soma-workflow, for this part.

Yes, we have to take into account the remote calculation and it is therefore perhaps not a good idea to allow access to the database during the calculation. Let's drop that option!

I'm starting to use another option: "a fake parameter" (traits) instead, which would only be used to pass information between initialisation (list_outputs() in mia) and execution (run_process_mia() in mia). This way, the things needed for execution are prepared at initialisation while the database is still available. I still have a few issues that I haven't had time to investigate today. I'll keep you posted on the progress, in this direction!

If you think this second option is silly, don't hesitate to say so, and especially why?

If you have another, more robust idea for doing the job, don't hesitate either. As a reminder: to do automatic report generation we need to have access to information in the database (e.g. the patient's name, etc...). The question is how to retrieve this information at run time (the report generation will be initiated by the run_process_mia method)?

I don't know what you want to pass as "fake parameters" so I have no opinion... For database fields like patient name, I see 2 options:

make the patient name be a string parameter in the process, and make it filled in automatically via the completion system
actually allow database access during execution, with the restrictions evoked above, plus the fact that the processing part will need to know the database API and structure (fields names etc, which are not always the same from one database to another).

I'd think that if just one field (like patient name) is used, solution 1 is probably enough and more flexible, but if some "massive" databasing is required (like filtering requests etc), then solution 2 will be simpler to implement and probably more lightweight (otherwise requests results will need to be passed as parameters values, which may be big).

I don't know what you want to pass as "fake parameters" so I have no opinion...

The "fake parameter" I am currently using is a dictionary whose traits are defined in the constructor as the other "real" input and output parameters of the process. The traits for this "false parameter" have the userlevel=1 option which allows them to not be seen in the controller. In fact it is an attribute that only acts as a messenger between the list_outputs() method and the run_process_mia() method to pass the database parameters needed to the running process. It seems to do the job. I'll just have to see if it can't be indexed in the database for history (I think it shouldn't be too complicated to do) since it's a fake parameter (I don't know what else to call this parameter ... "hidden parameter"?).

By the way there are other parameters, with a userlevel>0 that should not appear in the history and that also appear today (#273 ticket).

Defining a traits for this "fake parameter" that is only used as a messenger for the runtime is quite cumbersome (but seems to do the job), so I was asking if there was a more elegant way to do the job.

Ok, after some changes in populse_mia and looking at what's going on, I think we can keep the idea of the fake (or hidden? or messenger ?, I'm not sure what to call it!) dict4runtime parameter as a messenger between the initialisation step with database access and the runtime.

dict4runtime is not observed in the controller GUI: OK
However, it is observed in the history. I don't really know if this is a bad thing (why not have access to all parameters if we want see the history, as we may do this for a debugging reason).

There is just an exception thrown with this fake parameter in the capsul.pipeline.xml module (SyntaxError: invalid syntax) that prevents it from being saved because the value is (for example) {'patient_name': 'alej', 'study_name': 'MRI Fonct. + perfusion', 'acquisition_date' : '2016-03-17 08:34:44', 'sex' : <undefined>, 'site' : 'Grenoble University Hospital - CLUNI', 'mri_scanner' : 'Philips Achieva 3.0T TX', 'age' : <undefined>}(it seems that Undefined is badly handled here!). It doesn't seem to disrupt the operation of the brick with this fake parameter!

I think this ticket can be closed now.

populse / populse_mia

[general need] The self object in the list_outputs() and run_process_mia() methods of a mia_processes class are not the same #272