Closed servoz closed 1 year ago
I think I understand.
run_process_mia()
is called at run time, which happens in a separate process (in the OS/Unix meaning of "process"), which means it does not share memory with the process which has been setup on client side by the user in the MIA GUI. The process is transformed into a soma-workflow job, then it is re-created at runtime (unserialized using its parameters) for execution in the execution process. This explains why self
is different.
The un-serialization is based on 2 dicts:
Attributes which are not parameters (traits) are thus not passed, and there will be no simple way to do so: this is the role for parameters, thus we should use parameters for that.
To allow database access during runtime, we would need to implement 2 things:
This is totally possible, but just not implemented up to now. With a restriction however: it is possible as long as the database is "visible" from the execution process: remote computing can take place on a remote machine (a remote cluster for instance) which will not necessary see the database which is managed on client side. This is why (one of the reasons why, actually) we discourage the use of the database during runtime, if possible.
OK, thank you @denisri for this very clear explanation, which confirms what I had the feeling to have understood by reading the codes of capsul and soma-workflow, for this part.
Yes, we have to take into account the remote calculation and it is therefore perhaps not a good idea to allow access to the database during the calculation. Let's drop that option!
I'm starting to use another option: "a fake parameter" (traits) instead, which would only be used to pass information between initialisation (list_outputs() in mia) and execution (run_process_mia() in mia). This way, the things needed for execution are prepared at initialisation while the database is still available. I still have a few issues that I haven't had time to investigate today. I'll keep you posted on the progress, in this direction!
If you think this second option is silly, don't hesitate to say so, and especially why?
If you have another, more robust idea for doing the job, don't hesitate either. As a reminder: to do automatic report generation we need to have access to information in the database (e.g. the patient's name, etc...). The question is how to retrieve this information at run time (the report generation will be initiated by the run_process_mia method)?
I don't know what you want to pass as "fake parameters" so I have no opinion... For database fields like patient name, I see 2 options:
I'd think that if just one field (like patient name) is used, solution 1 is probably enough and more flexible, but if some "massive" databasing is required (like filtering requests etc), then solution 2 will be simpler to implement and probably more lightweight (otherwise requests results will need to be passed as parameters values, which may be big).
I don't know what you want to pass as "fake parameters" so I have no opinion...
The "fake parameter" I am currently using is a dictionary whose traits are defined in the constructor as the other "real" input and output parameters of the process. The traits for this "false parameter" have the userlevel=1 option which allows them to not be seen in the controller. In fact it is an attribute that only acts as a messenger between the list_outputs() method and the run_process_mia() method to pass the database parameters needed to the running process. It seems to do the job. I'll just have to see if it can't be indexed in the database for history (I think it shouldn't be too complicated to do) since it's a fake parameter (I don't know what else to call this parameter ... "hidden parameter"?).
By the way there are other parameters, with a userlevel>0 that should not appear in the history and that also appear today (#273 ticket).
Defining a traits for this "fake parameter" that is only used as a messenger for the runtime is quite cumbersome (but seems to do the job), so I was asking if there was a more elegant way to do the job.
Ok, after some changes in populse_mia and looking at what's going on, I think we can keep the idea of the fake
(or hidden
? or messenger
?, I'm not sure what to call it!) dict4runtime parameter as a messenger between the initialisation step with database access and the runtime.
dict4runtime is not observed in the controller GUI: OK
However, it is observed in the history. I don't really know if this is a bad thing (why not have access to all parameters if we want see the history, as we may do this for a debugging reason).
There is just an exception thrown with this fake
parameter in the capsul.pipeline.xml module (SyntaxError: invalid syntax) that prevents it from being saved because the value is (for example) {'patient_name': 'alej', 'study_name': 'MRI Fonct. + perfusion', 'acquisition_date' : '2016-03-17 08:34:44', 'sex' : <undefined>, 'site' : 'Grenoble University Hospital - CLUNI', 'mri_scanner' : 'Philips Achieva 3.0T TX', 'age' : <undefined>}
(it seems that Undefined is badly handled here!).
It doesn't seem to disrupt the operation of the brick with this fake
parameter!
I think this ticket can be closed now.
I don't know where to put this ticket, it comes from observations while working in the mia_processes package, it certainly has involvement in capsul, populse_mia, soma-workflow...... I put it in populse_mia by default.
self
object in thelist_outputs()
andrun_process_mia()
methods of a mia_processes class, (e.g. the mia_processes.bricks.preprocess.spm.Smooth()), are not the same (not the same memory address).However, in the
self
object of therun_process_mia()
method we find the essential elements to start the run:self.process
if it has been defined in the constructor and the different parameters of the plug nodes (process parameters) when the traits have also been defined in the constructor.But :
self
object in thelist_outputs()
method contains the project object, which allows interaction with the database when this method is crossed. Theself
object in therun_process_mia()
method does not contain the project object, so it is not possible to interact with the database when running with this method.One might think that there is no point in interacting with the database at runtime as the indexing is currently done before runtime at initialisation time (however, discussions have already pointed that this is not a good idea and that it would be better to do the database indexing at runtime to avoid data in the database that doesn't exist yet! ), One solution would be to set some parameters for runtime, in the
list_outputs()
method. This would solve the problem of accessing the database, for example.self
object, defined inlist_outputs()
are not accessible inrun_process_mia()
. For example self.foo = 'foo' instantiation inlist_outputs()
then inrun_process_mia()
self.foo is not existing.I think there would be a way to predefine traits, at constructor time, for "attributes for passing from
list_outputs()
torun_process_mia()
", but this is quite cumbersome and I have not yet tested whether, without making any code changes, these attributes would not be indexed in the history.In short, although the
run_process_mia()
self
is not initially intended to hold the project object, it would simplify things if it were present. We would need to find inrun_process_mia()
the attributes defined inlist_outputs()
.I'm starting to work on the automatic pdf report generation bricks (so that at the end of the run the user can consult a pdf file with a summary of the results rather than having to look for the data in the database or on the computer) and the current features described here make it quite difficult to code these bricks ...