populse / populse_mia

Multiparametric Image Analysis

[initialisation step] too much writing to standard output when initialising an iterated pipeline? #251

Closed: servoz closed this issue 2 years ago

servoz commented 2 years ago

Without having tried to understand this at all yet (so maybe I'm wrong). Minimal procedure for reproducing:

We observe the following stdout:

- Pipeline initialising ...

Completion ...

. Iteration_pipeline.anat_file_filter (mia_processes.bricks.tools.tools.Input_Filter) MIA node ...

During the <mia_processes.bricks.tools.tools.Input_Filter object at 0x7fda2b4c6bf8> process initialisation, some possible problems were detected:
- inheritance_dict attribute was not found ...

. Iteration_pipeline.func_files_filter (mia_processes.bricks.tools.tools.Input_Filter) MIA node ...

During the <mia_processes.bricks.tools.tools.Input_Filter object at 0x7fda2b4e5150> process initialisation, some possible problems were detected:
- inheritance_dict attribute was not found ...

. spatial_preprocessing_1_1.list_duplicate1 (mia_processes.bricks.tools.tools.List_Duplicate) MIA node ...

During the <mia_processes.bricks.tools.tools.List_Duplicate object at 0x7fda63226830> process initialisation, some possible problems were detected:
- inheritance_dict attribute was not found ...

. spatial_preprocessing_1_1.newsegment1 (mia_processes.bricks.preprocess.spm.spatial_preprocessing.NewSegment) MIA node ...

. spatial_preprocessing_1_1.realign1 (mia_processes.bricks.preprocess.spm.spatial_preprocessing.Realign) MIA node ...

. spatial_preprocessing_1_1.normalize12_1 (mia_processes.bricks.preprocess.spm.spatial_preprocessing.Normalize12) MIA node ...

. spatial_preprocessing_1_1.coregister1 (mia_processes.bricks.preprocess.spm.spatial_preprocessing.Coregister) MIA node ...

. spatial_preprocessing_1_1.normalize12_2 (mia_processes.bricks.preprocess.spm.spatial_preprocessing.Normalize12) MIA node ...

. spatial_preprocessing_1_1.smooth1 (mia_processes.bricks.preprocess.spm.spatial_preprocessing.Smooth) MIA node ...

. spatial_preprocessing_1_1.list_duplicate1 (mia_processes.bricks.tools.tools.List_Duplicate) MIA node ...

During the <mia_processes.bricks.tools.tools.List_Duplicate object at 0x7fda63226830> process initialisation, some possible problems were detected:
- inheritance_dict attribute was not found ...

. spatial_preprocessing_1_1.newsegment1 (mia_processes.bricks.preprocess.spm.spatial_preprocessing.NewSegment) MIA node ...

. spatial_preprocessing_1_1.realign1 (mia_processes.bricks.preprocess.spm.spatial_preprocessing.Realign) MIA node ...

. spatial_preprocessing_1_1.normalize12_1 (mia_processes.bricks.preprocess.spm.spatial_preprocessing.Normalize12) MIA node ...

. spatial_preprocessing_1_1.coregister1 (mia_processes.bricks.preprocess.spm.spatial_preprocessing.Coregister) MIA node ...

. spatial_preprocessing_1_1.normalize12_2 (mia_processes.bricks.preprocess.spm.spatial_preprocessing.Normalize12) MIA node ...

. spatial_preprocessing_1_1.smooth1 (mia_processes.bricks.preprocess.spm.spatial_preprocessing.Smooth) MIA node ...

Completion done.

. spatial_preprocessing_1_1.list_duplicate1 (mia_processes.bricks.tools.tools.List_Duplicate) MIA node ...

During the <mia_processes.bricks.tools.tools.List_Duplicate object at 0x7fda63226830> process initialisation, some possible problems were detected:
- inheritance_dict attribute was not found ...

. spatial_preprocessing_1_1.newsegment1 (mia_processes.bricks.preprocess.spm.spatial_preprocessing.NewSegment) MIA node ...

. spatial_preprocessing_1_1.realign1 (mia_processes.bricks.preprocess.spm.spatial_preprocessing.Realign) MIA node ...

. spatial_preprocessing_1_1.normalize12_1 (mia_processes.bricks.preprocess.spm.spatial_preprocessing.Normalize12) MIA node ...

. spatial_preprocessing_1_1.coregister1 (mia_processes.bricks.preprocess.spm.spatial_preprocessing.Coregister) MIA node ...

. spatial_preprocessing_1_1.normalize12_2 (mia_processes.bricks.preprocess.spm.spatial_preprocessing.Normalize12) MIA node ...

. spatial_preprocessing_1_1.smooth1 (mia_processes.bricks.preprocess.spm.spatial_preprocessing.Smooth) MIA node ...

. spatial_preprocessing_1_1.list_duplicate1 (mia_processes.bricks.tools.tools.List_Duplicate) MIA node ...

During the <mia_processes.bricks.tools.tools.List_Duplicate object at 0x7fda63226830> process initialisation, some possible problems were detected:
- inheritance_dict attribute was not found ...

. spatial_preprocessing_1_1.newsegment1 (mia_processes.bricks.preprocess.spm.spatial_preprocessing.NewSegment) MIA node ...

. spatial_preprocessing_1_1.realign1 (mia_processes.bricks.preprocess.spm.spatial_preprocessing.Realign) MIA node ...

. spatial_preprocessing_1_1.normalize12_1 (mia_processes.bricks.preprocess.spm.spatial_preprocessing.Normalize12) MIA node ...

. spatial_preprocessing_1_1.coregister1 (mia_processes.bricks.preprocess.spm.spatial_preprocessing.Coregister) MIA node ...

. spatial_preprocessing_1_1.normalize12_2 (mia_processes.bricks.preprocess.spm.spatial_preprocessing.Normalize12) MIA node ...

. spatial_preprocessing_1_1.smooth1 (mia_processes.bricks.preprocess.spm.spatial_preprocessing.Smooth) MIA node ...

As I wrote at the beginning of this ticket, I have not investigated at all, but at first sight I would expect an initialisation report for only two passes through the pipeline (since we are iterating over only two patients). Here the initialisation report in stdout seems to indicate 4 pipeline passes. The result is correct (i/o indexing in the database), so I think it's just a problem of poorly managed prints (or prints not correctly managed due to changes in the last commits)...

LStruber commented 2 years ago

This is due to recent commits introducing the workflow_from_pipeline function in initialisation. This function (re)performs completion of iterated pipelines, so the standard output is printed twice... However, workflow_from_pipeline() does not perform any completion on non-iterated bricks/pipelines.

I guess we should:

  1. check whether the completion done in workflow_from_pipeline is sufficient to skip the first call of complete_pipeline_parameters inside initialisation,
  2. adapt the workflow_from_pipeline function to ensure that completion is also done on non-iterated pipelines.

LStruber commented 2 years ago

As a quick fix, we could add a boolean parameter do_completion to the workflow_from_pipeline function to choose whether the function should perform completion (just as there is a check_requirements parameter), and set it to False in MIA.

Btw, I think we should also set the check_requirements parameter to False, in order not to check requirements twice (that check currently seems to be performed twice as well).
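A minimal sketch of the proposed quick fix. The parameter names do_completion and check_requirements come from this discussion, but the functions below are hypothetical stand-ins for the real capsul/MIA code, only to illustrate the call pattern:

```python
# Hypothetical stand-ins for the real capsul/MIA functions, only to
# illustrate the proposed do_completion flag (mirroring the existing
# check_requirements one).

def complete_pipeline_parameters(pipeline, log):
    # the expensive completion pass, logged so we can count it
    log.append("completion: %s" % pipeline)

def workflow_from_pipeline(pipeline, log, do_completion=True,
                           check_requirements=True):
    if do_completion:
        complete_pipeline_parameters(pipeline, log)
    if check_requirements:
        log.append("requirements: %s" % pipeline)
    return "workflow(%s)" % pipeline

# MIA-side initialisation: completion has already been done once, so
# the redundant passes are disabled when building the workflow.
log = []
complete_pipeline_parameters("iter_pipeline", log)
workflow_from_pipeline("iter_pipeline", log,
                       do_completion=False, check_requirements=False)
```

With both flags set to False, completion (and its stdout output) runs exactly once instead of twice.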

denisri commented 2 years ago

Yes that's exactly what I was about to propose: workflow_from_pipeline does not perform completion by default, because it's not its job, and should probably not do it if it has already been done (completion may be expensive), but it needs to do it for iterations anyway. So the option is the best solution.

denisri commented 2 years ago

I'll modify workflow_from_pipeline (unless you have already started it ?)

LStruber commented 2 years ago

Nope, you can do it.

LStruber commented 2 years ago

What do you mean by:

it needs to do it for iterations anyway

The option will disable completion for iterated pipelines too, right?

denisri commented 2 years ago

The option will disable completion for iterated pipelines too, right?

No, it cannot always work without it. Well, I answered too fast; the situation is (as always) more complex than it appears... In iterations, each iteration step needs a completion of its own: typically, input values change at each iteration for the iterated process or pipeline, and its other parameters need to be updated accordingly, otherwise some parameters will not follow the iterations and will be left with the wrong values. So we cannot disable this.
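A toy illustration (hypothetical code, not capsul's completion engine) of why each iteration step needs its own completion: derived parameters must follow the per-step inputs, otherwise later steps would keep stale values from an earlier step:

```python
# Toy stand-in for a completion engine: derive dependent parameters
# from the step's own inputs.
def complete(params):
    # a real completion engine would fill in many parameters this way
    params["out_file"] = params["in_file"].replace(".nii", "_out.nii")
    return params

subjects = ["anat_P1.nii", "anat_P2.nii"]
completed = [complete({"in_file": f}) for f in subjects]
# each step gets its own out_file; skipping completion for step 2
# would leave its parameters stale, so it would overwrite step 1's
# outputs (the "wrong files overwritten" problem mentioned below)
```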

On the other hand, one could think that, in a well-written pipeline, the parameters of iterated nodes should all be set when performing the main pipeline completion (each parameter gets a list of values, which could be reused to build the workflow). However, many pipelines happen to have internal parameters which are not exported, and thus would not be set at all when they are iterated. Historically, workflow_from_pipeline did not perform completion, and bad things happened (like wrong files overwritten by other iterations), so we had to add it. Having internal parameters not visible from outside the pipeline is probably bad practice, but it happens: some pipelines work as a kind of "black box", and they would have many, many parameters if they exposed them all.

So performing completion inside workflow_from_pipeline would not prevent multiple completions of iterated processes/pipelines (and thus would not change anything about the problem here), unless we can disable iteration completion during the parent pipeline completion. But things have not been designed this way... I'm not sure it's easy to change...

:/

LStruber commented 2 years ago

Then would it be possible to perform:

  1. either completion of non-iterated pipelines in the workflow_from_pipeline function,
  2. or only a "simple completion" (not going inside iterated nodes) in MIA, before calling workflow_from_pipeline?

EDIT: I just realised that this is more or less what you suggested in your last sentence. Then the only way to deal with this double completion would be to write a function that performs completion only on the parent pipeline (and call it either in workflow_from_pipeline or in MIA directly).

LStruber commented 2 years ago

I guess this issue is sort of a duplicate of #220, as the completion that was performed before run was certainly due to the call of workflow_from_pipeline that was done before run and has now been moved to initialisation.

denisri commented 2 years ago

Then would it be possible to perform:

1. either completion of non-iterated pipelines in `workflow_from_pipeline` function,

2. or only a "simple completion" (not going inside iterated nodes) in MIA, before calling `workflow_from_pipeline`?

EDIT: I just realised that this is more or less what you suggested in your last sentence. Then the only way to deal with this double completion would be to write a function that performs completion only on the parent pipeline (and call it either in workflow_from_pipeline or in MIA directly).

Yes, you've got it completely (I was not sure I was clear). I'll see if the "simple completion" is feasible.

So #220 should be the same thing, yes, but it has to be checked too.

denisri commented 2 years ago

The simple solution is to add an optional parameter, like ProcessCompletionEngine.complete_parameters(complete_iterations=True). It's not very elegant, but as it may be useful for any pipeline which will be converted into a workflow (which is the aim of the pipeline system), I'll do it this way.

denisri commented 2 years ago

I have made the modification, but each node still seems to be completed several times...

LStruber commented 2 years ago

Didn't you miss something in completion_engine.py, in the complete_parameters() function? I feel like there should be an if complete_iterations: [...] somewhere.
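A hypothetical sketch of the missing guard: the new complete_iterations flag has to actually gate the recursion into iteration nodes, otherwise passing it changes nothing. The dict-based nodes below are toy stand-ins, not capsul's real classes:

```python
# Toy pipeline nodes as dicts; capsul's real node classes are more
# complex, this only shows where the flag must be checked.

def complete_parameters(node, complete_iterations=True, trace=None):
    if trace is None:
        trace = []
    trace.append(node["name"])  # "complete" this node
    if complete_iterations:  # the missing check
        for child in node.get("iterations", []):
            complete_parameters(child, complete_iterations, trace)
    return trace

pipeline = {"name": "main",
            "iterations": [{"name": "step1"}, {"name": "step2"}]}
once = complete_parameters(pipeline, complete_iterations=False)
full = complete_parameters(pipeline, complete_iterations=True)
```

Without the guard, both calls would recurse into the iteration steps and the flag would be a no-op, which matches the symptom of nodes still being completed several times.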

denisri commented 2 years ago

Yes of course, you're smart! ;) And I'm kind of silly... I should not do 4 different things at the same time... I paid attention to correctly pass the new parameter across nodes and class inheritance trees, but in the end I didn't do anything with it :)

servoz commented 2 years ago

Yes of course, you're smart! ;) And I'm kind of silly... I should not do 4 different things at the same time... I paid attention to correctly pass the new parameter across nodes and class inheritance trees, but in the end I didn't do anything with it :)

:-)))) 4 things is not bad... I have trouble starting just one!

denisri commented 2 years ago

Well, now it implies modifications inside workflow_from_pipeline, because it was expecting completion to have already been done once, in order to:

  1. assign temporary filenames to all internal file parameters (outputs connected to another input, with no value determined),
  2. determine the number of iterations and check whether the iteration is static or dynamic (the latter case is not handled and produces an error).
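A toy sketch (an assumed helper, not capsul code) of the first expectation above: internal file parameters that are still undetermined after completion get temporary filenames before the workflow is built:

```python
import os
import tempfile

def assign_temporaries(params):
    # give every still-undetermined internal parameter a real
    # temporary file, so downstream nodes have a concrete path
    for name, value in params.items():
        if value is None:
            fd, path = tempfile.mkstemp(suffix=".nii")
            os.close(fd)
            params[name] = path
    return params

# one determined input, one internal output with no value yet
params = {"in_file": "anat_P1.nii", "internal_out": None}
assign_temporaries(params)
```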

denisri commented 2 years ago

Well, this modification is not trivial at all. I'm getting headaches... I have made changes (still uncommitted) that more or less do the job, but I suspect some cases are no longer taken into account (more permissive iteration, no support for mixed static/dynamic outputs in iterations, and I suspect temporary file handling and output links are broken). I have to write a set of test cases, which did not exist so far for the iteration workflows. I will probably need several days to do it (if no other urgent job pops up...).

denisri commented 2 years ago

I pushed a new branch: https://github.com/populse/capsul/tree/workflow_rework which perhaps fixes this issue. Tests pass in Capsul (including a couple of new tests for iteration and completion in workflows) but I'm not totally sure all the use cases we want are covered by tests...

denisri commented 2 years ago

The branch has been merged into capsul/master since tests pass.

servoz commented 2 years ago

I have just tested, and thanks to the changes made by @LStruber and @denisri, everything now seems to work as we want (at least for the issues discussed in this ticket). The issue in ticket #220 seems to be solved too. I think we can close both tickets!