populse / populse_mia

Multiparametric Image Analysis

[Iteration enhancement] In Data Browser, for Bricks tag (history of the pipeline), Exec and Exec Time are wrong #218

Closed: servoz closed this issue 2 years ago

servoz commented 3 years ago

Observed in the 3 iteration modes described in the documentation.

After initialisation and then a run, in the Data Browser, for the Bricks tag (history of the pipeline), the Exec tag still remains at "Not done" and the Exec Time tag has no value at the end of the run.

LStruber commented 2 years ago

Working on this issue, I "remodeled" the initialization to fill the database and get all inputs/outputs from a workflow object created with workflow_from_pipeline from capsul.pipeline.pipeline_workflow. As a reminder, this was a suggestion from @denisri in #217, to avoid what is currently done in MIA, i.e. generating a process_order that mimics the workflow.
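
For context, here is a minimal sketch of what this looks like (an illustration only, not MIA's actual initialization code; it assumes an already-parameterized Pipeline instance named pipeline, and the exact attributes of the returned workflow may vary across Capsul versions):

```python
from capsul.pipeline.pipeline_workflow import workflow_from_pipeline

# Build a Soma-Workflow representation of an already-parameterized pipeline.
workflow = workflow_from_pipeline(pipeline)

# Each job corresponds to one process execution; walking the jobs gives the
# inputs/outputs needed to fill the database, instead of mimicking the
# execution order by hand with a process_order.
for job in workflow.jobs:
    print(job.name)
```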

Good news: this seems to solve #217 just as well as the "quick and dirty" fix I had made! Furthermore, it solves #242 easily as a side effect. I had hoped that it would also solve this issue. It almost does, but not quite. Let me explain:

As you all know, we currently create, during initialization, the paths of all files that will be generated by the run. After initialization, all "not run" entries of the database have an Exec tag set to "Not Done" and an Exec Time tag with no value. After the run, we want to update these entries, setting Exec Time to the current time and Exec to "Done". To retrieve these entries in the database, we use their uuids, which are generated by the workflow_from_pipeline function. Here is the issue:

On a non-iterated pipeline, you can call workflow_from_pipeline as many times as you want: the job uuids will remain the same each time. This is NOT THE CASE on an iterated pipeline! workflow_from_pipeline seems to create the uuid of each Job randomly, so if you call workflow_from_pipeline a second time, the job uuids won't be the same!
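
To make this concrete, a minimal sketch of the mismatch (iterated_pipeline is a placeholder name, and the uuid attribute on the jobs is an assumption based on the description above, not a documented Capsul API):

```python
from capsul.pipeline.pipeline_workflow import workflow_from_pipeline

# Build the same iterated pipeline twice, as happens today: once at
# initialization and once again when the run starts.
wf_init = workflow_from_pipeline(iterated_pipeline)
wf_run = workflow_from_pipeline(iterated_pipeline)

uuids_init = {getattr(job, "uuid", None) for job in wf_init.jobs}
uuids_run = {getattr(job, "uuid", None) for job in wf_run.jobs}

# On a non-iterated pipeline the two sets are identical; on an iterated
# pipeline they differ, so the uuids stored at initialization (the keys used
# to set Exec / Exec Time after the run) can no longer be matched.
print(uuids_init == uuids_run)
```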

By calling workflow_from_pipeline once during initialization and once again in the start function of the WorkflowExecutionError class, we prevent a correct update of the database after the run (for an iterated pipeline), because we can't retrieve the same job uuids that were generated during initialization.

I tested a quick fix to resolve this issue: when starting the running thread, I pass the workflow created during initialization to the engine (in the run() function of RunWorker in pipeline_manager_tab.py):

exec_id, pipeline = engine.start(pipeline, workflow=self.pipeline_manager.workflow, get_pipeline=True)

As a consequence, I had to modify the start function of the CapsulEngine class (capsul/engine/__init__.py):

def start(self, process, workflow=None, history=True, get_pipeline=False, **kwargs):
  return run.start(self, process, workflow, history, get_pipeline, **kwargs)

And the start function of the WorkflowExecutionError class (capsul/engine/run.py):

def start(engine, process, workflow=None, history=True, get_pipeline=False, **kwargs):
  if workflow is None:
    workflow = workflow_from_pipeline(process)
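
With these defaults (workflow=None, get_pipeline=False), existing Capsul call sites should keep their current behaviour; only MIA's runner opts in. A hedged sketch of the two kinds of calls (self.pipeline_manager.workflow being the workflow MIA stores at initialization, as above):

```python
# Unchanged callers: the workflow is still rebuilt internally, as before.
engine.start(pipeline)

# MIA's RunWorker: reuse the workflow built at initialization so the job
# uuids recorded in the database stay valid after the run.
exec_id, pipeline = engine.start(pipeline,
                                 workflow=self.pipeline_manager.workflow,
                                 get_pipeline=True)
```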

This fix works and solves the current issue. However, it implies a modification in Capsul and not only in MIA. @denisri, @sapetnioc, is this a feasible modification in Capsul (I don't know Capsul well and maybe it could interfere with other things)? Should I create a pull request?

servoz commented 2 years ago

If you don't get feedback from the Capsul side soon, do you think it would be a big job to overload, in MIA, what you need from Capsul? This could be a temporary patch to avoid getting stuck (if the patch is not too long to implement!).
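
For what it's worth, such a temporary MIA-side overload could look something like the sketch below (hypothetical and untested, only to illustrate the idea; reuse_initialized_workflow and pipeline_manager are placeholder names, and it assumes capsul/engine/run.py imports workflow_from_pipeline into its own namespace, as the unqualified call in the snippet above suggests):

```python
import capsul.engine.run as capsul_run

def reuse_initialized_workflow(pipeline_manager):
    """MIA-side monkey patch: make the workflow_from_pipeline call issued
    inside Capsul at run time return the workflow already built by MIA at
    initialization, so the job uuids stay consistent."""
    original = capsul_run.workflow_from_pipeline

    def patched(pipeline, **kwargs):
        stored = getattr(pipeline_manager, "workflow", None)
        # Reuse the stored workflow when available, otherwise fall back to
        # the normal Capsul behaviour.
        return stored if stored is not None else original(pipeline, **kwargs)

    capsul_run.workflow_from_pipeline = patched
```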

denisri commented 2 years ago

Hi, sorry again for the delay, I have too many things to do at the same time. Yes, the uuids cannot be the same between two calls of workflow_from_pipeline, because the same process instances are reused for the iterations, and they must be assigned one uuid per run, not per instance. So the solution you used, keeping the workflow and reusing it, both solves the uuid issue and avoids calling workflow_from_pipeline twice. It may look a bit weird to call engine.start() with both the pipeline and the workflow (and we cannot really check that they are consistent), but as it is useful, I agree with it. Go for the PR, yes, thanks!

LStruber commented 2 years ago

Closed with commit b9053ca9de5d32287bed863506d29e07eab93237