populse / capsul

Collaborative Analysis Platform : Simple, Unifying, Lean

How to generate filenames for intermediate / temporary files in pipelines? #247

Closed: ylep closed this issue 1 year ago.

ylep commented 2 years ago

I am trying to implement a simple use case: a pipeline with two processes, where the first process generates an intermediate file that the second process reads (output_image of the first process is used as input_image of the second process). [Image: filtered_sumcurvs_pipeline]

When I run this pipeline, I get an error because smoothing.output_image is unset. This parameter is in fact an input parameter of the first process, smoothing (File(output=False, write=True)), and it is connected to an input parameter of the second process, sumcurvs.

In this case, the right thing to do is to generate an arbitrary filename in a temporary directory, and to clean it up after the processes have run. Is there such a mechanism built into Capsul, or is one planned? Design-wise, I am not sure what the right thing to do is in the general case... I am happy to take part in further discussions about this issue.
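For illustration, the behaviour asked for here (an arbitrary intermediate filename in a temporary directory, removed after execution) can be sketched in plain Python with the standard tempfile module. This is not Capsul code: smoothing, sumcurvs and run_pipeline are hypothetical stand-ins for the two processes and the pipeline runner, and they only manipulate text files.

```python
import os
import tempfile
from typing import Tuple


def smoothing(input_image: str, output_image: str) -> None:
    # Hypothetical stand-in for the first process: it writes its result
    # to a file name that it receives as an *input* parameter.
    with open(input_image) as src, open(output_image, "w") as dst:
        dst.write(src.read().upper())


def sumcurvs(input_image: str) -> int:
    # Hypothetical stand-in for the second process: it reads the
    # intermediate file and returns a summary value.
    with open(input_image) as f:
        return len(f.read())


def run_pipeline(input_image: str) -> Tuple[int, str]:
    # A private temporary directory holds the intermediate file; it is
    # deleted when the block exits, even if a process raises an exception.
    with tempfile.TemporaryDirectory(prefix="capsul_tmp_") as tmp_dir:
        intermediate = os.path.join(tmp_dir, "smoothed.txt")
        smoothing(input_image, intermediate)
        result = sumcurvs(intermediate)
    # The path is returned only so that callers can check it was cleaned up.
    return result, intermediate
```

The caller never chooses the intermediate name, which is the point of the request: the runner owns both its creation and its cleanup.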

(Issue originally reported as https://github.com/populse/capsul/issues/241 but I misinterpreted the error, so here is a fresh issue with the correct question asked :slightly_smiling_face:)

denisri commented 2 years ago

Indeed, this is the typical case for temporary files. Your pipeline seems perfectly correct. Normally Capsul handles them, but I haven't really tested them yet in Capsul v3. @sapetnioc has written code to handle them in principle, so there is probably a bug. Sorry, I'm going on vacation in an hour, so I can't help more until... the last week of August... :)

sapetnioc commented 1 year ago

I recently completely re-implemented the temporary file detection for v3 because it was buggy (especially for iterations). This is still in the test_morphologist branch but will be merged soon. The new algorithm that detects which parameters need a generated temporary file name can be summarized like this:

The algorithm sets a generate_temporary flag on every parameter. If the flag is True at the end, a temporary file name will be generated for that parameter.
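The step that follows detection (turning every True generate_temporary flag into a concrete file name) might look like the minimal sketch below. The Parameter class and assign_temporary_names function are hypothetical and do not reflect Capsul's internal data model; only the flag name comes from the comment above.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Parameter:
    # Hypothetical model of a pipeline parameter, not Capsul's API.
    name: str
    generate_temporary: bool = False
    value: Optional[str] = None


def assign_temporary_names(params: List[Parameter], tmp_dir: str) -> List[str]:
    """Give a fresh file name to every parameter flagged for a temporary.

    Returns the generated paths so that the caller can delete them
    once the pipeline has finished.
    """
    created = []
    for p in params:
        if p.generate_temporary:
            # mkstemp creates the file itself, so the name is guaranteed unique.
            fd, path = tempfile.mkstemp(prefix=p.name + "_", dir=tmp_dir)
            os.close(fd)
            p.value = path
            created.append(path)
    return created
```

Returning the list of created paths keeps cleanup explicit: whatever runs the pipeline can remove exactly those files afterwards.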

Temporary files depend not only on the pipeline structure but also on parameter values. When there is a switch, its value can change which temporary files must be created. Therefore, in iterations, the algorithm must be run again for each iteration's parameter values.
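To show why a switch forces detection to be repeated per iteration, here is a toy model. The actual detection algorithm is not spelled out in this thread, so this uses a simplified rule chosen for illustration: an internal link needs a temporary only when both its producer and its consumer are active. The "method" switch, its values, and all names below are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy pipeline (not Capsul's data model): each internal link is
# an intermediate file written by one node and read by another.
INTERNAL_LINKS: Dict[str, Tuple[str, str]] = {
    "smoothing.output_image": ("smoothing", "sumcurvs"),
}

# Nodes enabled by each value of a hypothetical "method" switch.
ACTIVE_NODES: Dict[str, Set[str]] = {
    "smooth": {"smoothing", "sumcurvs"},
    "raw": {"sumcurvs"},  # the switch routes the raw input straight to sumcurvs
}


def detect_temporaries(method: str) -> Set[str]:
    """Return the intermediate parameters needing a temporary file for one switch value."""
    active = ACTIVE_NODES[method]
    return {
        param
        for param, (producer, consumer) in INTERNAL_LINKS.items()
        if producer in active and consumer in active
    }


def detect_for_iteration(methods: List[str]) -> List[Set[str]]:
    # The answer depends on the switch value, so detection is rerun
    # for every iteration instead of once for the whole pipeline.
    return [detect_temporaries(m) for m in methods]
```

With an iteration over ["smooth", "raw"], only the first iteration needs a temporary for smoothing.output_image, so a single pipeline-wide answer would be wrong for one of the two runs.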

As a consequence, creating temporary file names before execution is incompatible with fully dynamic pipelines. We may have to change that later, but it is complex and could require major modifications to the engine machinery.