populse / capsul

Collaborative Analysis Platform : Simple, Unifying, Lean
Other
7 stars 14 forks source link

Should xml_process decorator be kept ? #189

Closed sapetnioc closed 2 years ago

sapetnioc commented 3 years ago

We have the possibility to define processes as a function and a decorator containing an XML string. Do we want to keep this in a future major version change ?

I need to know because I am working on an experimental rewriting of Controllers, Processes and Pipelines to build a JSON/REST API service for managing processes execution. To assess the potential impact of these changes, I need to know if we can drop some features.

One of the question underlying this issue is "what is a process ?". For me a process is not a Python function. It is a more complex object meant to be used as part of a Pipeline or to be run by a dedicated GUI or platform. Moreover, in my opinion, using XML in Python code to represent a process is ugly. In Python, I believe that we should only support using a Process derived class.

servoz commented 3 years ago

I don't think I've ever used a XML string for a process definition in populse... For me, there is a good chance that this can be removed without any problem.

denisri commented 2 years ago

We have never used this feature, it has been designed and used years ago by people who have left the project. XML may be a correct save/reload format, but in my opinion it's not really human-readable thus is not convenient to be coded by hand in function decorators. The ability to use functions as processes can be discussed - this is something we used to like to be possible, but I don't think we ever actually used it. Anyway if we decide to support it (or "still" support it) we should rather find a better (I mean non-XML) way of declaring parameters. Thus I don't mind dropping it for now.

sapetnioc commented 2 years ago

It was easy to keep the functionality replacing XML by Python 3 builtin typing features. For instance, the following declaration in a module named my_module:

import os
from soma.controller import file

def my_process(image: file(), parameter: float) -> file(write=True):
    with open(image) as r:
        ...
    f, e = os.path.splitext(image)
    output = f'{f}_processed{e}'
    with open(output, 'w') as w:
        ...
    return output

allows to get a Process with get_process_instance('my_module.my_process') (there is a single output parameter called result in that case). But direct usage of the function is possible since it is untouched (no decorator).

It is not straightforward to have a code that is compatible for a direct use in Python and an indirect use with Capsul. In my example, output file name is generated in the process (though it could be a parameter of the function to allow to use a path finding system). Moreover, most processes will need to access a kind of context object given by Capsul and/or Soma-Workflow. But this context will probably be given by environment variables and/or files (things that can always be transfered to a job in all processing systems). Therefore I think it could be possible. My current challenge is to keep simple the configuration of a local or remote execution environment.