Closed sapetnioc closed 2 years ago
I don't think I've ever used a XML string for a process definition in populse... For me, there is a good chance that this can be removed without any problem.
We have never used this feature, it has been designed and used years ago by people who have left the project. XML may be a correct save/reload format, but in my opinion it's not really human-readable thus is not convenient to be coded by hand in function decorators. The ability to use functions as processes can be discussed - this is something we used to like to be possible, but I don't think we ever actually used it. Anyway if we decide to support it (or "still" support it) we should rather find a better (I mean non-XML) way of declaring parameters. Thus I don't mind dropping it for now.
It was easy to keep the functionality replacing XML by Python 3 builtin typing features. For instance, the following declaration in a module named my_module
:
import os
from soma.controller import file
def my_process(image: file(), parameter: float) -> file(write=True):
with open(image) as r:
...
f, e = os.path.splitext(image)
output = f'{f}_processed{e}'
with open(output, 'w') as w:
...
return output
allows to get a Process with get_process_instance('my_module.my_process')
(there is a single output parameter called result
in that case). But direct usage of the function is possible since it is untouched (no decorator).
It is not straightforward to have a code that is compatible for a direct use in Python and an indirect use with Capsul. In my example, output file name is generated in the process (though it could be a parameter of the function to allow to use a path finding system). Moreover, most processes will need to access a kind of context object given by Capsul and/or Soma-Workflow. But this context will probably be given by environment variables and/or files (things that can always be transfered to a job in all processing systems). Therefore I think it could be possible. My current challenge is to keep simple the configuration of a local or remote execution environment.
We have the possibility to define processes as a function and a decorator containing an XML string. Do we want to keep this in a future major version change ?
I need to know because I am working on an experimental rewriting of Controllers, Processes and Pipelines to build a JSON/REST API service for managing processes execution. To assess the potential impact of these changes, I need to know if we can drop some features.
One of the question underlying this issue is "what is a process ?". For me a process is not a Python function. It is a more complex object meant to be used as part of a Pipeline or to be run by a dedicated GUI or platform. Moreover, in my opinion, using XML in Python code to represent a process is ugly. In Python, I believe that we should only support using a Process derived class.