sphuber / aiida-shell

AiiDA plugin that makes running shell commands easy.
MIT License

Add support for additional node types besides `SinglefileData` as input nodes #4

Closed sphuber closed 2 years ago

sphuber commented 2 years ago

The current ShellJob interface allows SinglefileData nodes to be passed as inputs such that the provenance is kept. The plugin will automatically write the content of the files to the working directory. However, one might want to pass other node types as inputs as well.

Imagine the use case where a first shell command produces an output file containing a single integer, and this integer should be used as a command line argument for the second command. Currently, one would have to parse the integer and cast it to str before adding it to the arguments input, but this loses the provenance. To keep the provenance, it should be possible to parse the output file using a calcfunction and pass its output straight to the ShellJob.

Node types that can be supported automatically are those whose content can sensibly be serialized to a string that is used directly as a command line argument. Examples are the Float, Int and Str nodes, which all have the .value property that could be used. The open questions are what to do when an input node cannot be straightforwardly converted to a str, and whether a more specialized method should be defined that node types must implement in order to be compatible with the ShellJob.

Example:

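import io

from aiida.engine import calcfunction
from aiida.orm import Int, SinglefileData
from aiida_shell import launch_shell_job

# First command: writes the integer to stdout.
results = launch_shell_job(
    'echo',
    arguments=['2']
)

@calcfunction
def parse(output):
    # Convert the captured stdout to an Int node, keeping the provenance.
    return Int(int(output.get_content()))

# Second command: `inputs` and `files` are the keywords proposed in this issue.
results = launch_shell_job(
    'head',
    arguments=['-n', '{nlines}', '{single_file}'],
    inputs={'nlines': parse(results['stdout'])},
    files={'single_file': SinglefileData(io.StringIO('1\n2\n3\n4\n5'))}
)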

Adding a new keyword inputs, which accepts any Node whose value property serializes its content to a single str argument, keeps the provenance between the two steps as well as the intermediate Int node.
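
The placeholder substitution this implies could look roughly like the sketch below. It is plain Python and does not use AiiDA: the function name render_arguments and the stand-in class FakeNode are hypothetical and not part of aiida-shell. The only thing taken from the proposal is the assumption that a supported node exposes a value property.

```python
from dataclasses import dataclass
from string import Formatter
from typing import Any


@dataclass
class FakeNode:
    """Hypothetical stand-in for an AiiDA node such as Int, Float or Str."""
    value: Any


def render_arguments(arguments, inputs):
    """Replace each ``{key}`` placeholder with ``str(inputs[key].value)``."""
    rendered = []
    for argument in arguments:
        # Collect the placeholder names that appear in this argument.
        keys = [name for _, name, _, _ in Formatter().parse(argument) if name]
        values = {}
        for key in keys:
            if key not in inputs:
                raise KeyError(f'no input node for placeholder `{key}`')
            node = inputs[key]
            # Assumption: only nodes exposing `.value` can be serialized.
            if not hasattr(node, 'value'):
                raise TypeError(f'input `{key}` has no `value` property')
            values[key] = str(node.value)
        rendered.append(argument.format(**values))
    return rendered


print(render_arguments(['-n', '{nlines}'], {'nlines': FakeNode(2)}))
```

Raising on a node without value is just one possible answer to the question above; a dedicated serialization method on the node class would be the alternative.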

The parsing of course needs to be done using a calcfunction to keep the provenance between the output of the first command and the Int node that is derived from it. Really, this should be done as part of parsing the first command. That would be another potential feature: making it possible to add custom parsing of the output files. This would allow the echo shell job to return its output directly as an Int instead of a SinglefileData.
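
As an illustration of what such a custom parser might do, here is a minimal sketch in plain Python. The function name parse_single_int is hypothetical; how it would be attached to a ShellJob, and the wrapping of the result in an Int node, are left open here.

```python
def parse_single_int(content: str) -> int:
    """Parse command output that is expected to contain exactly one integer."""
    tokens = content.split()
    if len(tokens) != 1:
        raise ValueError(f'expected a single integer, got {len(tokens)} tokens')
    # int() raises ValueError itself if the token is not an integer literal.
    return int(tokens[0])
```

Failing loudly on unexpected output keeps a malformed file from silently turning into a wrong command line argument for the next step.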

sphuber commented 2 years ago

Fixed in f6dbfa0373a37df5ef776a91442bcc4ba6c6fdc5