Closed superstar54 closed 2 months ago
Thanks for the PR @superstar54 . Why does aiida-worktree
need to have access to an explicit output port? Does it actually get some data from the Port
instance that is returned? Or is it just for validation, making sure that the output port exists?
Because if it is the latter, you don't actually need the explicit output port to be defined if the parent namespace is dynamic, which is the case for the top-level output namespace of ShellJob
.
The reason I didn't explicitly declare the stdout
and stderr
output ports is because it is indeed not guaranteed that they will exist. Now you could say that is why required=False
exists, but I find it a bit superfluous in this case. Especially since the name of the output that contains the stdout
can actually change. You can specify the filename where stdout should be redirected to with the metadata.options.output_filename
option. This would change your example to:
job1 = wt.nodes.new(shelljob, code=pdb_fetch, arguments=["1brs"], metadata={'options': {'output_filename': 'custom_stdout'}})
wt.links.new(job1.outputs["custom_stdout"], generate_nodes1.inputs[0])
But this would then fail if I understand you correctly, because aiida-worktree
will try to access ShellJob.spec().outputs['custom_stdout']
, which doesn't exist. Correct?
Thanks for the quick reply! WorkTree needs explicit ports to link sockets between nodes.
In the case of user-defined output_filename
,
In worktree, one needs to explicitly add the port output_filename
so that later, one can link the output to the input of another node.
job1 = wt.nodes.new(shelljob, code=pdb_fetch, arguments=["1brs"])
job1.outputs.new("General", "custom_stdout")
wt.links.new(job1.outputs["custom_stdout"], generate_nodes1.inputs[0])
If the ShellJob has a default explicit output stdout
, in most cases (without user defined output_filename
), the user doesn't need to add the output socket anymore, thus making the code short and clear.
If the ShellJob has a default explicit output
stdout
, in most cases (without user definedoutput_filename
), the user doesn't need to add the output socket anymore, thus making the code short and clear.
I understand, but can you link me to the code in aiida-worktree
where it uses the output port from the spec so I can see what it actually uses it for. What information do you get from the output port definition? Is it just to validate that it exists?
What information do you get from the output port definition? Is it just to validate that it exists?
The code to read the output port is here. It uses the ports to create the output sockets in the Node
. The Node
has a create_sockets
method to define input and output sockets explicitly. Then, one creates a link using links.new(output, input)
.
Currently, it is impossible to create a link if a socket does not exist. I created an issue to support dynamic sockets. But I may not add this feature. Because, in this case, one can not link the nodes in GUI in the future.
Instead of building the node from CalcJob
by using the input and output ports, I can also create the AiiDAShell
node manually:
shelljob_inputs = [
'metadata',
'metadata.store_provenance',
'metadata.description',
'metadata.label',
'metadata.call_link_label',
'metadata.dry_run',
'metadata.computer',
'metadata.options',
'metadata.options.input_filename',
'metadata.options.output_filename',
'metadata.options.submit_script_filename',
'metadata.options.scheduler_stdout',
'metadata.options.scheduler_stderr',
'metadata.options.resources',
'metadata.options.max_wallclock_seconds',
'metadata.options.custom_scheduler_commands',
'metadata.options.queue_name',
'metadata.options.rerunnable',
'metadata.options.account',
'metadata.options.qos',
'metadata.options.withmpi',
'metadata.options.mpirun_extra_params',
'metadata.options.import_sys_environment',
'metadata.options.environment_variables',
'metadata.options.environment_variables_double_quotes',
'metadata.options.priority',
'metadata.options.max_memory_kb',
'metadata.options.prepend_text',
'metadata.options.append_text',
'metadata.options.parser_name',
'metadata.options.additional_retrieve_list',
'metadata.options.stash',
'metadata.options.stash.target_base',
'metadata.options.stash.source_list',
'metadata.options.stash.stash_mode',
'metadata.options.redirect_stderr',
'metadata.options.filename_stdin',
'metadata.options.additional_retrieve',
'metadata.options.use_symlinks',
'code',
'monitors',
'remote_folder',
'nodes',
'filenames',
'arguments',
'outputs',
'parser',
]
# here I can add the `stdout` and `stderr`.
shelljob_outputs = ['remote_folder',
'remote_stash',
'retrieved',
'stdout',
'stderr',
]
class AiiDAShell(Node):
"""AiiDAShell"""
identifier = "AiiDAShell"
name = "AiiDAShell"
node_type = "calcjob"
catalog = "AiiDA"
kwargs = shelljob_inputs
def create_sockets(self):
self.inputs.clear()
self.outputs.clear()
for inp in shelljob_inputs:
self.inputs.new("General", inp)
for out in shelljob_outputs:
self.outputs.new("General", out)
def get_executor(self):
return {
"path": "aiida_shell.calculations.shell",
"name": "ShellJob",
}
In this case, I add the AiiDAShell
node as a built-in node. Then, one can use this node with its identifier (like entry point) AiiDAShell
.
job1 = wt.nodes.new("AiiDAShell", code=pdb_fetch, arguments=["1brs"])
In this case, this PR is not necessary.
So if I understand correctly, without this PR, when a user does:
job1 = wt.nodes.new(shelljob, code=pdb_fetch, arguments=["1brs"])
wt.links.new(job1.outputs["stdout"], generate_nodes1.inputs[0])
the job1.outputs["stdout"]
will except, because when job1
was created, it didn't explicitly add the stdout
key (since ShellJob.spec().outputs
does not define it). How about, though, if you override the __getitem__
method of job1.outputs
that will catch a KeyError
if an output is referenced that is not explicitly defined. In that case, you check that the output spec of the linked process class is dynamic. In that case, you just manually add that output port and return it. If the output namespace is not dynamic, you reraise the exception.
Dynamic output specs are quite common, so I am wondering how you normally deal with this? I don't think it would be viable to require that all processes need to explicitly define output ports. Right?
@superstar54 just a quick ping to remind you of my last comment. Would that solution not work?
Hi @sphuber , thanks for your comment and suggestion. Allowing dynamic addition of output ports based on undefined keys could lead to errors if a user provides an incorrect output name.
For the moment, WorkGraph has manually added the stdout
and stderr
outputs for the built-in ShellTask
, to ensure these outputs are explicitly defined. But as you suggested, WorkGraph should handle the case when user specifies the `output_filename'.
Please close this PR if you don't plan to implement it.
Add the
stdout
andstderr
outputs. I set them asrequired=False
, because thestdout
will not be out if the user setoutput_filename
. And thestderr
could not exist.The reason I want to add this.
AiiDA-shell provides great flexibility to the user. It lowers the barrier for people to use AiiDA. One can combine it with AiiDA-WorkTree, to provide more power. What I mean by saying that is, instead of running the calcjobs in a python script, such as:
In this case, there is no checkpoint. One can use WorkTree to link all these calcjobs by passing the output of one ShellJob to another.
In the aiida-worktree, one creates a node from calcjob directly, and use the calcjob's inputs and outputs.
For the moment, I need to add a new output socket everytime I creating a shell node.
With this PR, one can
A full example is
This is node graph: