Closed bilke closed 4 months ago
Maybe this comes from aiida-core but why is the command and the arguments quoted when run via launch_shell_job?
This is because AiiDA fundamentally assumes that a "job" is a single command execution (a binary or a shell command) with some command line arguments, executed in bash. Since the command line arguments could contain characters that have special meaning in bash, the arguments are escaped and quoted (for example, so that an argument containing spaces is still interpreted as a single argument and not as several).
This also shows why you are having trouble with your example: you are treating the for statement as an executable or command, where it really is just a bash-specific keyword.
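To make the quoting concrete, here is a minimal sketch of the idea. The helper names quote_always and render_command_line are hypothetical and this is not AiiDA's actual implementation; it only assumes the behaviour described above, namely that the command and every argument end up wrapped in single quotes.

```python
def quote_always(argument: str) -> str:
    """Wrap an argument in single quotes so bash reads it as one literal word.

    Hypothetical helper for illustration, not part of aiida-core.
    """
    # A single quote cannot appear inside a single-quoted string, so close the
    # quotes, emit a double-quoted single quote, and reopen the quotes.
    return "'" + argument.replace("'", "'\"'\"'") + "'"


def render_command_line(command: str, arguments: list[str]) -> str:
    """Join the quoted command and arguments into one line of a submit script."""
    return " ".join(quote_always(part) for part in [command, *arguments])


# An argument with a space stays a single argument, which is the point of quoting.
print(render_command_line("cat", ["my file.txt"]))

# The same treatment applied to a shell keyword quotes the keyword as well.
print(render_command_line("for", ["i", "in", "some/dir/*.*"]))
```

Bash only recognizes reserved words such as for when they are unquoted, so the second line makes it look for an executable literally named for instead of starting a loop.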
I see two alternatives:
sha1sum
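from aiida.orm import SinglefileData
from aiida_shell import launch_shell_job

# Store the loop in a script file and let bash execute it.
launch_shell_job(
    'bash',
    arguments='{script}',
    nodes={
        'script': SinglefileData.from_string('for i in some/dir/*.*; do sha1sum "$i" ; done')
    }
)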
That being said, did you simplify this script just for the purpose of the example? Or is your goal really just to compute the sha1 checksum of a number of files in a directory? Because if that is the case, why not just do this in Python?
import hashlib
import pathlib

for filepath in pathlib.Path('somedir').glob('*.*'):
    checksum = hashlib.sha1(filepath.read_bytes()).hexdigest()
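If the files are large, reading each one into memory in one go may not be desirable. A possible variant, sketched here with a hypothetical helper sha1_of_file and an arbitrarily chosen chunk size, feeds the file to hashlib in pieces:

```python
import hashlib
import pathlib


def sha1_of_file(filepath: pathlib.Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-1 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha1()
    with filepath.open("rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # 'somedir' is the same placeholder directory as in the snippet above.
    for filepath in pathlib.Path("somedir").glob("*.*"):
        if filepath.is_file():
            print(sha1_of_file(filepath), filepath.name)
```

The digest is identical to hashing the whole file at once; only the peak memory use changes.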
Thanks again for your helpful explanations! The second alternative with the SinglefileData.from_string() is really elegant!
I am doing it as a launch_shell_job because the files are on remote machines and might be large, so I want to avoid transferring them and also avoid storing the files in the AiiDA database / repo.
Glad it is useful. One final thing to perhaps take into account: you are essentially breaking provenance because you are using the glob on a remote directory. Using this approach, it is not guaranteed that the same job will work exactly the same when executed against a different remote computer. And even on the same computer, the results may be different if you run it again and the contents of some/dir have changed. But it is up to you to decide whether that matters for your use case.
Provenance is somewhat preserved, as I glob inside a RemoteData:
from aiida.engine import calcfunction
from aiida.orm import SinglefileData
from aiida_shell import launch_shell_job

@calcfunction
def create_hash_script(remote_folder):
    # Build the script from the path stored in the RemoteData node.
    return SinglefileData.from_string(
        f'cd {remote_folder.get_remote_path()}/out\n'
        f'for i in *.*; do sha1sum "$i" ; done')
...
results, hash_node = launch_shell_job(
    'bash',
    arguments='{script}',
    nodes={
        'script': create_hash_script(node.outputs.remote_folder)
    },
    parser=hash_parser,
    metadata={
        "options": {
            "computer": node.computer,
            "withmpi": False,
        },
    },
)
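The hash_parser passed above is not shown in the thread. As a rough sketch of the text-processing part only, the hypothetical function parse_sha1sum_output below turns the standard sha1sum output (digest, a space, a mode character, then the file name) into a dictionary. Wiring it into the parser hook of launch_shell_job and wrapping the result in an AiiDA node is deliberately left out, and escaped file names (lines starting with a backslash) are not handled.

```python
def parse_sha1sum_output(text: str) -> dict[str, str]:
    """Map file names to SHA-1 digests from the stdout of `sha1sum`.

    Hypothetical helper; assumes the default line format
    '<digest> <mode><filename>' where mode is ' ' (text) or '*' (binary).
    """
    checksums = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        digest, _, rest = line.partition(" ")
        filename = rest[1:]  # drop the one-character mode indicator
        checksums[filename] = digest
    return checksums


example = (
    "da39a3ee5e6b4b0d3255bfef95601890afd80709  empty.txt\n"
    "a9993e364706816aba3e25717850c26c9cd0d89d *with space.bin\n"
)
print(parse_sha1sum_output(example))
```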
Maybe this comes from aiida-core but why is the command and the arguments quoted when run via launch_shell_job?
I would like to do the following:
So my shell job looks like this (where the command is a shell keyword):
And in principle my _aiidasubmit.sh looks good, but as every argument is quoted, including the command ('for' 'i' 'in' ...), it fails with:
What's the best way to run such commands in a launch_shell_job? Thanks a lot!