sphuber / aiida-shell

AiiDA plugin that makes running shell commands easy.
MIT License

How to use shell keywords in command? #89

bilke closed this issue 4 months ago

bilke commented 5 months ago

Maybe this comes from aiida-core, but why are the command and the arguments quoted when run via launch_shell_job?

I would like to do the following:

for i in some/dir/*.*; do sha1sum "$i" ; done

So my shell job looks like this (where the command is a shell keyword):

from aiida_shell import launch_shell_job

results, hash_node = launch_shell_job(
    "for",
    arguments='i in some/dir/*.*; do sha1sum "$i"; done',
    resolve_command=False,
    metadata={
        "options": {
            "computer": node.computer,
        },
    },
)

In principle my _aiidasubmit.sh looks fine, but because every argument, including the command, is quoted ('for' 'i' 'in' ...), it fails with:

The command exited with a non-zero status: 127 _aiidasubmit.sh: line 6: for: command not found

What's the best way to run such commands in a launch_shell_job? Thanks a lot!

sphuber commented 5 months ago

> Maybe this comes from aiida-core, but why are the command and the arguments quoted when run via launch_shell_job?

This is because AiiDA fundamentally assumes that a "job" is the execution of a single command (a binary or a shell command) with some command-line arguments, run in bash. Since the arguments could contain characters that have special meaning in bash, they are escaped and quoted (for example, so that an argument containing spaces is still interpreted as a single argument rather than as several).

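This explains why your example fails: you are treating the for statement as an executable or command, whereas it is really a bash keyword. Bash only recognizes keywords when they are unquoted, so the quoted 'for' is looked up as a regular command, which does not exist (hence exit status 127, "command not found").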
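To make the effect concrete, here is a small sketch that mimics the kind of quoting visible in the generated submit script. The helpers `quote_for_bash` and `build_command_line` are hypothetical illustrations, not aiida-core or aiida-shell API, and splitting the arguments string shell-style with `shlex.split` is an assumption about how it gets tokenized. The point is only that once every token is wrapped in single quotes, `for` reaches bash as a plain quoted word rather than a keyword.

```python
import shlex
from typing import List


def quote_for_bash(token: str) -> str:
    """Wrap a token in single quotes so bash treats it as literal text.

    A single quote cannot appear inside single quotes, so an embedded one is
    written as: close quote, a double-quoted ', then reopen the quote.
    """
    return "'" + token.replace("'", "'\"'\"'") + "'"


def build_command_line(command: str, arguments: str) -> str:
    """Split `arguments` shell-style and quote every token, command included."""
    tokens: List[str] = [command, *shlex.split(arguments)]
    return " ".join(quote_for_bash(token) for token in tokens)


line = build_command_line("for", 'i in some/dir/*.*; do sha1sum "$i"; done')
print(line)
# 'for' 'i' 'in' 'some/dir/*.*;' 'do' 'sha1sum' '$i;' 'done'
```

Every token, including `;` and `done`, ends up as a literal word, so bash never sees a loop at all; it simply tries to run a program called `for`.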

I see two alternatives:

That being said, did you simplify this script just for the purpose of the example, or is your goal really just to compute the SHA-1 checksums of the files in a directory? If so, why not just do this in Python?

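import hashlib
import pathlib
for filepath in pathlib.Path('some/dir').glob('*.*'):
    # Compute the SHA-1 hex digest of each file's contents (like sha1sum)
    digest = hashlib.sha1(filepath.read_bytes()).hexdigest()
    print(f'{digest}  {filepath}')
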
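Because `read_bytes()` loads each file fully into memory, large files may be better hashed in chunks. The sketch below streams each file through `hashlib.sha1` in fixed-size pieces; `sha1_of_file` and the 1 MiB default chunk size are illustrative choices, not part of the original suggestion. On Python 3.11+, `hashlib.file_digest` does the same job.

```python
import hashlib
import pathlib


def sha1_of_file(path: pathlib.Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-1 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (EOF)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


for filepath in sorted(pathlib.Path("some/dir").glob("*.*")):
    if filepath.is_file():  # a glob like *.* can also match directories
        print(f"{sha1_of_file(filepath)}  {filepath}")
```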
bilke commented 4 months ago

Thanks again for your helpful explanations! The second alternative with the SinglefileData.from_string() is really elegant!

I am using launch_shell_job because the files are on remote machines and might be large, so I want to avoid transferring them and also avoid storing them in the AiiDA database/repository.

sphuber commented 4 months ago

> Thanks again for your helpful explanations! The second alternative with the SinglefileData.from_string() is really elegant!

> I am using launch_shell_job because the files are on remote machines and might be large, so I want to avoid transferring them and also avoid storing them in the AiiDA database/repository.

Glad it is useful. One final thing to take into account: you are essentially breaking provenance because you glob a remote directory. With this approach, there is no guarantee that the same job behaves identically when run against a different remote computer. Even on the same computer, the results may differ if you run it again after the contents of some/dir have changed. It is up to you to decide whether that matters for your use case.

bilke commented 4 months ago

Provenance is preserved to some extent, as I glob inside a RemoteData:

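from aiida.engine import calcfunction
from aiida.orm import SinglefileData
from aiida_shell import launch_shell_job

@calcfunction
def create_hash_script(remote_folder):
    # Write a bash script that hashes every file in <remote_path>/out; abort if cd fails
    return SinglefileData.from_string(
        f'cd {remote_folder.get_remote_path()}/out || exit 1\n'
        'for i in *.*; do sha1sum "$i" ; done\n')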

...

results, hash_node = launch_shell_job(
    'bash',
    arguments='{script}',
    nodes={
        'script': create_hash_script(node.outputs.remote_folder)
    },
    parser=hash_parser,
    metadata={
        "options": {
            "computer": node.computer,
            "withmpi": False,
        },
    },
)
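The `hash_parser` used above is not shown in the thread. As a rough idea of what it might do with the job's stdout, here is a hypothetical helper that turns `sha1sum` output into a filename-to-digest mapping. It is plain Python, not aiida-shell API; how it is wired into a parser depends on the parser signature of your aiida-shell version. It assumes the standard `sha1sum` line format `<digest> <mode><filename>`, where the mode marker is a space (text) or `*` (binary), and it does not handle escaped filenames (lines starting with a backslash).

```python
from typing import Dict


def parse_sha1sum_output(stdout: str) -> Dict[str, str]:
    """Map each filename in `sha1sum` output to its hex digest."""
    hashes: Dict[str, str] = {}
    for line in stdout.splitlines():
        if not line.strip():
            continue  # skip blank lines
        digest, _, rest = line.partition(" ")
        hashes[rest[1:]] = digest  # drop the one-character mode marker
    return hashes


# Made-up sample output; the second filename contains a space on purpose
sample = (
    "da39a3ee5e6b4b0d3255bfef95601890afd80709  empty.txt\n"
    "2aae6c35c94fcfb415dbe95f408b9ce91ee846ed  hello world.txt\n"
)
print(parse_sha1sum_output(sample))
```

Splitting only on the first space and then dropping exactly one mode character keeps filenames with spaces intact, which a plain `line.split()` would break.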