sphuber / aiida-shell

AiiDA plugin that makes running shell commands easy.
MIT License
14 stars 7 forks source link

Proposal on handling `RemoteData` #105

Open yakutovicha opened 1 week ago

yakutovicha commented 1 week ago

The current behaviour w.r.t. RemoteData object is to copy/symlink all its content in the current folder. But this creates an issue of handling files with the same names (e.g. #58). It becomes especially hard when one uses multiple RemoteData folders as inputs.

I would propose to copy/symlink the folders as is, giving them the name as the input node link.

In practice, it means that a call like the following:

    remote_folder = RemoteData(...)
    results, node = launch_shell_job(
        'cubehandler',
        nodes={'previous_calc': remote_folder},
        ...
        )

would generate a folder previous_calc that is a copy/symlink of the remote_folder.

It is a breaking change, but since the project is still in a pre-release phase (0.x version), I assume it is acceptable.

yakutovicha commented 1 week ago

In practice, that would require to change the following lines:

https://github.com/sphuber/aiida-shell/blob/14866d1450aa252ec414373e86e98611e2eae9db/src/aiida_shell/calculations/shell.py#L340-L341

To something like that:

        remote_nodes = [(name, node) for (name, node) in inputs.get('nodes', {}).items() if isinstance(node, RemoteData)]
        instructions = [(computer_uuid, f'{node.get_remote_path()}', name) for (name, node) in remote_nodes]

Happy to make a PR :)

sphuber commented 1 week ago

I see the problem and it would be great to support it, but I don't think the proposed solution is the way to go. Most simple use cases rely on the contents of a RemoteData to copied directly in the working directory. Changing this would break that and force all these simple use cases to do more work to have the command read from a subdirectory, if this is even possible.

Instead, I think we should keep the default behavior and just make it possible for the user to specify the target directory for RemoteData inputs. This feature already exists: https://aiida-shell.readthedocs.io/en/latest/howto.html#running-a-shell-command-with-folders-as-arguments

With the filenames argument you can specify the base directory for any file data provided by input nodes. I don't see why this couldn't or shouldn't also apply to RemoteData nodes. I quickly looked at the code and the ShellCalculation.handle_remote_data_nodes does not consider filenames whereas it probably should. There also seems to be a bug: https://github.com/sphuber/aiida-shell/blob/14866d1450aa252ec414373e86e98611e2eae9db/src/aiida_shell/calculations/shell.py#L410

The method ShellCalculation.handle_remote_data doesn't exist and so should raise. I guess this line just never gets called currently.

yakutovicha commented 1 week ago

With the filenames argument you can specify the base directory for any file data provided by input nodes. I don't see why this couldn't or shouldn't also apply to RemoteData nodes.

This solution is also ok for me 👍