sphuber / aiida-shell

AiiDA plugin that makes running shell commands easy.
MIT License

Got error when running example from aiida-shell doc #65

Closed superstar54 closed 8 months ago

superstar54 commented 8 months ago

Hi, I am running the example from the aiida-shell documentation:

#!/usr/bin/env runaiida
"""Simple ``aiida-shell`` script to manipulate a protein defined by a .pdb file.

Just requires a configured AiiDA profile and ``pdb-tools`` to be installed.
"""
from aiida_shell import launch_shell_job
from aiida import load_profile
load_profile()

results, node = launch_shell_job('pdb_fetch', '1brs')
results, node = launch_shell_job('pdb_selchain', '-A,D {pdb}', {'pdb': results['stdout']})
results, node = launch_shell_job('pdb_delhetatm', '{pdb}', {'pdb': results['stdout']})
results, node = launch_shell_job('pdb_tidy', '{pdb}', {'pdb': results['stdout']})

print(f'Final pdb: {node}')
print(f'Show the content using `verdi node repo cat {node.pk} pdb`')
print(f'Generate the provenance graph with `verdi node graph generate {node.pk}`')

I got this error.

$ verdi node repo cat 5383 pdb
Critical: failed to get the content of file `pdb`: object with path `pdb` does not exist.

If I cat the stdout file directly, it only shows END.

$ verdi node repo cat 5387
END

sphuber commented 8 months ago

I cannot reproduce this. Is pdb-tools installed in your environment? Can you show more details on the output of the ShellJob that fails? Please share the content of the _aiidasubmit.sh, stdout and stderr files, and also the _scheduler-stderr.txt if it contains anything.

superstar54 commented 8 months ago

I figured out the reason. I am running in a conda environment named aiida, and I installed pdb-tools in this env too. The problem is that pdb-tools does not work in the base env, which is where the calcjob runs.

As shown in the _aiidasubmit.sh:

'/home/xing/miniconda3/envs/aiida/bin/pdb_fetch' '1brs'  > 'stdout' 2> 'stderr'

However, the code didn't report any error. It just produced an empty stdout file.

It would be good to add an option to activate the environment.

sphuber commented 8 months ago

Thanks for the additional details. That makes sense. aiida-shell does indeed assume that the binary is available in the "default" system environment.

> It would be good to add an option to activate the environment.

This already exists because you can use the same metadata options as for any other CalcJob, so you could do

launch_shell_job(
    'pdb_fetch',
    metadata={
        'options': {
            'prepend_text': 'conda activate some-env',
        }
    }
)

I will add this to the documentation, because it is not immediately evident of course.

sphuber commented 8 months ago

Could you confirm whether that suggestion solves the issue for you @superstar54 ?

superstar54 commented 8 months ago

Thanks. It works after adding:

metadata={
    'options': {
        'prepend_text': 'conda activate aiida',
    }
}

Another problem: I have to set the filenames explicitly:

results, node = launch_shell_job(
    'pdb_selchain',
    '-A,D {pdb}',
    {'pdb': results['stdout']},
    filenames={'pdb': 'input.pdb'},
    metadata={
        'options': {
            'prepend_text': 'conda activate aiida',
        }
    },
)

Otherwise, the result stdout is empty.

sphuber commented 8 months ago

> Thanks. It works after adding.

Thanks for confirming, I will add it to the docs.

> Another problem, I have set the filenames explicitly: Otherwise, the result stdout is empty.

Could you please share the content of _aiidasubmit.sh and the output of ls -l in the working directory of the failed calculation? I don't understand why you should have to specify the input file name explicitly.

superstar54 commented 8 months ago

$ cat _aiidasubmit.sh
#!/bin/bash
exec > _scheduler-stdout.txt
exec 2> _scheduler-stderr.txt

conda activate aiida

'/home/xing/miniconda3/envs/aiida/bin/pdb_selchain' '-A,D' 'stdout'  > 'stdout' 2> 'stderr'

echo $? > status
$ ls
_aiidasubmit.sh  _scheduler-stderr.txt  _scheduler-stdout.txt  status  stderr  stdout

I noticed this error for all jobs, but the env seems to be activated successfully, because everything works when I set the filenames explicitly:

$ cat _scheduler-stderr.txt 

CondaError: Run 'conda init' before 'conda activate'

superstar54 commented 8 months ago

I tested locally. I copied the output of pdb_fetch to a file named stdout and ran

pdb_selchain -A,D stdout  > stdout

The stdout file is then empty.

If I copy the output of pdb_fetch to a file with another name, input, and run

pdb_selchain -A,D input  > stdout

The stdout is correct.

It seems the input and output cannot have the same name.

sphuber commented 8 months ago

Thanks for that. I see what the problem is now. By default, aiida-shell will copy a SinglefileData input node to the working directory with its filename attribute. In your case, this is stdout and so the input file is written to stdout, as shown in the command line that does pdb_selchain '-A,D' 'stdout' > 'stdout'. The problem is then that the input file is being overwritten by the output of the pdb_selchain command.
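
The overwrite can be reproduced outside of AiiDA. Below is a minimal sketch, not aiida-shell code: the helper run_with_redirect is a hypothetical name, and a small Python child process stands in for pdb_selchain. It opens the output file for writing before starting the child, which is what the shell's `>` redirect does, so an input with the same name is already truncated by the time the command reads it.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Child process that simply echoes the file given as its first argument.
# It stands in for a tool such as pdb_selchain (an assumption for this demo).
ECHO_FILE = 'import sys; sys.stdout.write(open(sys.argv[1]).read())'


def run_with_redirect(input_name, output_name, content):
    """Mimic `cmd input_name > output_name` and return the output file's content."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        (workdir / input_name).write_text(content)
        # Opening with 'w' truncates the file before the child starts,
        # just like the shell's `>` operator.
        with open(workdir / output_name, 'w') as out:
            subprocess.run(
                [sys.executable, '-c', ECHO_FILE, input_name],
                cwd=workdir,
                stdout=out,
                check=True,
            )
        return (workdir / output_name).read_text()


same_name = run_with_redirect('stdout', 'stdout', 'ATOM\nEND\n')
other_name = run_with_redirect('input.pdb', 'stdout', 'ATOM\nEND\n')
print(repr(same_name))
print(repr(other_name))
```

With identical names the child reads an already emptied file, which matches the empty stdout reported above; with a distinct input name the content passes through unchanged.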

I will have to put a check in that makes sure there won't be identical files that are automatically written, because that would lead to overwriting. I just wonder if there are some use-cases where you actually want to overwrite the input file in this manner.
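
Such a check could look roughly like the sketch below. This is only an illustration under assumptions, not the actual aiida-shell implementation: the function name find_filename_collisions is hypothetical, and the set of reserved names is taken from the directory listing shared earlier in this thread.

```python
from collections import Counter

# Assumption: files the job writes itself, as seen in the `ls` output above.
RESERVED_FILENAMES = frozenset({
    '_aiidasubmit.sh',
    '_scheduler-stderr.txt',
    '_scheduler-stdout.txt',
    'status',
    'stderr',
    'stdout',
})


def find_filename_collisions(input_filenames, reserved=RESERVED_FILENAMES):
    """Return the sorted names that would be written more than once.

    A name collides if two inputs share it, or if an input uses a name
    that the job itself writes to the working directory.
    """
    counts = Counter(input_filenames)
    duplicated = {name for name, count in counts.items() if count > 1}
    clashing = set(counts) & set(reserved)
    return sorted(duplicated | clashing)


# An input named like the redirected output is flagged.
print(find_filename_collisions(['stdout']))
# A distinct input name passes.
print(find_filename_collisions(['input.pdb']))
```

Whether a collision should raise an error or only emit a warning depends on the open question above, namely whether overwriting an input file is ever intended.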