haddocking / pdb-tools

A dependency-free cross-platform swiss army knife for PDB files.
https://haddocking.github.io/pdb-tools/
Apache License 2.0
369 stars 112 forks source link

pdb_mkensemble includes the PDB filename when using a redirect #146

Closed amjjbonvin closed 1 year ago

amjjbonvin commented 1 year ago

When making an ensemble PDB from all PDF files in a given directly with the following command:

pdb_mkensemble *.pdb > test.pdb

the resulting ensemble does include test.pdb which was non-existent when the command was given and as a results an empty MODEL/ENDMDL is added to the file, which can give problems.

Here is the header of a created ensemble from two PDB files:

REMARK     MODEL 1 FROM hpr_1.pdb
REMARK     MODEL 2 FROM hpr_2.pdb
REMARK     MODEL 3 FROM test.pdb

and the end of the file contains:

...
ENDMDL
MODEL        3
ENDMDL
END
JoaoRodrigues commented 1 year ago

I'm not sure that is a "bug" on our end, and I'm not even sure of how to catch it. We don't know that we are writing to a file, we're writing to stdout and that's being redirected to a file. That's outside of Python's control and therefore pdb-tools. See: https://unix.stackexchange.com/questions/257110/exclude-target-redirection-file-from-being-processed-in-for-loop

I'm afraid that short of adding an output flag (which none of the other tools has), the only option is for the user to write the ensemble to a different folder, or be more specific with the glob: pdb_mkensemble model*.pdb > ensemble.pdb. Ultimately, we could add a check to only write files that have content, but that can hide a bunch of errors.

amjjbonvin commented 1 year ago

Ok - but then why does pdb_merge does not show the same behaviour then? Or may-be it does but since the file is non-existing (or empty) you don't see it in the resulting pdb file.

And I was not suggesting to add an output flag.

We leave it as is then, but this can give strange behaviours and lead to errors with other software using those ensembles.

joaomcteixeira commented 1 year ago

~When I first read this thread, I had the same opinion as João. But looking to your explanations and the code in more detail, it seems to be a bug. Give me some minutes, and I will come back with more.~

edited: see next comment

joaomcteixeira commented 1 year ago

Hi @amjjbonvin

I can't reproduce that behaviour. It works well in my case. For example:

pdb_fecth 2fej > 2fej.pdb
pdb_splitmodel 2fej.pdb
rm 2fej.pdb
pdb_mkensemble *.pdb > test.pdb

And the test.pdb is clean.

Check if you are running on the latest version:

$ pip show pdb-tools
Name: pdb-tools
Version: 2.5.0

In similar cases, I use a tmp file to bypass the output, for example:

pdb_mkensemble *.pdb > tmp && mv tmp test.pdb

Cheers,

amjjbonvin commented 1 year ago

Ok - must be a strange behaviour on my Mac then.

If running under tcsh (I know, it's dinosaur stuff...) I get the weird behaviour. When running under bash it works fine... In both cases the same Python3 ((3.7.2) from the system are called.

joaomcteixeira commented 1 year ago

Okay. So it's precisely as João said at first, it's a problem how the tcsh pass the arguments to the python sys.args. In bash the args are block to the > but likely in tcsh the whole command line is sent. If you see this can be a real issue because tcsh is used often in servers where pdb-tools may run, we can make a checkpoint in the python part. But that will bring a non-negligible burden to the code.