haddocking / pdb-tools

A dependency-free cross-platform swiss army knife for PDB files.
https://haddocking.github.io/pdb-tools/
Apache License 2.0

Doesn't work if a chain ID is lowercase when you remove HETATM #39

Closed 2003100127 closed 4 years ago

2003100127 commented 4 years ago

Hello, and thanks for providing these tools to the community. However, removing HETATM records does not really work when a chain ID is written in lowercase, e.g. 'a' rather than 'A'. pdb-tools looks for the uppercase letter 'A' instead, deletes everything belonging to chain 'a', and finally returns an empty file.

JoaoRodrigues commented 4 years ago

Thanks for the report, we'll look into it!

Could you give a little more detail on which of the tools gives you this error?

2003100127 commented 4 years ago

Interestingly, it does not give any error, and that is the problem: you only notice when you check the output. For example, PDB entry 5v2c has chains A, B, C, ... and a, b, c, .... If you use pdb_selchain -M 5v2c.pdb | pdb_delhetatm | pdb_tidy > 5v2c_M.pdb, it returns the correct result without any HETATM records. However, if you use pdb_selchain -m 5v2c.pdb | pdb_delhetatm | pdb_tidy > 5v2c_m.pdb, it returns only END and deletes everything related to chain m, because it treats m as M. M is not found, so of course everything for chain m is deleted.

mtrellet commented 4 years ago

I'm not completely sure I understand your last comment. As far as I can tell, the two commands should give exactly the same output provided that the input is the same. But indeed, the second one differs from what you would expect.

The "faulty" line in pdb_selchain.py is most likely this one:

https://github.com/haddocking/pdb-tools/blob/eb1baf7bd69e3259a4b2453c6bd88a6fc22a62d6/pdbtools/pdb_selchain.py#L97

This is where all chain IDs are converted to upper case. I'd change only that conversion so that lowercase chain IDs are taken into account. @JoaoRodrigues, does that sound fair to you?
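
To illustrate the effect described above, here is a minimal Python sketch. It is not the code from pdb_selchain.py: the helpers make_atom and select_chain and the force_upper flag are hypothetical names made up for this example. The only assumption about the file format is that the chain ID sits in column 22 of a coordinate record.

```python
def make_atom(serial, chain):
    """Build a truncated ATOM record with the chain ID in column 22 (index 21)."""
    return "ATOM  {:>5d}  CA  ALA {}{:>4d}".format(serial, chain, serial)


def select_chain(lines, chain_id, force_upper=False):
    """Keep coordinate records whose chain ID matches chain_id.

    force_upper=True mimics the suspected behaviour: the requested
    chain ID is upper-cased before any comparison takes place.
    """
    if force_upper:
        chain_id = chain_id.upper()
    records = ("ATOM", "HETATM", "TER", "ANISOU")
    return [
        line for line in lines
        if line.startswith(records) and len(line) > 21 and line[21] == chain_id
    ]


# A file holding both chains, and a file holding only the lowercase chain.
both = [make_atom(1, "M"), make_atom(2, "m")]
only_m = [make_atom(1, "m")]

# Case-sensitive selection keeps the chain that was actually requested.
print([line[21] for line in select_chain(both, "m")])                    # ['m']

# With upper-casing, asking for 'm' silently returns chain 'M' instead...
print([line[21] for line in select_chain(both, "m", force_upper=True)])  # ['M']

# ...and returns nothing at all when only chain 'm' is present.
print(select_chain(only_m, "m", force_upper=True))                       # []
```

The last call shows how a pipeline can end up with a file containing nothing but END without ever raising an error.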

2003100127 commented 4 years ago

@mtrellet you are absolutely right, sorry. I should explain why I wrote my last comment: I tested two separate input PDB files, 5v2c_M.pdb and 5v2c_m.pdb. That's why I got only END from the second command in my last comment. If you use 5v2c as input both times, you get the same result, containing only chain M. @JoaoRodrigues

joaomcteixeira commented 4 years ago

I can add that PDB chain IDs range over 0-9a-zA-Z, and can even span multiple characters in mmCIF format (which does not apply in this case). We should take all that into account.

mtrellet commented 4 years ago

@joaomcteixeira Sounds good to me! Any alphanumerical character is OK according to the PDB.
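
As a rough sketch of what accepting any alphanumeric chain ID could look like, the snippet below parses a command-line option such as -m without changing its case. The names is_valid_chain_id and parse_chain_option are hypothetical and do not come from pdb-tools; the single-character ASCII restriction is an assumption based on the comments above.

```python
import string

# Assumption from the discussion: one ASCII letter or digit per chain ID.
ALLOWED_CHAIN_IDS = frozenset(string.ascii_letters + string.digits)


def is_valid_chain_id(chain_id):
    """Return True for exactly one ASCII alphanumeric character."""
    return len(chain_id) == 1 and chain_id in ALLOWED_CHAIN_IDS


def parse_chain_option(option):
    """Turn an option like '-m' into the chain ID 'm', preserving case."""
    if not option.startswith("-"):
        raise ValueError("expected an option of the form -<chain>: %r" % option)
    chain_id = option[1:]
    if not is_valid_chain_id(chain_id):
        raise ValueError("invalid chain ID: %r" % chain_id)
    return chain_id


print(parse_chain_option("-m"))  # m
print(parse_chain_option("-M"))  # M
print(parse_chain_option("-7"))  # 7
```

Because the option is never upper-cased, -m and -M stay distinct selections, and anything that is not a single alphanumeric character is rejected with an explicit error instead of producing an empty file.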

JoaoRodrigues commented 4 years ago

Sounds like a fix to the problem. Does either of you want to take a shot at it?

joaomcteixeira commented 4 years ago

Closed in #40