lukasturcani / stk

A Python library which allows construction and manipulation of complex molecules, as well as automatic molecular design and the creation of molecular databases.
https://stk.readthedocs.io/
MIT License
251 stars 47 forks source link

Remove `pdb` from init_from_file options for BuildingBlock #446

Closed andrewtarzia closed 2 years ago

andrewtarzia commented 2 years ago

The minimum working example at the bottom shows clearly that if you write and then init_from_file, you will lose bond order information. This is because PDB files do not contain bond order information in their "CONECT" records.

Therefore, I suggest removing it as an init function (the with_structure_from_file method is completely fine because it just loads the coordinates). Perhaps a warning is added for 6 months before removing?

What do you think, @lukasturcani?


import stk
from collections import Counter

pdb_writer = stk.PdbWriter()

bb = stk.BuildingBlock(
    'CN(c1ccc(cc1)N=Cc2cccnc2)c3nc(nc(n3)N(C)c4ccc(cc4)N=Cc5ccccn5)'
    'N(C)c6ccc(cc6)N=Cc7ccccn7'
)

# Get bond order information.
orders = []
for bond in bb.get_bonds():
    orders.append(bond.get_order())

# Write to PDB.
bb.write(f'first.pdb')

# Read from PDB.
bb2 = stk.BuildingBlock.init_from_file('first.pdb')

# Get bond order information.
orders2 = []
for bond in bb2.get_bonds():
    orders2.append(bond.get_order())

print(Counter(orders), Counter(orders2)) ## >> Counter({1.0: 72, 2.0: 24}) Counter({1.0: 96})

# Write to PDB.
bb.write(f'second.pdb')

# Compare text of PDBs.
s1 = pdb_writer.to_string(bb)
s2 = pdb_writer.to_string(bb2)

print(s1 == s2)
lukasturcani commented 2 years ago

I dunno, I feel like both of these methods are behaving correctly. This feels like a case where I don't want the potential user-error to detract from convience of having these two methods around. Is there a way pdb can ecnode bond orders?

andrewtarzia commented 2 years ago

PDB cannot encode bond orders as far I can tell.

I disagree because convenience is lost when your chemistry is broken without a logical warning. The average user would not know this about PDB files (I did not until now). At least a warning would be useful in my opinion.

lukasturcani commented 2 years ago

OK, you've convinced me, partially. I am happy to not write PDB files but I think we should continue to read them. But the docs of the PDB reader should have a warning telling users that these files do not encode bond information.

lukasturcani commented 2 years ago

Actually, I take that back. Maybe it is for the best to remove them from reading too.

andrewtarzia commented 2 years ago

Not exactly the point - I am only concerned about the BuildingBlock.init_from_file method. All other methods use the Bond attributes in the stk molecule already, which are correct. It is the creation of a building block and its bonds from a pdb file that will be "unexpected". That said, the user could already know that.

A warning for PDB writers (PdbWriter() and Molecule.write()) is ok, but this is a user problem - I agree with you there. If they want to use PDBs, then they should know the problems.

That said, there are hacks around including bond order in PDB files, but you cannot know if they have been implemented or not.

lukasturcani commented 2 years ago

As a side note, why are people using these files in the first place?

andrewtarzia commented 2 years ago

I definitely do not agree with removing the writers because they are useful (PDB files, even without bond orders, are pretty universal in comp chem).

andrewtarzia commented 2 years ago

As a side note, why are people using these files in the first place?

Effectively all of computational biology is done in PDB format. Could be an old habit that won't die, but either way, it is universal.

lukasturcani commented 2 years ago

fair, we write XYZ files which do not preserve bond orders. so thats not too crazy

andrewtarzia commented 2 years ago

fair, we write XYZ files which do not preserve bond orders. so thats not too crazy

Do not even preserve bonds... I think the formats play their role outside of stk, which is why we write them and we read coordinates from them (with_structure_from_file is my best friend). But creating molecules from them is bad.

lukasturcani commented 2 years ago

https://github.com/lukasturcani/stk/pull/449

lukasturcani commented 2 years ago

Also, what is the easiest way to convert between chemical file formats in Python? I know OpenBabel exists but that's a CLI App. Im thinking something like

new_path = to_mol_file('path/to/pdb/file.pdb')

whihch will create a .mol file. that you can use in a temporary directory. you could optional specify its location

new_path = to_mol_file('path/to/pdb/file.pdb', 'path/to/mol/file.mol')

to make the conversion seamless from a programmers perspective

To be clear this would not be a part of stk but a small utility library. unless something like this already exists.

andrewtarzia commented 2 years ago

Open Babel also has a python API for this. Or the atomic simulation environment can do this. Many of these exist.

lukasturcani commented 2 years ago

oh yeah nice: https://openbabel.org/docs/dev/UseTheLibrary/Python_Pybel.html

andrewtarzia commented 2 years ago

Can this be closed now? #449