Closed rimmartin closed 1 year ago
CONECT records are not particularly useful and they don't have equivalent in mmCIF, so I planned to leave them out. What would you use it for?
Hi @wojdyr, work related. Treated similar to mol2 & sdf bond information. Moe for example writes out CONECT's and then read in to try to insure the connectivity is identical.
Pymol will pick up the redundancy and show bond order by number of bars between to atoms.
There is at least one more company that's adding CONECT records to pdb files written by gemmi, so I see that it'd be good to have it implemented. Although it's not clear to me what's the best way to store these records in gemmi. Where the information for CONECT comes from in your work?
Added a Conect type to Connection::Type enumeration and bond order property to Connection https://github.com/quantumbio/gemmi/blob/master/include/gemmi/metadata.hpp#L259 https://github.com/quantumbio/gemmi/blob/master/include/gemmi/metadata.hpp#L266
Added a conect_records=false to PdbWriteOptions so that currently by default CONECT's are not written https://github.com/quantumbio/gemmi/blob/master/include/gemmi/to_pdb.hpp#L23C8-L23C22
As the serial is incremented in to_pdb.cpp https://github.com/quantumbio/gemmi/blob/master/src/to_pdb.cpp#L265 I record it back to the atom's property at https://github.com/quantumbio/gemmi/blob/master/src/to_pdb.cpp#L289
With these changes the CONECT info can be stored by the calling app's code
pdbWriteOptions.conect_records=true;
...
residues->setIndex(atomsA->getResidueIndex());
if(residues->getName().compare("HOH")==0)continue;
connection.partner1.atom_name=atomsA->getName();
connection.partner1.chain_name=residues->getStrandName();
connection.partner1.res_id.name=residues->getName();
if(!atomsA->getAlternateLocation().empty())connection.partner1.altloc=atomsA->getAlternateLocation().at(0);
char ic=(!residues->getInsertionCode().empty()) ? residues->getInsertionCode().at(0): ' ';
connection.partner1.res_id.seqid=gemmi::SeqId(residues->getSequence(), ic);
residues->setIndex(atomsB->getResidueIndex());
if(residues->getName().compare("HOH")==0)continue;
connection.partner2.atom_name=atomsB->getName();
connection.partner2.chain_name=residues->getStrandName();
connection.partner2.res_id.name=residues->getName();
if(!atomsB->getAlternateLocation().empty())connection.partner2.altloc=atomsB->getAlternateLocation().at(0);
ic=(!residues->getInsertionCode().empty()) ? residues->getInsertionCode().at(0): ' ';
connection.partner2.res_id.seqid=gemmi::SeqId(residues->getSequence(), ic);
if(atomsA->getAtomicNumber()==16 && atomsB->getAtomicNumber()==16)
{
connection.type=gemmi::Connection::Disulf;
structure.connections.push_back(connection);
}
else if(atomsA->isMetal() || atomsB->isMetal()){
connection.type=gemmi::Connection::MetalC;
structure.connections.push_back(connection);
}
else if(atomsA->getResidueIndex()!= atomsB->getResidueIndex()){
connection.type=gemmi::Connection::Covale;
structure.connections.push_back(connection);
}
else{
connection.type=gemmi::Connection::Conect;
connection.order=bonds->getOrder();
structure.connections.push_back(connection);
}
for appropriate bonds(the else case after the other enumeration types are decided) the new Connection::Type::Conect is set.
Then in to_pdb.cpp at https://github.com/quantumbio/gemmi/blob/master/src/to_pdb.cpp#L344
if (opt.conect_records) {
std::unordered_map<int, std::vector<int>> conectmap;
for (const Connection& con : st.connections)
if (con.type == Connection::Conect) {
const_CRA cra1 = st.models[0].find_cra(con.partner1, true);
const_CRA cra2 = st.models[0].find_cra(con.partner2, true);
// In special cases (LINKR gap) atoms are not there.
if (!cra1.residue || !cra2.residue)
continue;
conectmap[cra1.atom->serial].push_back(cra2.atom->serial);
conectmap[cra2.atom->serial].push_back(cra1.atom->serial);
if(con.order>=2){
conectmap[cra1.atom->serial].push_back(cra2.atom->serial);
conectmap[cra2.atom->serial].push_back(cra1.atom->serial);
}
if(con.order>=3){
conectmap[cra1.atom->serial].push_back(cra2.atom->serial);
conectmap[cra2.atom->serial].push_back(cra1.atom->serial);
}
}
for(auto ele : conectmap){
snprintf_z(buf, 82, "CONECT%5s",
encode_serial_in_hybrid36(ele.first).data());
int columnOffset = 11;
for(int c : ele.second){
snprintf_z(buf+columnOffset, 82, "%5s",
encode_serial_in_hybrid36(c).data());
columnOffset+=5;
}
buf[columnOffset] = '\n';
os.write(buf, columnOffset+1);
}
}
is the CONECT output where the serial numbers are gathered for order redundancy and for both directions.
For example N(serial=1), HN2(serial=2)& HN3(serial=3)
CONECT 1 2 3
CONECT 2 1
CONECT 3 1
occur.
If a conect is a double bond order for example C1(serial=1) & C2(serial=2)
CONECT 1 2 2
CONECT 2 1 1
Other software reading this pdb may pick this up and know the order for example pymol displays the double bond.
The calling app's code decides on what CONECT's are going to be defined; skip aa and dna standards and HOH and fill in nonstandard bonds.
Would be helpful if someone from the other company can join this discussion with you. We are running successfully with this implementation but can adjust to any changes you might decide
I've been thinking about different ways of storing data for CONECT:
Similarly to how it's stored in mmCIF (PDB NextGen archive contains category chem_comp_bond which was added because people were missing the information from CONECT). This would mean that gemmi would determine CONECTs from _chem_comp_bond and _struct_conn. But since in computational chemistry, unlike in wwPDB, CONECT provides also bond order information, and kekulization is out of the scope of gemmi, this wouldn't be particularly useful.
As you proposed in the comment above and in #274, i.e. using AtomAddress.
Using serial numbers only. The calling app would need to do something similar to:
gemmi::assign_serial_numbers(structure, /*numbered_ter=*/true);
for (...)
structure.add_conect(serial1, serial2, bond_order);
PdbWriteOptions options;
options.preserve_serial = true;
options.conect_records = true;
structure.write_pdb(path, options);
Would 3 work for you?
And concerning leaving standard amino's, dna's without explicit CONECT's? 3. would be for only nonstandard connections/bonds? When an app creates all connections/bonds they can be time consuming to parse and get them all related back to the atoms again; as systems go toward 150,000 atoms and above
And I agree with keeping it readily compatible with mm cifs. We are applying those thru gemmi more and more
I tentatively implemented 3. CONECT records will contain whatever you add with add_conect(). It's symmetric, so it adds A-B and B-A. There is also add_conect_one_way(), but it's probably not needed. Let me know if it's not a particularly convenient solution.
Had to think about it from different perspectives.
My way comes from the situation that our backbone(our gemmi::Structure) existed and grew since 2007. And I'm bridging over to gemmi::Structure.
As I rethought about it what you are implementing is correct for gemmi and new users which is the better way to go. An app created since gemmi's existence can just get going directly using gemmi::Structure. This might be the case for the other company you mentioned depending when they started coding.
If my company ever gives me the time I plan to use gemmi::Structure inside our backbone classes:-) And extend(derive) the struct's and vectors for our properties designed for all the chemistry and algorithms we've implemented. It is actually quite doable because backbones are backbones and relating all the atoms, residues and chains follows.
I'll reclone your branch and begin testing(bridging CONECT's between our backbone and gemmi::Structure on our side)
Thank you
I've sync'ed with the latest gemmi and switched my app code to use your impl for CONECT's. Running successfully! Thank you
Good to know that it worked.
To make it complete, I added reading CONECT records.
Then, since I didn't want to copy CONECTs by default when PDB is read and written, I changed the default value of option PdbWriteOptions::conect_records
to false. So if you have in your code:
options.preserve_serial = true;
you'll need to add also:
options.conect_records = true;
Any interest in CONECT's? They are serial based. Maybe a starting point is saving the serial numbers back on the Atom struct?
HETATM's (water excluded) and what else get these explicit connections?
https://github.com/project-gemmi/gemmi/blob/master/include/gemmi/pdb.hpp#L444
Some writers repeat CONECT serial pairs a bond-order number of times to represent the order