macrocosm-os / folding

Decentralized Protein Folding Bittensor Subnet
https://www.macrocosmos.ai/sn25
MIT License

Do we need to remove HETATM and CONECT records from the `.pdb` file before creating a topology? #44

Closed schampoux closed 6 months ago

schampoux commented 7 months ago

This issue addresses the question of whether to remove HETATM and CONECT records from the .pdb file before creating a topology. This approach was found in one of the tutorials, and it is worth understanding what it's doing to decide whether we need to apply it to the pipeline.

schampoux commented 7 months ago

HETATM: Records/lines in the .pdb file with this label denote atoms that are part of a heterogeneous group, meaning they are not part of the standard amino acid or nucleotide residues that make up proteins and nucleic acids, respectively. This can include water molecules, ions, ligands, and cofactors.

CONECT: Records/lines containing this label specify the connectivity between atoms, indicating which atoms are bonded to which.

Both of these records are essential for a detailed and accurate representation of the molecular structure being described.
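Before deciding, it helps to see what a given file actually contains. The following is a minimal sketch, not code from the pipeline: it counts PDB record types (columns 1-6) and the residue names of HETATM atoms (columns 18-20). The function name `summarize_pdb_records` and the sample text `SAMPLE_PDB` are hypothetical and used only for illustration.

```python
from collections import Counter

# Hypothetical sample: two protein atoms, one water, one zinc ion, one CONECT record.
SAMPLE_PDB = "\n".join([
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38           C",
    "HETATM    3  O   HOH A 101      10.000  10.000  10.000  1.00 20.00           O",
    "HETATM    4 ZN    ZN A 201       5.000   5.000   5.000  1.00 15.00          ZN",
    "CONECT    1    2",
    "END",
])


def summarize_pdb_records(pdb_text):
    """Return (record type counts, HETATM residue name counts) for PDB text."""
    record_counts = Counter()
    hetero_residues = Counter()
    for line in pdb_text.splitlines():
        record = line[:6].strip()  # record name lives in columns 1-6
        if not record:
            continue
        record_counts[record] += 1
        if record == "HETATM":
            # residue name lives in columns 18-20 of ATOM/HETATM records
            hetero_residues[line[17:20].strip()] += 1
    return record_counts, hetero_residues


record_counts, hetero_residues = summarize_pdb_records(SAMPLE_PDB)
print(dict(record_counts))
print(dict(hetero_residues))
```

If the HETATM residue names are only water (`HOH`) or crystallization additives, stripping them loses little. Names that correspond to ligands, metal ions, or modified residues deserve a closer look first.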

Should these records be removed in the pipeline? Doing so means every .pdb file will be stripped of these records.

Implications of Removal: Simplification. The system is simpler and setup might run more smoothly, especially with automated topology generation tools that may not handle non-standard residues or ligands well. However, for simulations where non-protein components play a critical role (e.g., in binding studies or enzymatic reactions), removing these records can result in a loss of essential interactions and functional insights.

Conclusion: For the sake of development, it is safe for us to remove these records in the pipeline. (Link to the GPT conversation.)
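For reference, the removal step could be sketched as below. This is an assumption about how it might look, not the pipeline's actual implementation: the function name `strip_records`, its `drop` parameter, and the sample text are hypothetical. It filters lines by the record name in columns 1-6 and leaves everything else untouched.

```python
# Hypothetical sample: two protein atoms, one water, one CONECT record.
SAMPLE_PDB = "\n".join([
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38           C",
    "HETATM    3  O   HOH A 101      10.000  10.000  10.000  1.00 20.00           O",
    "CONECT    1    2",
    "END",
])


def strip_records(pdb_text, drop=("HETATM", "CONECT")):
    """Return PDB text without lines whose record name (columns 1-6) is in `drop`."""
    kept = [
        line
        for line in pdb_text.splitlines()
        if line[:6].strip() not in drop
    ]
    return "\n".join(kept) + "\n"


cleaned = strip_records(SAMPLE_PDB)
print(cleaned)
```

Matching on the record-name columns rather than searching the whole line avoids dropping lines that merely contain the string "HETATM" elsewhere, such as in a REMARK. Atom serial numbers are not renumbered here.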