jonathanking / sidechainnet

An all-atom protein structure dataset for machine learning.
BSD 3-Clause "New" or "Revised" License

Implement SCNDataset, SCNProtein, and HydrogenBuilder. #35

Closed jonathanking closed 2 years ago

jonathanking commented 2 years ago

Description

This pull request does three things:

  1. Implements SCNDataset for easier data access (data = scn.load(... scn_dataset=True)).
  2. Implements SCNProtein for easier data manipulation (access attributes, etc.).
  3. Implements HydrogenBuilder. To add hydrogens to any coordinate set for a protein, call SCNProtein.add_hydrogens(). Note: the all-atom coordinate representation with hydrogens (SCNProtein.hcoords) does not include terminal-specific atoms (H2, H3, OXT). These atoms appear in generated visualizations and PDB files, but not in the coordinate set itself.
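
The note in item 3 separates the atoms stored in the coordinate set from the atoms written to a visualization or PDB file. The toy sketch below illustrates only that distinction. It is not sidechainnet code: ToyProtein, coordinate_atom_names, and exported_atom_names are hypothetical names invented for this example, and the glycine atom list is a simplified stand-in.

```python
# Toy illustration only: this is NOT sidechainnet code. All names are hypothetical.
from dataclasses import dataclass, field
from typing import List

# Terminal-specific atoms named in the PR description.
N_TERMINAL_ONLY = ["H2", "H3"]
C_TERMINAL_ONLY = ["OXT"]


@dataclass
class ToyProtein:
    # One list of atom names per residue (heavy atoms plus hydrogens).
    residues: List[List[str]] = field(default_factory=list)

    def coordinate_atom_names(self) -> List[str]:
        """Atoms represented in the coordinate set: per-residue atoms only."""
        return [name for residue in self.residues for name in residue]

    def exported_atom_names(self) -> List[str]:
        """Atoms written on export: the same atoms plus terminal-specific ones."""
        names: List[str] = []
        last = len(self.residues) - 1
        for i, residue in enumerate(self.residues):
            names.extend(residue)
            if i == 0:
                names.extend(N_TERMINAL_ONLY)   # first residue only
            if i == last:
                names.extend(C_TERMINAL_ONLY)   # last residue only
        return names


# Simplified glycine with hydrogens, repeated three times as a toy chain.
glycine_with_h = ["N", "CA", "C", "O", "H", "HA2", "HA3"]
toy = ToyProtein(residues=[list(glycine_with_h) for _ in range(3)])

coord_names = toy.coordinate_atom_names()
export_names = toy.exported_atom_names()
print(len(coord_names), len(export_names))
```

In this toy model the exported atom list is longer than the coordinate set by exactly the three terminal atoms, so code that indexes into the coordinate set should not assume a one-to-one match with the atoms in an exported file.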

Please see the respective functions for more details.