plinder-org / plinder

Protein Ligand INteraction Dataset and Evaluation Resource
https://plinder.sh
Apache License 2.0

Enriching PLIP Interaction info #60

Closed: leelasd closed this issue 1 week ago

leelasd commented 1 week ago

Hi @naefl @yusuf1759! Great work on PLINDER. I was wondering about your plans for making PLIP data accessible in the dataset. I noticed a ligand interaction column that gives the residue number but not the protein residue type (please correct me if I missed it). Adding the residue type would be a great addition to an already cool dataset: it would let people search across PLINDER for interactions with specific residue types, which is very useful when designing compounds for improved selectivity.

Best wishes, Leela

yusuf1759 commented 1 week ago

Hey @leelasd,

Thanks for the feedback.

In this iteration of PLINDER, our approach has been to expose only a minimal set of useful annotations via the annotation table to avoid overwhelming users.

That said, more advanced users can access the source of the annotation table through the function plinder.core.index.utils.load_entries, which exposes all the annotations we computed while developing PLINDER.

For this specific use case, I would do:

from plinder.core.index.utils import load_entries

system_id = "8dat__1__1.A_1.B__1.L"
pdb_id = "8dat"
chain = "1.A"

# Load all annotations for this PDB entry
entry = load_entries(pdb_ids=[pdb_id])

# Map (residue number, residue name) -> interaction hashes for one chain instance.
# Note: reads the module-level `entry` loaded above.
def get_residue_types(system_id: str, chain_instance: str) -> dict[tuple[str, str], list[str]]:
    pdb_id = system_id.split("__")[0]
    # "1.A" -> "A": the chains mapping is indexed by the bare chain ID here
    chain_id = chain_instance.split(".")[-1]
    mapping = {}
    ligands = entry[pdb_id]["systems"][system_id]["ligands"]
    for ligand in ligands:
        for residue_number, interaction_hashes in ligand["interactions"][chain_instance].items():
            residue_name = entry[pdb_id]["chains"][chain_id]["residues"][residue_number]["name"]
            mapping[(residue_number, residue_name)] = interaction_hashes
    return mapping

print(get_residue_types(system_id, chain))

Output:

{('220', 'GLY'): ['type:hydrogen_bonds__protisdon:False__sidechain:False',
  'type:hydrogen_bonds__protisdon:True__sidechain:False'],
 ('317', 'ASP'): ['type:hydrogen_bonds__protisdon:False__sidechain:True'],
 ('261', 'GLY'): ['type:hydrogen_bonds__protisdon:True__sidechain:False'],
 ('262', 'THR'): ['type:hydrogen_bonds__protisdon:True__sidechain:False'],
 ('263', 'GLY'): ['type:hydrogen_bonds__protisdon:True__sidechain:False'],
 ('264', 'LYS'): ['type:hydrogen_bonds__protisdon:True__sidechain:False',
  'type:salt_bridges__protispos:True',
  'type:salt_bridges__protispos:True'],
 ('265', 'THR'): ['type:hydrogen_bonds__protisdon:True__sidechain:False',
  'type:hydrogen_bonds__protisdon:True__sidechain:True'],
 ('266', 'LEU'): ['type:hydrogen_bonds__protisdon:True__sidechain:False']}
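
To search this result by residue type, one option is to parse the interaction strings and group them by residue name. The sketch below is not part of PLINDER: parse_interaction_hash and interactions_by_residue_name are hypothetical helpers, the "key:value" fields joined by double underscores are a format inferred from the output above, and the sample mapping is just three entries copied from that output.

```python
from collections import defaultdict

# Three entries copied from the output above (system 8dat__1__1.A_1.B__1.L, chain 1.A).
mapping = {
    ("317", "ASP"): ["type:hydrogen_bonds__protisdon:False__sidechain:True"],
    ("264", "LYS"): [
        "type:hydrogen_bonds__protisdon:True__sidechain:False",
        "type:salt_bridges__protispos:True",
        "type:salt_bridges__protispos:True",
    ],
    ("266", "LEU"): ["type:hydrogen_bonds__protisdon:True__sidechain:False"],
}


def parse_interaction_hash(interaction_hash: str) -> dict[str, str]:
    """Split 'key:value__key:value' into a dict (hypothetical helper; format is assumed)."""
    return dict(field.split(":", 1) for field in interaction_hash.split("__"))


def interactions_by_residue_name(mapping):
    """Group parsed interactions by residue name, keeping the residue number (hypothetical helper)."""
    grouped = defaultdict(list)
    for (residue_number, residue_name), hashes in mapping.items():
        for interaction_hash in hashes:
            grouped[residue_name].append((residue_number, parse_interaction_hash(interaction_hash)))
    return dict(grouped)


by_name = interactions_by_residue_name(mapping)

# Residue numbers of lysines that form a salt bridge with the ligand
lys_salt_bridges = sorted(
    {number for number, parsed in by_name.get("LYS", []) if parsed["type"] == "salt_bridges"}
)
print(lys_salt_bridges)  # ['264']
```

All parsed values stay as strings (for example "True" rather than True), so compare against strings or convert them before filtering on fields such as sidechain.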

Let me know if this helps.

leelasd commented 5 days ago

Thank you for the help @yusuf1759! That makes sense.