pinder-org / pinder

PINDER: The Protein INteraction Dataset and Evaluation Resource
https://pinder-org.github.io/pinder/
Apache License 2.0

APO and predicted structures will not load with PairedPDB from PinderSystems passed to function #10

Closed by allcatsaregrey 2 months ago

allcatsaregrey commented 2 months ago

Hello,

Systems are loaded with

loader = PinderLoader(base_filters=base_filters, sub_filters=sub_filters)

The function calls

apo_data = PairedPDB.from_pinder_system(
    system=system,
    monomer1="apo_receptor",
    monomer2="apo_ligand",
    node_types=nodes,
)

This results in an error for every ID in the available metadata obtained with get_metadata():

    apo_data = PairedPDB.from_pinder_system(
  File "/home/kimlab5/mmcfee/miniconda3/envs/pinder/lib/python3.10/site-packages/pinder/core/loader/geodata.py", line 84, in from_pinder_system
    chain1_struc.filter("element", mask=["H"], negate=True, copy=False)
AttributeError: 'NoneType' object has no attribute 'filter'

Do you have any idea what's going on? Or is there a better way to extract coordinates and atom types for subsets of specific pinder systems and process them for saving?

Thanks!

danielkovtun commented 2 months ago

Hi @allcatsaregrey, the errors you are encountering occur because not all systems have an apo receptor and/or ligand.

The current behavior of PairedPDB.from_pinder_system is to expect that the requested monomer is available. Only "holo_receptor" and "holo_ligand" are available for every system in the index.
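One way to guard against this before calling PairedPDB.from_pinder_system is to check which of the requested monomers are actually present. The following is a minimal sketch, assuming a missing monomer shows up as None on the system object (which is consistent with the 'NoneType' error above, but is an assumption here). `FakeSystem` and `missing_monomers` are hypothetical names used only for illustration and are not part of pinder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeSystem:
    """Hypothetical stand-in for a PinderSystem; None marks a missing monomer."""

    holo_receptor: Optional[str] = "holo_R"
    holo_ligand: Optional[str] = "holo_L"
    apo_receptor: Optional[str] = None
    apo_ligand: Optional[str] = None


def missing_monomers(system, requested):
    """Return the requested monomer names whose attribute is None or absent."""
    return [name for name in requested if getattr(system, name, None) is None]


# This system has an apo receptor but no apo ligand.
system = FakeSystem(apo_receptor="apo_R")
missing = missing_monomers(system, ["apo_receptor", "apo_ligand"])
if missing:
    print(f"Skipping system, missing monomers: {missing}")
```

With a real system, the same check could be run before building the PairedPDB so that systems without the requested monomers are skipped instead of raising.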

I will push an option to fall back to holo_receptor/ligand by default if the requested monomer type is not available for the system. Let me know if that works for you: https://github.com/pinder-org/pinder/pull/11
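To illustrate the idea, a fallback of this kind might look roughly like the sketch below. This is not the code from the pull request: `resolve_monomer` is a hypothetical helper, the SimpleNamespace object is a stand-in for a PinderSystem, and the sketch assumes that an unavailable monomer is represented as None.

```python
from types import SimpleNamespace


def resolve_monomer(system, requested, fallback):
    """Return (name, structure), preferring `requested` and using `fallback` if it is None."""
    structure = getattr(system, requested, None)
    if structure is not None:
        return requested, structure
    structure = getattr(system, fallback, None)
    if structure is None:
        raise ValueError(f"Neither {requested!r} nor {fallback!r} is available")
    return fallback, structure


# Hypothetical stand-in for a PinderSystem that has no apo ligand.
system = SimpleNamespace(
    holo_receptor="holo_R",
    holo_ligand="holo_L",
    apo_receptor="apo_R",
    apo_ligand=None,
)
receptor = resolve_monomer(system, "apo_receptor", "holo_receptor")
ligand = resolve_monomer(system, "apo_ligand", "holo_ligand")
```

Returning the name alongside the structure lets the caller record which monomer type was actually used, which matters if apo and holo structures should not be mixed silently.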

In general, yes, there are other ways of getting the coordinates and atom types for a subset of specific pinder systems. Of these, the "simplest" would be via PinderSystem:

from tqdm import tqdm
from pinder.core import get_index, PinderSystem

index = get_index()
# For instance, keep only those systems where both receptor and ligand have an apo pairing
ids = list(index.query('apo_R and apo_L').id)
# Or whatever other data structure you want to store the values in
system_atoms = {}
for pid in tqdm(ids):
    ps = PinderSystem(pid)
    apoR = ps.apo_receptor
    apoL = ps.apo_ligand
    system_atoms[pid] = {
        "coords": {"receptor": apoR.coords, "ligand": apoL.coords}, 
        "atom_names": {"receptor": apoR.atom_array.atom_name, "ligand": apoL.atom_array.atom_name}
    }

PinderSystem is just a top-level container for Structure objects associated with a system. For more details on the Structure object and its attributes, see:
https://pinder-org.github.io/pinder/pinder-system.html
https://github.com/pinder-org/pinder/blob/8ad1ead7a174736635c13fa7266d9ca54cf9f44e/src/pinder-core/pinder/core/loader/structure.py#L43-L498