Closed jyaacoub closed 4 months ago
The PDB parser takes a long time to parse 50 conformations (see #84). However, my parsing script is already about as optimized as it can be in pure Python, so the next step should be to write the parser in C or Rust.
Comparison of my parser vs. prody.parsePDB:
    from prody import parsePDB
    import numpy as np
    from src.utils.residue import Chain, Ring3Runner
    import logging
    import time

    logging.getLogger().setLevel(logging.INFO)
    logging.getLogger('.prody').setLevel(logging.WARNING)

    start_time = time.time()

    # af_confs = '/cluster/home/t122995uhn/projects/data/pdbbind/alphaflow_io/out_pid_ln/1c5c.pdb'
    pid = '2zq0'
    af_confs = f'/cluster/home/t122995uhn/projects/data/pdbbind/alphaflow_io/out_pdb_MD-distilled/{pid}.pdb'
    # pdb_fp = '/cluster/projects/kumargroup/jean/data/pdbbind/v2020-other-PL/1c5c/1c5c_protein.pdb'
    pdb_fp = f'/cluster/projects/kumargroup/jean/data/pdbbind/v2020-other-PL/{pid}/{pid}_protein.pdb'

    target_seq = Chain(pdb_fp).sequence

    # Timing my parser (Chain.get_all_models)
    get_all_models_start = time.time()
    chains = Chain.get_all_models(af_confs)
    logging.info(f"get_all_models: {time.time() - get_all_models_start} seconds")

    # Timing prody.parsePDB, called once per model
    parse_pdb_start = time.time()
    chains = [parsePDB(af_confs, subset='ca', chain='A', model=i) for i in range(50)]
    logging.info(f"prody.parsePDB: {time.time() - parse_pdb_start} seconds")

    print(len(target_seq))
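For reference when porting to C or Rust, here is a minimal pure-Python sketch of the kind of single-pass parse that would be ported. It is not the `Chain.get_all_models` implementation from this repo. `parse_ca_models` is a hypothetical name, and the sketch assumes standard fixed-column PDB `ATOM` records with each conformation wrapped in `MODEL`/`ENDMDL`. It collects only the CA coordinates of one chain and walks the lines once, rather than calling a parser once per model.

```python
from typing import Dict, Iterable, List, Tuple

Coord = Tuple[float, float, float]


def parse_ca_models(lines: Iterable[str], chain_id: str = "A") -> Dict[int, List[Coord]]:
    """Hypothetical helper: map model number -> list of CA (x, y, z) for one chain."""
    models: Dict[int, List[Coord]] = {}
    current = 1  # assumption: a file without MODEL records is a single model
    for line in lines:
        record = line[:6]
        if record.startswith("MODEL"):
            # The model serial follows the record name.
            current = int(line[6:].split()[0])
        elif (
            record == "ATOM  "
            and line[12:16].strip() == "CA"  # atom name, columns 13-16
            and line[21] == chain_id         # chain ID, column 22
        ):
            # x, y, z occupy columns 31-38, 39-46 and 47-54.
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            models.setdefault(current, []).append(xyz)
    return models
```

Any iterable of lines works, so an open file handle can be passed directly, e.g. `with open(af_confs) as fh: models = parse_ca_models(fh)`. Timing this against the two calls above would give a rough idea of how much of the cost is the per-line work itself, which is the part a C or Rust port would speed up.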