twopin / CAMP

predicting peptide-protein interactions

Error in STEP 1 #10

Closed lbwfff closed 2 years ago

lbwfff commented 2 years ago

Hi, thank you for providing such an excellent tool. While using the code to process data, I ran into a problem. Here is the error:

python ./CAMP-master/data_prepare/step1_pdb_process.py 
Traceback (most recent call last):
  File "./CAMP-master/data_prepare/step1_pdb_process.py", line 40, in <module>
    PDB_chain_lst = [x.split('_')[1].split(' ')[0].lower() for x in raw_list]
IndexError: list index out of range

I am a Python beginner, and I guess something goes wrong when processing the pdb_seqres_test.txt data, but that file was downloaded from ftp://ftp.wwpdb.org/pub/pdb/derived_data/pdb_seqres.txt.gz. How can I solve this problem? Thanks, LeeLee

twopin commented 2 years ago

Hi Lee, the downloaded file should be formatted like this:

>1b2m_A mol:protein length:104 RIBONUCLEASE T1
ACDYTCGSNCYSSSDVSTAQAAGYQLHEDGETVGSNSYPHKYNNYEGFDFSVSSPYYEWPILSSGDVYSGGSPGADRVVFNENNQLAGVITHTGASGNNFVECT
>1b2m_B mol:protein length:104 RIBONUCLEASE T1
ACDYTCGSNCYSSSDVSTAQAAGYQLHEDGETVGSNSYPHKYNNYEGFDFSVSSPYYEWPILSSGDVYSGGSPGADRVVFNENNQLAGVITHTGASGNNFVECT
...

The code above aims to extract the chain ID from each header line (e.g. 'A' and 'B') and save it in lowercase ('a' and 'b'). I don't know the exact reason for the error. Maybe the file format on the PDB website has changed, or it is a Python version issue (my code was written for Python 2.7)? You may need to adjust the code to match the format of your file.
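For illustration, here is a minimal sketch of a more defensive version of the chain ID extraction. The thread does not show how raw_list is built, so this assumes the IndexError comes from lines without an underscore (for example sequence lines or a trailing blank line), where x.split('_')[1] does not exist. The function name extract_chain_ids is hypothetical and not part of the CAMP code.

```python
def extract_chain_ids(lines):
    """Return lowercase chain IDs from FASTA-style header lines.

    Hypothetical helper, not part of CAMP. Assumes headers look like
    '>1b2m_A mol:protein length:104 RIBONUCLEASE T1'.
    """
    chain_ids = []
    for line in lines:
        line = line.strip()
        # Only header lines start with '>'; skip sequence and blank lines.
        if not line.startswith('>'):
            continue
        # First whitespace-separated token, e.g. '>1b2m_A'.
        entry = line.split()[0]
        # Split '1b2m_A' into the PDB ID and the chain ID at the first '_'.
        pdb_id, sep, chain = entry[1:].partition('_')
        if not sep or not chain:
            # Header without a chain ID: skip it instead of raising IndexError.
            continue
        chain_ids.append(chain.lower())
    return chain_ids


sample = [
    '>1b2m_A mol:protein length:104 RIBONUCLEASE T1',
    'ACDYTCGSNCYSSSDVSTAQAAGYQLHEDGETVGSNSYPHKYNNYEGFDFSVSSPYYEWPILSSGDVYSGG',
    '',
    '>1b2m_B mol:protein length:104 RIBONUCLEASE T1',
    'ACDYTCGSNCYSSSDVSTAQAAGYQLHEDGETVGSNSYPHKYNNYEGFDFSVSSPYYEWPILSSGDVYSGG',
]
print(extract_chain_ids(sample))  # ['a', 'b']
```

If this returns fewer chain IDs than expected for your file, printing the first few lines that do not start with '>' or contain no '_' should show where the format differs from the example above.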