kiharalab / DiffModeler

DiffModeler: a diffusion model based protein complex structure modeling tool.
https://em.kiharalab.org/algorithm/DiffModeler

EBI sequence mode gives error #8

Closed. arupmondal835 closed this issue 1 month ago

arupmondal835 commented 1 month ago

Hello,

I have been trying to use DiffModeler to model a protein complex from the sequences and I am using mode=1 as follows:

python /projects/gt47/amondal2/Source/DiffModeler/main.py --mode=1 -F=/projects/cbi/amondal2/cbi_structure/Fut1XXT2_experimental_data/cryosparc_P4_J145_011_volume_map.mrc -P=/projects/cbi/amondal2/cbi_structure/diffmodeler/11/seq_ebi/fut1xxt2.fasta --config=/projects/gt47/amondal2/Source/DiffModeler/config/diffmodeler.json --contour=0.0078 --gpu=0 --resolution=10.0 --output result_J145

It runs for a while, but then, unfortunately, fails as follows:

Creating result file: fasta-R20240711-210025-0492-67635832-p1m.out.txt
Creating result file: fasta-R20240711-210025-0492-67635832-p1m.m9.txt
Creating result file: fasta-R20240711-210025-0492-67635832-p1m.m10.txt
Creating result file: fasta-R20240711-210025-0492-67635832-p1m.json.json
Creating result file: fasta-R20240711-210025-0492-67635832-p1m.ids.txt
Creating result file: fasta-R20240711-210025-0492-67635832-p1m.accs.txt
Creating result file: fasta-R20240711-210025-0492-67635832-p1m.xml.xml
Creating result file: fasta-R20240711-210025-0492-67635832-p1m.error.txt
Creating result file: fasta-R20240711-210025-0492-67635832-p1m.sequence.txt
Creating result file: fasta-R20240711-210025-0492-67635832-p1m.visual-svg.svg
Creating result file: fasta-R20240711-210025-0492-67635832-p1m.visual-png.png
Creating result file: fasta-R20240711-210025-0492-67635832-p1m.visual-jpg.jpg
Creating result file: fasta-R20240711-210025-0492-67635832-p1m.ffdp-query-svg.svg
Creating result file: fasta-R20240711-210025-0492-67635832-p1m.ffdp-query-png.png
Creating result file: fasta-R20240711-210025-0492-67635832-p1m.ffdp-query-jpeg.jpg
Creating result file: fasta-R20240711-210025-0492-67635832-p1m.ffdp-subject-svg.svg
Creating result file: fasta-R20240711-210025-0492-67635832-p1m.ffdp-subject-png.png
Creating result file: fasta-R20240711-210025-0492-67635832-p1m.ffdp-subject-jpeg.jpg
Creating result file: fasta-R20240711-210025-0492-67635832-p1m.submission.params
/kfs2/projects/cbi/amondal2/cbi_structure/diffmodeler/11/seq_ebi/result_J145/single_chain_pdb/A created
/kfs2/projects/cbi/amondal2/cbi_structure/diffmodeler/11/seq_ebi/result_J145/single_chain_pdb/B created
Traceback (most recent call last):
  File "/projects/gt47/amondal2/Source/DiffModeler/main.py", line 110, in <module>
    fitting_dict = fasta2pool(params,save_path)
  File "/kfs2/projects/gt47/amondal2/Source/DiffModeler/ops/fasta2pool.py", line 76, in fasta2pool
    download_pdb(pdb_id,current_chain_dir,final_pdb_path)
  File "/kfs2/projects/gt47/amondal2/Source/DiffModeler/ops/fasta_searchdb.py", line 54, in download_pdb
    chain_id = pdbid.split("")[1]
IndexError: list index out of range


There are two chains, and the total number of residues is about 800. Could anybody help me with this? Thank you, Arup

arupmondal835 commented 1 month ago

Upon checking the output files, I think the error could come from here:

PDB:5KOR_B mol:protein length:521 Galactoside 2-a ( 521) 3304 825.9 0
PDB:5KOP_B mol:protein length:521 Galactoside 2-a ( 521) 3304 825.9 0
PDB:5KOR_D mol:protein length:521 Galactoside 2-a ( 521) 3304 825.9 0
AFDB:AF-Q9SWH5-F1 Galactoside 2-alpha-L-fucosyltra ( 558) 3304 825.9 0
AFDB:AF-W8PV36-F1 Fucosyltransferase UA=W8PV36 UI= ( 558) 3304 825.9 0
AFDB:AF-A0A654F2D1-F1 Fucosyltransferase UA=A0A654 ( 558) 3298 824.4 0
AFDB:AF-A0A178VUA3-F1 Fucosyltransferase UA=A0A178 ( 558) 3284 820.9 0
AFDB:AF-A0A5S9WWT0-F1 Fucosyltransferase UA=A0A5S9 ( 558) 3284 820.9 0
AFDB:AF-D7LQF4-F1 Fucosyltransferase UA=D7LQF4 UI= ( 561) 3057 764.7 0

The hits collected from the AF database do not have the PDBID_CHAINID format, so that may be what causes the error.
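To illustrate the suspicion above: an AFDB ID such as AF-Q9SWH5-F1 contains no underscore, so splitting it on "_" yields a single-element list and indexing [1] raises IndexError. The sketch below is not DiffModeler's code. It assumes the separator in the traceback's `pdbid.split("")[1]` was an underscore lost in rendering, and the names `Hit` and `parse_hit_id`, as well as both regular expressions, are hypothetical and derived only from the hit IDs quoted in this thread.

```python
import re
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    source: str              # "PDB" or "AFDB"
    entry_id: str            # e.g. "5KOR" or "AF-Q9SWH5-F1"
    chain_id: Optional[str]  # chain label for PDB hits that carry one, else None


# Hypothetical patterns, inferred only from the IDs shown in this thread.
# The chain suffix is optional so that a bare PDB ID such as "5KOE" also parses.
_PDB_RE = re.compile(r"^(?:PDB:)?([0-9][A-Za-z0-9]{3})(?:_([A-Za-z0-9]+))?$")
_AFDB_RE = re.compile(r"^(?:AFDB:)?(AF-[A-Z0-9]+-F\d+)$")


def parse_hit_id(raw: str) -> Hit:
    """Classify a hit ID instead of indexing the result of split() blindly."""
    token = raw.strip()
    match = _PDB_RE.match(token)
    if match:
        return Hit("PDB", match.group(1).upper(), match.group(2))
    match = _AFDB_RE.match(token)
    if match:
        return Hit("AFDB", match.group(1), None)
    # Fail with a descriptive message rather than an opaque IndexError.
    raise ValueError(f"unrecognized hit ID: {raw!r}")


if __name__ == "__main__":
    # No underscore in an AFDB ID, so split("_") has no element at index 1.
    print("AF-Q9SWH5-F1".split("_"))
    print(parse_hit_id("PDB:5KOR_B"))
    print(parse_hit_id("AFDB:AF-Q9SWH5-F1"))
```

Returning None for a missing chain, rather than raising, lets the caller decide how to handle AFDB models and chain-less PDB IDs separately.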

wang3702 commented 1 month ago

No, I don't think that should cause an error. The code parses both RCSB and AF2 structures. Could you please share the FASTA and map files with me? My email is wang3702@uw.edu. You can also try uploading your files here: https://em.kiharalab.org/algorithm/DiffModeler(seq) and let me know the job ID, which I can use for debugging.

wang3702 commented 1 month ago

Please pull the latest code; I believe it works now. The failure occurred because the EBI search results have been updated on their side. I would suggest considering a local database instead (see this option in the usage: Protein Structure Complex Modeling with sequence (Local Sequence Database)). You are also welcome to use the server at https://em.kiharalab.org/algorithm/DiffModeler(seq), which is much simpler.

arupmondal835 commented 1 month ago

Thanks a lot, I will try with the new code.

Yes, the server is much easier, but it takes forever to complete a job. Starting from the template (mode 0), it took about an hour to run locally, but when I submitted the same job to the server, it has been running for the last two days. That's why I prefer to have a local version compiled.

wang3702 commented 1 month ago

Sorry for the long wait on the server! We recently held a web seminar, so many new users are trying our tools. I think the server will be fairly free after this week. Typically, you can expect a job to finish in 1-2 days.

arupmondal835 commented 1 month ago

That's good to know. Thanks!

I used the new code, but still got a similar error:

Creating result file: fasta-R20240714-012209-0315-55211478-p1m.out.txt
Creating result file: fasta-R20240714-012209-0315-55211478-p1m.m9.txt
Creating result file: fasta-R20240714-012209-0315-55211478-p1m.m10.txt
Creating result file: fasta-R20240714-012209-0315-55211478-p1m.json.json
Creating result file: fasta-R20240714-012209-0315-55211478-p1m.ids.txt
Creating result file: fasta-R20240714-012209-0315-55211478-p1m.accs.txt
Creating result file: fasta-R20240714-012209-0315-55211478-p1m.xml.xml
Creating result file: fasta-R20240714-012209-0315-55211478-p1m.error.txt
Creating result file: fasta-R20240714-012209-0315-55211478-p1m.sequence.txt
Creating result file: fasta-R20240714-012209-0315-55211478-p1m.visual-svg.svg
Creating result file: fasta-R20240714-012209-0315-55211478-p1m.visual-png.png
Creating result file: fasta-R20240714-012209-0315-55211478-p1m.visual-jpg.jpg
Creating result file: fasta-R20240714-012209-0315-55211478-p1m.ffdp-query-svg.svg
Creating result file: fasta-R20240714-012209-0315-55211478-p1m.ffdp-query-png.png
Creating result file: fasta-R20240714-012209-0315-55211478-p1m.ffdp-query-jpeg.jpg
Creating result file: fasta-R20240714-012209-0315-55211478-p1m.ffdp-subject-svg.svg
Creating result file: fasta-R20240714-012209-0315-55211478-p1m.ffdp-subject-png.png
Creating result file: fasta-R20240714-012209-0315-55211478-p1m.ffdp-subject-jpeg.jpg
Creating result file: fasta-R20240714-012209-0315-55211478-p1m.submission.params
/kfs2/projects/cbi/amondal2/cbi_structure/diffmodeler/22/seq_ebi/result_J145/single_chain_pdb/A-B created
/kfs2/projects/cbi/amondal2/cbi_structure/diffmodeler/22/seq_ebi/result_J145/single_chain_pdb/C-D created
candidate PDB: 5KOE
/kfs2/projects/cbi/amondal2/cbi_structure/diffmodeler/22/seq_ebi/result_J145/single_chain_pdb/A-B/0 created
remain waiting assign chain number 61
candidate AFDB: AF-A0A178UVE4-F1
/kfs2/projects/cbi/amondal2/cbi_structure/diffmodeler/22/seq_ebi/result_J145/single_chain_pdb/C-D/0 created
structure length 461 is not in the range of 372 and 455
candidate AFDB: AF-O22775-F1
/kfs2/projects/cbi/amondal2/cbi_structure/diffmodeler/22/seq_ebi/result_J145/single_chain_pdb/C-D/1 created
structure length 461 is not in the range of 372 and 455
candidate AFDB: AF-A0A7G2EUF4-F1
/kfs2/projects/cbi/amondal2/cbi_structure/diffmodeler/22/seq_ebi/result_J145/single_chain_pdb/C-D/2 created
structure length 495 is not in the range of 372 and 455
candidate AFDB: AF-D7M3H0-F1
/kfs2/projects/cbi/amondal2/cbi_structure/diffmodeler/22/seq_ebi/result_J145/single_chain_pdb/C-D/3 created
structure length 463 is not in the range of 372 and 455
candidate AFDB: AF-R0FEJ2-F1
/kfs2/projects/cbi/amondal2/cbi_structure/diffmodeler/22/seq_ebi/result_J145/single_chain_pdb/C-D/4 created
structure length 460 is not in the range of 372 and 455
candidate AFDB: AF-A0A1J3EHV3-F1
/kfs2/projects/cbi/amondal2/cbi_structure/diffmodeler/22/seq_ebi/result_J145/single_chain_pdb/C-D/5 created
structure length 465 is not in the range of 372 and 455
candidate AFDB: AF-V4L2L6-F1
/kfs2/projects/cbi/amondal2/cbi_structure/diffmodeler/22/seq_ebi/result_J145/single_chain_pdb/C-D/6 created
structure length 464 is not in the range of 372 and 455
candidate AFDB: AF-A0A565C020-F1
/kfs2/projects/cbi/amondal2/cbi_structure/diffmodeler/22/seq_ebi/result_J145/single_chain_pdb/C-D/7 created
structure length 466 is not in the range of 372 and 455
candidate AFDB: AF-A0A6D2JF89-F1
/kfs2/projects/cbi/amondal2/cbi_structure/diffmodeler/22/seq_ebi/result_J145/single_chain_pdb/C-D/8 created
structure length 465 is not in the range of 372 and 455
candidate AFDB: AF-A0A3P5Y3G6-F1
/kfs2/projects/cbi/amondal2/cbi_structure/diffmodeler/22/seq_ebi/result_J145/single_chain_pdb/C-D/9 created
structure length 460 is not in the range of 372 and 455
Traceback (most recent call last):
  File "/projects/gt47/amondal2/Source/DiffModeler/main.py", line 110, in <module>
    fitting_dict = fasta2pool(params,save_path)
  File "/kfs2/projects/gt47/amondal2/Source/DiffModeler/ops/fasta2pool.py", line 111, in fasta2pool
    download_pdb(pdb_candidate,current_chain_dir,final_pdb_path)
  File "/kfs2/projects/gt47/amondal2/Source/DiffModeler/ops/fasta_searchdb.py", line 54, in download_pdb
    chain_id = pdbid.split("")[1]
IndexError: list index out of range

wang3702 commented 1 month ago

Should be fixed now. I have also tried your example and it works fine.

arupmondal835 commented 1 month ago

Yes, it is fixed. Thank you!