Closed NiklasTR closed 1 year ago
Can't reproduce the error locally when building from source. Will try the install script next.
Thank you - running it again here, too
Ran it again and could not reproduce the error. Will put this on hold for now and try to reproduce it tomorrow
Worked for me too with the binary version. Any chance your input files are empty in the cases where it is failing?
I've noticed that curl will still create the output file but write a 404 error page into it when the remote file is not found.
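As a sanity check, something like the following could flag inputs that are empty or contain an HTTP error body instead of molecular data. This is just a sketch; `is_suspect` is a hypothetical helper, and the testdata path is only an example:

```shell
# is_suspect FILE: succeeds (exit 0) if FILE is empty or appears to contain
# an HTTP error page (e.g. "404 Not Found") rather than real input data.
is_suspect() {
  local f="$1"
  if [ ! -s "$f" ]; then
    return 0
  fi
  head -c 512 "$f" | grep -qi "404"
}

# Scan an input directory before submitting a job (example path):
for f in testdata/binding/abl/*; do
  if [ -f "$f" ] && is_suspect "$f"; then
    echo "suspect input: $f"
  fi
done
```

Using `curl -f`/`--fail` at download time would avoid writing error pages to disk in the first place.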
Currently seeing the issue again:
Note that this job includes one protein and two ligands (two complexes), so make sure to run it with the same inputs.
(base) rindtorff@niklas plex % ./plex -app diffdock -input-dir testdata/binding/abl -gpu=true -network=true
BACALHAU_API_HOST not set, using default host
## User input ##
Provided application name: diffdock
Provided directory path: testdata/binding/abl
Using GPU: true
Using Network: true
## Default parameters ##
Using app configs: config/app.jsonl
Setting layers to: 2
## Validating ##
App found: diffdock
## Searching input files ##
Found 3 matching files
testdata/binding/abl/7n9g.pdb
testdata/binding/abl/ZINC000003986735.sdf
testdata/binding/abl/ZINC000019632618.sdf
Created job directory /Users/rindtorff/plex/879801a8-08b6-4927-96dc-3f8f5702129c
added QmWmSf3hu78iVaWmDt1EVMGzxMfD6uPq9iPTbca7NVz4T6
## Creating Bacalhau Job ##
Bacalhau Job Id: 60055584-eeaf-4d41-8124-df8874038174
Job running...
Your job results have been downloaded to /Users/rindtorff/plex/879801a8-08b6-4927-96dc-3f8f5702129c
(base) rindtorff@niklas plex % ls /Users/rindtorff/plex/879801a8-08b6-4927-96dc-3f8f5702129c/
(base) rindtorff@niklas plex % bacalhau describe 60055584-eeaf-4d41-8124-df8874038174
Job not found. ID: 60055584-eeaf-4d41-8124-df8874038174
(base) rindtorff@niklas plex % ls /Users/rindtorff/plex/879801a8-08b6-4927-96dc-3f8f5702129c/
7n9g.pdb ZINC000003986735.sdf ZINC000019632618.sdf index.csv index.jsonl
Job description:
HAPPENING | confidence model uses different type of graphs than the score model. Loading (or creating if not existing) the data for the confidence model now.
Reading molecules and generating local structures with RDKit (unless --keep_local_structures is turned on).
Reading language model embeddings.
Generating graphs for ligands and proteins
loading data from memory: data/cache_torsion_allatoms/limit0_INDEX_maxLigSizeNone_H0_recRad15.0_recMax24_atomRad5_atomMax8_esmEmbeddings63314871/heterographs.pkl
Number of complexes: 2
radius protein: mean 49.7668571472168, std 0.0, max 49.7668571472168
radius molecule: mean 9.761337280273438, std 0.4370088577270508, max 10.198346138000488
distance protein-mol: mean 40.52650833129883, std 0.1205596923828125, max 40.64706802368164
rmsd matching: mean 0.0, std 0.0, max 0
common t schedule [1. 0.95 0.9 0.85 0.8 0.75 0.7 0.65 0.6 0.55 0.5 0.45 0.4 0.35
0.3 0.25 0.2 0.15 0.1 0.05]
Size of test dataset: 2
Failed for 0 complexes
Skipped 0 complexes
Results are in ../outputs
stdouttruncated: false
ShardIndex: 0
State: Completed
UpdateTime: "2023-03-09T10:28:10.901179061Z"
VerificationResult:
Complete: true
Result: true
Version: 6
JobID: 60055584-eeaf-4d41-8124-df8874038174
ShardIndex: 0
State: Completed
UpdateTime: "2023-03-09T10:28:11.368770561Z"
Version: 2
State: Completed
TimeoutAt: "0001-01-01T00:00:00Z"
UpdateTime: "2023-03-09T10:28:11.368773261Z"
Version: 2
And the downloaded results:
(base) rindtorff@niklas plex % bacalhau get 60055584-eeaf-4d41-8124-df8874038174
Fetching results of job '60055584-eeaf-4d41-8124-df8874038174'...
Computing default go-libp2p Resource Manager limits based on:
- 'Swarm.ResourceMgr.MaxMemory': "8.6 GB"
- 'Swarm.ResourceMgr.MaxFileDescriptors': 30720
Applying any user-supplied overrides on top.
Run 'ipfs swarm limit all' to see the resulting limits.
Results for job '60055584-eeaf-4d41-8124-df8874038174' have been written to...
/Users/rindtorff/plex/job-60055584
(base) rindtorff@niklas plex % ls /Users/rindtorff/plex/job-60055584/combined_results
outputs stderr stdout
(base) rindtorff@niklas plex % ls /Users/rindtorff/plex/job-60055584/combined_results/outputs
complex_names.npy index1_..-inputs-7n9g.pdb____..-inputs-ZINC000019632618.sdf
confidences.npy min_self_distances.npy
esm2_output prepared_for_esm.fasta
index0_..-inputs-7n9g.pdb____..-inputs-ZINC000003986735.sdf run_times.npy
(base) rindtorff@niklas plex %
Discovered another issue while debugging
Running another test on a Ubuntu instance (Jupyter Lab) with PLEX installed from source.
This time I am looping through a set of requests. I am seeing a 30+% failure rate when downloading the results.
ubuntu@ip-172-31-90-44:~/plex$ for dir in 6o9b 4ayt 5jh6 1p2a 3e73 4fz6 5kr2 4oz3 4ucd 2hz0 1dkd 3lxg; do echo "$dir,$(./plex -app equibind -input-dir "/home/ubuntu/PDBBind_processed/$dir" -gpu=false -network=false | grep "Your job results have been downloaded to" | awk '{print $NF}')"; done > job_results.csv
2023/03/09 10:41:50 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Receive-Buffer-Size for details.
[the same quic-go receive-buffer warning repeated once per remaining job]
ubuntu@ip-172-31-90-44:~/plex$
ubuntu@ip-172-31-90-44:~/plex$ less job_results.csv
ubuntu@ip-172-31-90-44:~/plex$ while IFS=',' read -r dir job_dir; do [[ -n $(find "$job_dir/combined_results/outputs" -name "*.sdf" -print -quit) ]] && echo "$dir,TRUE" || echo "$dir,FALSE"; done < job_results.csv
6o9b,TRUE
4ayt,TRUE
5jh6,TRUE
1p2a,FALSE
3e73,TRUE
4fz6,TRUE
5kr2,TRUE
4oz3,FALSE
4ucd,TRUE
2hz0,TRUE
1dkd,FALSE
3lxg,TRUE
ubuntu@ip-172-31-90-44:~/plex$
(This is job results for reference)
6o9b,/home/ubuntu/plex/011418a4-84d6-499e-8db0-36365c3bf69e
4ayt,/home/ubuntu/plex/d15d12b8-729a-4b00-8fa5-87771c7d5a8d
5jh6,/home/ubuntu/plex/528d12a8-2b49-4c04-990c-cd761de7f60e
1p2a,/home/ubuntu/plex/26001ccc-1a9f-450d-bd2f-0df93de010aa
3e73,/home/ubuntu/plex/07687f73-c5ce-4745-8967-4f6abe9c3896
4fz6,/home/ubuntu/plex/4207fbfc-b11d-457f-b167-601ee3784293
5kr2,/home/ubuntu/plex/d7c1581f-862a-45e2-b515-09b32235d32d
4oz3,/home/ubuntu/plex/021b3fc0-5971-40ef-82e3-3cecf63511f5
4ucd,/home/ubuntu/plex/30628d20-2042-434f-a25b-48092743e493
2hz0,/home/ubuntu/plex/008f40f0-fc61-4305-a17f-7820ca7560a5
1dkd,/home/ubuntu/plex/70a99d67-70f3-4b1f-b522-6d6b3ca5f5af
3lxg,/home/ubuntu/plex/01885263-8b9e-41e3-a582-f9592f83d9fe
I am now checking whether the data can be downloaded via bacalhau.
I reran the same command another time on the same machine. I am getting the same pattern of missing files.
At this point the issue does not seem to be dropped downloads, but errors within the ligands and equibind itself. Digging deeper shows that the empty job directories contain no successful runs, hence an empty output directory. The root cause is that equibind expects all ligand files to end in .sdf and does not read .mol2 in our current configuration.
We will need to ship a change to the equibind container or a QC checker for .sdf files in order to run equibind more reliably. For the demo, we will drop the 3 complexes with dysfunctional .sdf files from the analysis and continue working with 9 complexes.
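A minimal QC checker along those lines could verify that each ligand file really is a non-empty SDF; a well-formed SDF record ends with a `$$$$` delimiter line. `sdf_ok` is a hypothetical helper, not part of plex, and the PDBBind path is only an example:

```shell
# sdf_ok FILE: succeeds if FILE has an .sdf extension, is non-empty, and
# contains at least one SDF record terminator ("$$$$" on its own line).
sdf_ok() {
  local f="$1"
  case "$f" in *.sdf) ;; *) return 1 ;; esac
  [ -s "$f" ] || return 1
  grep -q '^\$\$\$\$' "$f"
}

# Flag ligands that fail the check before handing a directory to equibind:
for f in /home/ubuntu/PDBBind_processed/6o9b/*.sdf; do
  if [ -f "$f" ] && ! sdf_ok "$f"; then
    echo "skipping broken ligand: $f"
  fi
done
```

Alternatively, tools such as Open Babel can convert .mol2 to .sdf (`obabel ligand.mol2 -O ligand.sdf`), assuming it is available in the container.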
Currently running diffdock with the following loop:
for dir in 6o9b 4ayt 5jh6 3e73 4fz6 5kr2 4ucd 2hz0 3lxg; do echo "$dir,$(./plex -app diffdock -input-dir "/home/ubuntu/PDBBind_processed/$dir" -gpu=true -network=true | awk '/Created job directory/ {gsub(/\/$/, "", $NF); printf("%s,",$NF)} /Bacalhau Job Id/ {print $NF}')"; done > "job_results_$now.csv"
Results so far:
No job directory has output data, while all manual bacalhau pulls have data.
Example below:
ubuntu@ip-172-31-90-44:~/plex$ ls /home/ubuntu/plex/07d532d6-3856-4679-a5ea-20ce5ff3e98b/
4ayt_ligand.mol2 4ayt_protein_processed.pdb index.jsonl
4ayt_ligand.sdf index.csv
ubuntu@ip-172-31-90-44:~/plex$ ls /home/ubuntu/plex/job-21f5fc6c/combined_results/outputs/
complex_names.npy index0_..-inputs-4ayt_protein_processed.pdb____..-inputs-4ayt_ligand.sdf run_times.npy
confidences.npy min_self_distances.npy
esm2_output prepared_for_esm.fasta
testdata/binding/abl
I checked the presence of the directory and validated its content; this should not be the issue, from my perspective.
Waiting for #115 for more efficient debugging.
Closing this, as Equibind now also handles .mol2 files.
Running diffdock on 0.3.0 as instructed.
I have seen this when building from source on Ubuntu and when installing binaries on macOS.
Manually pulling the bacalhau result works as expected after setting the host.
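For reference, the manual pull amounts to pointing the client at the right API host before fetching; the host value below is a placeholder, not the actual endpoint:

```shell
# Point the bacalhau client at the correct API host (placeholder value),
# then fetch the job results manually using the ID printed by plex.
export BACALHAU_API_HOST="bootstrap.example.com"
bacalhau get 60055584-eeaf-4d41-8124-df8874038174
```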