Closed tjduigna closed 2 months ago
Coverage report (where and how coverage changed):

| File | Lines missing |
| --- | --- |
| src/plinder/core/index/utils.py | 181, 232, 240, 255, 269, 271 |
| src/plinder/core/loader/loader.py | 11, 53-55, 78-84, 95-96 |
| src/plinder/core/split/utils.py | |
| src/plinder/core/system/system.py | 98, 242-247 |
| src/plinder/core/system/utils.py | 24 |
| src/plinder/core/utils/config.py | |
| src/plinder/core/utils/cpl.py | 74, 101, 155, 170-171 |
| src/plinder/core/utils/unpack.py | 131-133, 143 |
| src/plinder/eval/docking/utils.py | |
| src/plinder/eval/docking/write_scores.py | |

This report was generated by python-coverage-comment-action
Some timing results:

- `plinder_download` from scratch on a VM including full output:
```console
[tjd plinder]$ plinder_download --yes
plinder.core.PlinderDataset requires pytorch and atom3d.
please run:
pip install plinder[loader]
to enable the data loader
2024-08-30 18:41:03,267 | plinder.core.index.utils:190 | INFO : Syncing gs://plinder/2024-06/v2 -> ~/.local/share/plinder/2024-06/v2. If this is the first time you are running this command, it will take a while!
The estimated time on the progress bar may vary wildly based on varied file sizes. If you need to cancel this and come back to it, it will pick up where it left off.
2024-08-30 18:41:04,090 | plinder.core.index.utils:228 | INFO : Syncing clusters
100%|██████████| 665/665 [00:16<00:00, 40.17it/s]
2024-08-30 18:41:20,851 | plinder.core.utils.cpl.download_paths:24 | INFO : runtime succeeded: 16.63s
2024-08-30 18:41:20,886 | plinder.core.index.utils:228 | INFO : Syncing entries
100%|██████████| 1060/1060 [00:22<00:00, 47.36it/s]
2024-08-30 18:41:43,602 | plinder.core.utils.cpl.download_paths:24 | INFO : runtime succeeded: 22.53s
2024-08-30 18:41:43,635 | plinder.core.index.utils:228 | INFO : Syncing fingerprints
100%|██████████| 3/3 [00:00<00:00, 4.24it/s]
2024-08-30 18:41:44,425 | plinder.core.utils.cpl.download_paths:24 | INFO : runtime succeeded: 0.71s
2024-08-30 18:41:44,472 | plinder.core.index.utils:228 | INFO : Syncing index
100%|██████████| 2/2 [00:07<00:00, 3.53s/it]
2024-08-30 18:41:51,597 | plinder.core.utils.cpl.download_paths:24 | INFO : runtime succeeded: 7.06s
2024-08-30 18:41:51,646 | plinder.core.index.utils:228 | INFO : Syncing ligand_scores
100%|██████████| 513/513 [00:09<00:00, 56.18it/s]
2024-08-30 18:42:01,070 | plinder.core.utils.cpl.download_paths:24 | INFO : runtime succeeded: 9.32s
2024-08-30 18:42:01,109 | plinder.core.index.utils:228 | INFO : Syncing ligands
100%|██████████| 2191/2191 [00:18<00:00, 121.04it/s]
2024-08-30 18:42:19,751 | plinder.core.utils.cpl.download_paths:24 | INFO : runtime succeeded: 18.30s
2024-08-30 18:42:19,789 | plinder.core.index.utils:228 | INFO : Syncing links
100%|██████████| 2/2 [00:00<00:00, 3.87it/s]
2024-08-30 18:42:20,360 | plinder.core.utils.cpl.download_paths:24 | INFO : runtime succeeded: 0.52s
2024-08-30 18:42:20,393 | plinder.core.index.utils:228 | INFO : Syncing linked_structures, this may take a while!
100%|██████████| 1060/1060 [06:07<00:00, 2.88it/s]
2024-08-30 18:48:28,266 | plinder.core.utils.cpl.download_paths:24 | INFO : runtime succeeded: 367.68s
2024-08-30 18:48:28,307 | plinder.core.index.utils:236 | INFO : extracting linked_structures archives, you may want to stretch your legs.
2024-08-30 18:48:30,552 | plinder.core.utils.cpl.download_paths:24 | INFO : runtime succeeded: 0.00s
2024-08-30 18:48:30,552 | plinder.core.utils.cpl.download_paths:24 | INFO : runtime succeeded: 0.00s
100%|██████████| 1060/1060 [06:13<00:00, 2.84it/s]
2024-08-30 18:54:44,417 | plinder.core.index.utils:228 | INFO : Syncing mmp
100%|██████████| 2/2 [00:00<00:00, 2.21it/s]
2024-08-30 18:54:45,417 | plinder.core.utils.cpl.download_paths:24 | INFO : runtime succeeded: 0.91s
2024-08-30 18:54:45,455 | plinder.core.index.utils:209 | INFO : Syncing scores/search_db=apo
100%|██████████| 1/1 [00:02<00:00, 2.53s/it]
2024-08-30 18:54:48,052 | plinder.core.utils.cpl.download_paths:24 | INFO : runtime succeeded: 2.53s
2024-08-30 18:54:48,100 | plinder.core.index.utils:209 | INFO : Syncing scores/search_db=pred
100%|██████████| 1/1 [00:00<00:00, 2.23it/s]
2024-08-30 18:54:48,612 | plinder.core.utils.cpl.download_paths:24 | INFO : runtime succeeded: 0.45s
2024-08-30 18:54:48,670 | plinder.core.index.utils:209 | INFO : Syncing scores/search_db=holo, this may take a while!
2024-08-30 18:54:48,670 | plinder.core.index.utils:211 | INFO : the tqdm progress bar for holo is not very useful, please be patient!
100%|██████████| 36/36 [09:34<00:00, 15.95s/it]
2024-08-30 19:04:23,129 | plinder.core.utils.cpl.download_paths:24 | INFO : runtime succeeded: 574.38s
2024-08-30 19:04:23,174 | plinder.core.index.utils:228 | INFO : Syncing splits
100%|██████████| 2/2 [00:00<00:00, 5.48it/s]
2024-08-30 19:04:23,610 | plinder.core.utils.cpl.download_paths:24 | INFO : runtime succeeded: 0.37s
2024-08-30 19:04:23,659 | plinder.core.index.utils:228 | INFO : Syncing systems, this may take a while!
100%|██████████| 1060/1060 [19:05<00:00, 1.08s/it]
2024-08-30 19:23:29,434 | plinder.core.utils.cpl.download_paths:24 | INFO : runtime succeeded: 1145.58s
2024-08-30 19:23:29,482 | plinder.core.index.utils:236 | INFO : extracting systems archives, you may want to stretch your legs.
2024-08-30 19:23:41,984 | plinder.core.utils.cpl.download_paths:24 | INFO : runtime succeeded: 0.00s
2024-08-30 19:23:41,984 | plinder.core.utils.cpl.download_paths:24 | INFO : runtime succeeded: 0.00s
100%|██████████| 1060/1060 [23:46<00:00, 1.35s/it]
2024-08-30 19:23:42,001 | plinder.core.index.utils:242 | INFO : Sync complete in 36.12m!
If you downloaded all of the data, you can run:
export PLINDER_OFFLINE=true
This will avoid checking that files are still in sync when using plinder.core. If you didn't download all of the data, plinder.core will download it lazily when it's needed. By default, plinder.core will check that files are still in sync in case any of the files for an existing release need to be patched.
```
- Subsequent re-run after the dataset was downloaded:
```console
[tjd plinder]$ plinder_download --yes
...
Sync complete in 41.47s!
...
```
- Subsequent re-run with `PLINDER_OFFLINE` set:
```console
[tjd plinder]$ PLINDER_OFFLINE=true plinder_download --yes
...
Sync complete in 15.82s!
...
```
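The offline switch from the download output can also be set from Python instead of the shell. This is only a sketch under an assumption: that `plinder.core` picks up `PLINDER_OFFLINE` from the process environment, so the variable is set before the import to be safe. The guarded import is there purely so the snippet runs on a machine without plinder installed.

```python
import os

# Assumption: plinder.core reads PLINDER_OFFLINE from the environment.
# Setting it before the import is the conservative ordering either way.
os.environ["PLINDER_OFFLINE"] = "true"

try:
    import plinder.core  # noqa: F401
    plinder_available = True
except ImportError:
    # plinder is not installed here; the flag is still set for child processes.
    plinder_available = False

print(f"PLINDER_OFFLINE={os.environ['PLINDER_OFFLINE']}")
print(f"plinder importable: {plinder_available}")
```

As the download output notes, this only makes sense once all of the data has been downloaded; otherwise the default sync check is what keeps patched files up to date.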
Timing results for `PlinderDataset` with the following code snippet:
```python
from time import time

from plinder.core import get_split, PlinderDataset

split = get_split()
dataset = PlinderDataset(df=split, load_alternative_structures=True)

# Time each item access individually to expose per-system overhead.
for i in range(dataset._num_examples):
    t0 = time()
    dataset[i]
    t1 = time()
    print(f"time for index {i}: {t1 - t0:.2f}s")
```
- on `main`
```console
time for index 0: 3.66s
time for index 1: 3.79s
time for index 2: 3.83s
time for index 3: 3.84s
time for index 4: 3.79s
time for index 5: 3.74s
time for index 6: 3.85s
time for index 7: 3.91s
time for index 8: 3.93s
time for index 9: 3.82s
time for index 10: 3.98s
...
```
- on PR branch
```console
time for index 0: 0.40s
time for index 1: 0.19s
time for index 2: 0.18s
time for index 3: 0.19s
time for index 4: 0.21s
time for index 5: 0.23s
time for index 6: 0.19s
time for index 7: 0.22s
time for index 8: 0.19s
time for index 9: 0.23s
time for index 10: 0.23s
...
```
- on PR branch with `PLINDER_OFFLINE=true`
```console
time for index 0: 0.24s
time for index 1: 0.03s
time for index 2: 0.03s
time for index 3: 0.03s
time for index 4: 0.03s
time for index 5: 0.03s
time for index 6: 0.03s
time for index 7: 0.08s
time for index 8: 0.03s
time for index 9: 0.08s
time for index 10: 0.07s
...
```
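To put the three runs side by side, the sketch below averages the eleven timings shown for each configuration and expresses them as a speedup over `main`. The numbers are copied from the output above; the dictionary keys are just labels chosen here. This is a small sample (indices 0 to 10 only), so the ratios are indicative rather than a benchmark.

```python
from statistics import mean

# Per-index timings in seconds, copied from the three runs reported above.
timings = {
    "main": [3.66, 3.79, 3.83, 3.84, 3.79, 3.74, 3.85, 3.91, 3.93, 3.82, 3.98],
    "branch": [0.40, 0.19, 0.18, 0.19, 0.21, 0.23, 0.19, 0.22, 0.19, 0.23, 0.23],
    "branch_offline": [0.24, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.08, 0.03, 0.08, 0.07],
}

# Mean time per item access for each configuration.
means = {name: mean(values) for name, values in timings.items()}

# Speedup relative to main (main itself is 1.0 by construction).
speedups = {name: means["main"] / m for name, m in means.items()}

for name in timings:
    print(f"{name:>15}: mean {means[name]:.2f}s, {speedups[name]:.1f}x vs main")
```

On these samples the PR branch is roughly 17x faster per item than `main`, and roughly 62x faster with `PLINDER_OFFLINE=true`.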
Iterating over plinder systems exposes some inefficiencies which are not obvious when working with only a few samples. This PR accomplishes the following:
- Use `unpack.get_zips_to_unpack` rather than inspecting the archive within the `PlinderSystem`
- `PlinderSystem` will assume extraction already occurred, but still evaluate lazily if necessary
- `get_plinder_path` call
- `query_links` for every `PlinderSystem` in the `PlinderDataset`
- If `load_alternative_structures` is set, pre-load all apo/pred links and group them by system ID
- `plinder_download` to make it a viable alternative to `gsutil -m cp -r && cd && for i in ...; do unzip $i; done && ...`
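The "pre-load all apo/pred links and group them by system ID" item can be illustrated with a generic one-pass grouping. This is a sketch of the general technique only, not plinder's implementation: the record fields (`system_id`, `kind`, `structure_id`), the sample IDs, and the helper name `group_links_by_system` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical link records: one row per (system, alternative structure) pair.
# Field names and IDs are illustrative and do not reflect plinder's schema.
links = [
    {"system_id": "sys_a", "kind": "apo", "structure_id": "apo_1"},
    {"system_id": "sys_a", "kind": "pred", "structure_id": "pred_1"},
    {"system_id": "sys_b", "kind": "apo", "structure_id": "apo_9"},
    {"system_id": "sys_a", "kind": "apo", "structure_id": "apo_2"},
]


def group_links_by_system(records):
    """Group link records by system ID and kind in a single pass."""
    grouped = defaultdict(lambda: defaultdict(list))
    for rec in records:
        grouped[rec["system_id"]][rec["kind"]].append(rec["structure_id"])
    # Convert to plain dicts so missing keys do not get created on lookup.
    return {sid: dict(kinds) for sid, kinds in grouped.items()}


# Build the lookup once, up front.
links_by_system = group_links_by_system(links)

# Each system is then a dictionary access instead of a separate query.
for system_id in ["sys_a", "sys_b", "sys_c"]:
    print(system_id, links_by_system.get(system_id, {}))
```

The design point is that the cost of fetching links is paid once for the whole dataset rather than once per system, which matters when iterating over many systems.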
With all of these changes, iterating over `PlinderSystem`s and using the `PlinderDataset` becomes reasonable in terms of runtime performance. Things can be sped up further by setting the `PLINDER_OFFLINE=true` environment variable.
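The "assume extraction already occurred, but still evaluate lazily if necessary" behaviour described for `PlinderSystem` follows a common fast-path pattern, sketched below with the standard library. The helper `ensure_extracted`, the directory layout, and the file names are hypothetical and only illustrate the idea; they are not taken from plinder.

```python
import tempfile
import zipfile
from pathlib import Path


def ensure_extracted(system_dir: Path, archive: Path) -> Path:
    """Return system_dir, unpacking the archive only if the directory is missing."""
    if system_dir.is_dir():
        # Fast path: trust the earlier extraction and never open the archive.
        return system_dir
    # Lazy fallback: unpack the archive next to where the directory should be.
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(system_dir.parent)
    return system_dir


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    archive = root / "ab.zip"
    # Build a tiny demo archive containing one system directory.
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("sys_a/ligand.txt", "demo")

    first = ensure_extracted(root / "sys_a", archive)   # unpacks on first use
    archive.unlink()                                    # archive is gone now
    second = ensure_extracted(root / "sys_a", archive)  # fast path still works
    extracted_text = (second / "ligand.txt").read_text()

print(extracted_text)
```

Checking for a directory is far cheaper than opening and listing an archive, which is why the fast path pays off when iterating over many systems.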