Closed: leoisl closed this issue 1 year ago
For the purpose of documentation, could you provide your hardware configuration and the configuration that was used with mof-search (number of COBS threads, etc.)? This might explain the discrepancy.
Also, do you know what time you spent in which phases? (I usually run and measure `make match` and `make map` separately.)
For comparison, my values were the following (previously):
1) matching using COBS
real 67m53.940s
user 116m44.494s
sys 49m49.453s
2) batch align with minimap & 8 cores
real 245m12.433s
user 1401m45.748s
sys 297m22.271s
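As a sanity check on such `time` output, average CPU utilization is (user + sys) / real; a quick sketch using the COBS matching values above:

```python
# Average CPU utilization from `time` output: (user + sys) / real.
# Values are the COBS matching times quoted above.

def seconds(minutes, secs):
    """Convert a `time`-style minutes/seconds pair to seconds."""
    return minutes * 60 + secs

real = seconds(67, 53.940)
user = seconds(116, 44.494)
sys_t = seconds(49, 49.453)

utilization = (user + sys_t) / real
print(f"average CPU utilization: {utilization:.0%}")  # -> 245%
```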
Yeah, I will do this; the main message was just that things are working with no issues. I managed to make COBS work completely on bioconda on Linux and Mac, including tests. Until yesterday, all tests were failing on Mac; now everything is working. I've asked the bioconda team to review and merge this, so we will soon have COBS on bioconda, which will solve lots of issues.
Meanwhile I will work on:
`make diagnostics` and get the stats you mentioned.

That's really cool that COBS already works. Once its packaging is finished, I'll resume testing the computations on my computers.
I am also waiting for cobs v0.2.0 to be up in bioconda to resume testing
It seems it takes about 4 days for the package to be up, so I think you will only be able to test it after your vacation!
Interestingly, for Linux it's already there: https://anaconda.org/bioconda/cobs/files
Hey, sorry, I am not home. Can you see the new version with `conda search -c bioconda cobs`?
And for osx too: https://anaconda.org/bioconda/cobs/0.2.0/download/osx-64/cobs-0.2.0-ha276ee6_0.tar.bz2
No, it's not yet uploaded to Anaconda (https://conda.anaconda.org/bioconda/osx-64/) and I'm not sure how frequently this is done.
Using the code in PR https://github.com/karel-brinda/mof-search/pull/80 and these relevant configs:
max_decomp_jobs: 4
cobs_thr: 2
minimap_thr: 2
I got these benchmark values:
match (this is decompress+COBS match)
Benchmarking command: snakemake --jobs all --rerun-incomplete --printshellcmds --keep-going --use-conda --resources max_decomp_jobs=4 max_download_jobs=8 --config benchmark=True -- match
real(s) sys(s) user(s) percent_CPU max_RAM(kb) FS_inputs FS_outputs
21142.03 1540.98 5918.97 35% 7411168 458151880 2067547352
map
Benchmarking command: snakemake --jobs all --rerun-incomplete --printshellcmds --keep-going --use-conda --resources max_decomp_jobs=4 max_download_jobs=8 --config benchmark=True -- map
real(s) sys(s) user(s) percent_CPU max_RAM(kb) FS_inputs FS_outputs
16905.88 6919.36 68620.11 446% 208296 62321536 5568368
~5.9h to match and ~4.7h to map. I think the important thing to realise here is that, in both benchmarks, we could have at most 3 processes running (my machine has 8 cores, 1 is allocated to snakemake if I am not wrong, and the other 7 are for the jobs, but each job here takes 2 threads). With 3 simultaneous processes, using at most 2 threads each, our percent_CPU is capped at 600%. For the `map` part, this looks good: we got 446% CPU, which is somewhat close to this max. For the `match` job, though, this is abysmal: we got just 35% CPU, which is ~17 times less parallelism than the theoretical best. Note also the discrepancy between the real and sys/user times, the huge number of FS outputs for match (2B for match vs 5M for map, 400 times more for match), and the FS inputs (458M for match vs 62M for map, 7 times more for match). I think all this points to the processes in the match part spending a lot of time blocked waiting on I/O.
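The cap argued above can be written out explicitly; a small sketch (the 8-core / 1-scheduler-core split is my reading of the setup described in this thread):

```python
# Theoretical percent_CPU ceiling: concurrent jobs * threads per job * 100.
cores = 8
scheduler_cores = 1          # assumed reserved by snakemake itself
threads_per_job = 2          # cobs_thr / minimap_thr from the config
max_jobs = (cores - scheduler_cores) // threads_per_job  # -> 3 concurrent jobs
cap = max_jobs * threads_per_job * 100                   # -> 600 (percent)

observed = {"match": 35, "map": 446}  # percent_CPU from the benchmarks above
for phase, pct in observed.items():
    print(f"{phase}: {pct}% of a {cap}% cap ({pct / cap:.0%} efficiency)")
```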
There are some takeaways from this experiment:
The match part is I/O bound, while the map part seems to be CPU bound. Splitting the benchmark into these two parts makes the match part slow, I think. We could get much more speed by running the whole pipeline, mixing minimap2 jobs with COBS jobs, so that we don't overload the I/O;
This is the COBS match output sorted by date:
-r--r--r-- 1 leandro leandro 55K Jul 22 21:02 dustbin__10____ARGannot_r3.nonamb.xz
-r--r--r-- 1 leandro leandro 55K Jul 22 21:02 dustbin__15____ARGannot_r3.nonamb.xz
-r--r--r-- 1 leandro leandro 38K Jul 22 21:02 dustbin__21____ARGannot_r3.nonamb.xz
-r--r--r-- 1 leandro leandro 47K Jul 22 21:02 dustbin__08____ARGannot_r3.nonamb.xz
-r--r--r-- 1 leandro leandro 44K Jul 22 21:09 dustbin__12____ARGannot_r3.nonamb.xz
-r--r--r-- 1 leandro leandro 38K Jul 22 21:09 dustbin__04____ARGannot_r3.nonamb.xz
-r--r--r-- 1 leandro leandro 43K Jul 22 21:09 dustbin__06____ARGannot_r3.nonamb.xz
-r--r--r-- 1 leandro leandro 44K Jul 22 21:09 dustbin__11____ARGannot_r3.nonamb.xz
-r--r--r-- 1 leandro leandro 55K Jul 22 21:16 dustbin__09____ARGannot_r3.nonamb.xz
-r--r--r-- 1 leandro leandro 42K Jul 22 21:16 dustbin__16____ARGannot_r3.nonamb.xz
-r--r--r-- 1 leandro leandro 47K Jul 22 21:16 dustbin__05____ARGannot_r3.nonamb.xz
-r--r--r-- 1 leandro leandro 43K Jul 22 21:16 dustbin__02____ARGannot_r3.nonamb.xz
-r--r--r-- 1 leandro leandro 44K Jul 22 21:23 dustbin__13____ARGannot_r3.nonamb.xz
-r--r--r-- 1 leandro leandro 61K Jul 22 21:23 dustbin__18____ARGannot_r3.nonamb.xz
-r--r--r-- 1 leandro leandro 40K Jul 22 21:23 dustbin__03____ARGannot_r3.nonamb.xz
For some reason, the pipeline starts by decompressing all the dustbins first, which are the largest xz files we have. I think it would have been better if the order were random, so that we would have only one or two dustbins in flight at a time, but I don't know how snakemake chooses jobs, and I've noticed that starting with the dustbins seems to be systematic.
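One way to avoid the systematic ordering would be to shuffle the batch list before the workflow sees it. A hypothetical sketch (the one-batch-name-per-line format of `batches_full.txt` is an assumption on my part):

```python
import random

def shuffled_batches(path="batches_full.txt", seed=42):
    """Read one batch name per line and return them in random order,
    so the large dustbin batches get interleaved with the rest
    instead of all being scheduled first."""
    with open(path) as fh:
        batches = [line.strip() for line in fh if line.strip()]
    rng = random.Random(seed)  # fixed seed keeps reruns reproducible
    rng.shuffle(batches)
    return batches
```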
I will first try option 1 to see how it goes
Here are my results. Given the character of the data, I made the filtering part less strict.
ARGannot_r3.fa
thr: all
batches: "batches_full.txt"
max_decomp_jobs: 4
max_download_jobs: 20
cobs_thr: 1
cobs_kmer_thres: 0.70
minimap_preset: "sr"
minimap_thr: 1
minimap_extra_params: "--eqx"
keep_cobs_ind: False
download_dir: "."
benchmark: False
Note that several (~7) processes stalled and it's not
$ cat match.txt
Benchmarking command: snakemake --jobs all --rerun-incomplete --printshellcmds --keep-going --use-conda --resources max_decomp_jobs=4 max_download_jobs=20 --config benchmark=True -- match
real(s) sys(s) user(s) percent_CPU max_RAM(kb) FS_inputs FS_outputs
4007.72 4082.04 7280.55 283% 2501552 0 25
1 from https://github.com/karel-brinda/mof-search/issues/52#issuecomment-1193107770 is not an option (see https://github.com/karel-brinda/mof-search/issues/89). Trying 2
Best performance so far for ARG annot on my machine
Code used was the one in PR https://github.com/karel-brinda/mof-search/pull/95
ARGannot_r3.fa
thr: all
batches: "batches_full.txt"
max_download_jobs: 8
cobs_thr: 1
cobs_kmer_thres: 0.33
minimap_preset: "sr"
minimap_thr: 1
minimap_extra_params: "--eqx"
max_decomp_MB: 2000
keep_cobs_ind: False
download_dir: "."
benchmark: False
Total time: 2.75h
Benchmarking command: snakemake --jobs all --rerun-incomplete --printshellcmds --keep-going --use-conda --resources max_decomp_MB=2000 max_download_jobs=8 --config benchmark=True -- match
real(s) sys(s) user(s) percent_CPU max_RAM(kb) FS_inputs FS_outputs
9922.50 1174.58 6463.51 76% 7045748 458257680 2067547528
Total time: 3.35h
(base) leandro@leandro-XPS-13-9310-2-in-1:~/git/mof-search$ cat logs/benchmarks/map.txt
Benchmarking command: snakemake --jobs all --rerun-incomplete --printshellcmds --keep-going --use-conda --resources max_decomp_MB=2000 max_download_jobs=8 --config benchmark=True -- map
real(s) sys(s) user(s) percent_CPU max_RAM(kb) FS_inputs FS_outputs
12046.00 8646.27 82699.43 758% 208156 63913744 5574808
6.1h
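Compared with the earlier benchmark in this thread (21142 s match, 16906 s map), the improvement can be quantified directly:

```python
# Wall-clock comparison of the two benchmark runs quoted in this thread.
old = {"match": 21142.03, "map": 16905.88}   # seconds, earlier run
new = {"match": 9922.50, "map": 12046.00}    # seconds, this run

for phase in old:
    speedup = old[phase] / new[phase]
    print(f"{phase}: {old[phase]/3600:.2f}h -> {new[phase]/3600:.2f}h "
          f"({speedup:.2f}x faster)")

total_old = sum(old.values()) / 3600
total_new = sum(new.values()) / 3600
print(f"total: {total_old:.2f}h -> {total_new:.2f}h")
```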
At this point, all I can think is to use a Linux machine without disk encryption
This is a great improvement, congrats!
> At this point, all I can think is to use a Linux machine without disk encryption
Yes, COBS ideally requires an SSD disk w/o encryption.
Btw, for the resistance genes, we can use `cobs_kmer_thres: 0.7`, which should be substantially faster.
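For intuition on what raising `cobs_kmer_thres` means: a batch is only reported for a query if that fraction of the query's k-mers hits the index. A rough sketch, assuming k=31 (the exact k depends on how the indexes were built):

```python
import math

def min_matching_kmers(query_len, thres, k=31):
    """Minimum number of query k-mers that must be present in a batch
    for it to pass the k-mer fraction threshold (k=31 is an assumption)."""
    n_kmers = query_len - k + 1
    return math.ceil(thres * n_kmers)

# e.g. a 1000 bp resistance gene has 970 k-mers:
print(min_matching_kmers(1000, 0.33))  # -> 321
print(min_matching_kmers(1000, 0.70))  # -> 679
```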
(This is a copy of https://github.com/karel-brinda/mof-search/issues/83#issuecomment-1199985952)
Version: * 7574ca6 - (HEAD) Dynamically reducing the number of data we write to the pipe (2 days ago) <Leandro Ishi>
Query: ARGannot_r3.fa
Finished successfully, no stalled processes now.
Configuration:
thr: all
batches: "batches_full.txt"
max_download_jobs: 8
cobs_thr: 1
cobs_kmer_thres: 0.70
minimap_preset: "sr"
minimap_thr: 1
minimap_extra_params: "--eqx"
max_decomp_MB: 2000
keep_cobs_ind: False
download_dir: "."
benchmark: False
Benchmarks:
==> map.txt <==
Benchmarking command: snakemake --jobs all --rerun-incomplete --printshellcmds --keep-going --use-conda --resources max_decomp_MB=2000 max_download_jobs=8 --config benchmark=True -- map
real(s) sys(s) user(s) percent_CPU max_RAM(kb) FS_inputs FS_outputs
19857.93 21573.95 88627.35 554% 374220 0 0
==> match.txt <==
Benchmarking command: snakemake --jobs all --rerun-incomplete --printshellcmds --keep-going --use-conda --resources max_decomp_MB=2000 max_download_jobs=8 --config benchmark=True -- match
real(s) sys(s) user(s) percent_CPU max_RAM(kb) FS_inputs FS_outputs
6460.46 3640.45 6579.11 158% 2517436 0 38
I got the machine detailed in https://github.com/karel-brinda/mof-search/issues/96#issuecomment-1196459483 but I need to configure it. Will be able to do it this Tuesday, then I can restart the benchmarking.
Cool! How did it go?
Couldn't run it before holidays, running now on ARG Annot
Resurrecting this issue. Benchmarking on recent code:
Current master (https://github.com/karel-brinda/mof-search/commit/8567c46d3ef8be07253b7fc20f8360b61feace5d)
all_ebi_plasmids.fa
batches: "data/batches_full.txt"
cobs_kmer_thres: 0.4
nb_best_hits: 100
minimap_preset: "asm20"
minimap_extra_params: "--eqx"
threads: all
max_ram_gb: 12
max_download_threads: 8
cobs_threads: 3
load_complete: True
max_io_heavy_threads: 6
minimap_threads: 1
prefer_pipe: True
keep_cobs_indexes: False
download_dir: "."
queries 2826
cumul_length_bps 223662732
matched_queries 442
aligned_queries 441
aligned_segments 375970
distinct_genome_query_pairs 26955
target_genomes 3748
target_batches 25
nonalignments 8
Total time: 7.25h
# Benchmarking command: snakemake --jobs all --rerun-incomplete --printshellcmds --keep-going --use-conda --resources max_download_threads=8 max_io_heavy_threads=6 max_ram_mb=12288 -- match
real(s) sys(s) user(s) percent_CPU max_RAM(kb) FS_inputs FS_outputs elapsed_time_alt(s)
26091.42 1795.23 124095.18 482% 10699284 273284592 2068807800 26091.445251
Total time: 1.58h:
# Benchmarking command: snakemake --jobs all --rerun-incomplete --printshellcmds --keep-going --use-conda --resources max_download_threads=8 max_io_heavy_threads=6 max_ram_mb=12288 -- map
real(s) sys(s) user(s) percent_CPU max_RAM(kb) FS_inputs FS_outputs elapsed_time_alt(s)
5704.15 2276.19 26218.29 499% 878216 59028240 6016088 5704.174802
8.83h
I think the previous run went fine. Counting the SAM flags of a very old run on the EBI plasmids gives:
xzcat output/all_ebi_plasmids.sam_summary.xz | grep -v "^==" | awk '{print $2}' | sort | uniq -c
304
97788 0
96106 16
1381910 2048
1408813 2064
492 4
I think the sum of flags 0 and 16 (~200k) is on the order of `aligned_segments 375970`. But we also had different parameters: here I am using `--asm20` for minimap2, whereas before I was using `-sr`. Anyway, would you know if these final stats look ok @karel-brinda? Or did we maybe introduce a bug?
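As a sanity check on those numbers, the flag values can be decoded with the standard SAM bitmask (this is generic SAM semantics, not anything mof-search specific):

```python
# Decode the SAM FLAG values counted above (SAM spec bitfield).
FLAGS = {4: "unmapped", 16: "reverse strand",
         256: "secondary", 2048: "supplementary"}

counts = {0: 97788, 16: 96106, 2048: 1381910, 2064: 1408813, 4: 492}

def describe(flag):
    bits = [name for bit, name in FLAGS.items() if flag & bit]
    return "primary, forward" if not bits else ", ".join(bits)

for flag, n in sorted(counts.items()):
    print(f"flag {flag:5d} ({describe(flag)}): {n}")

# Primary alignments: everything without unmapped/secondary/supplementary bits.
primary = sum(n for f, n in counts.items() if not (f & (4 | 256 | 2048)))
print("primary alignments (flags 0 + 16):", primary)  # -> 193894
```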
If this is the plasmid dataset, there must be something wrong with the results.
I'm getting 1,871 aligned queries and 8,980,408 (genome, query) pairs.
The total real time on my rather fast computer is 26.9 hours.
Update: The difference is I take 1000 best hits for alignment.
> Update: The difference is I take 1000 best hits for alignment.
Do you have stats for taking 100 best hits?
First run of mof-search querying ARGannot_r3.nonamb against the 661k HQ indexes completed without any error! We used 1 thread for COBS and allowed at most 4 decompression jobs to run simultaneously. Performance is not what I expected: it took 9h47m and used 7GB of RAM, but we will improve this very soon.
The main takeaway is that we managed to query the 661k HQ indexes with absolutely zero errors, meaning https://github.com/karel-brinda/mof-search/pull/34 worked (at least on Linux).