iqbal-lab-org / gramtools

Genome inference from a population reference graph

Quasimapping with multiple threads odd behaviour #127

Closed (iqbal-lab closed this issue 5 years ago)

iqbal-lab commented 6 years ago

I'm running on a single dedicated server (not shared).

Running with this commit:

{
    "version_number": "0.5.0",
    "last_git_commit_hash": "d8a3082a921579e65081fa1932c42c4f2fb7953a",
    "truncated_git_commits": [
        "d8a3082 - Robyn Ffrancon, 1527688551 : enhancement: build command optionally skips building PRG",
        "2dac562 - Robyn Ffrancon, 1527601335 : enhancement: quasimap commands ensures that build command executed successfully",
        "760b759 - Robyn Ffrancon, 1527599820 : enhancement: build stops and returns non-zero if no variants sites found in prg",
        "f3b8cff - Robyn Ffrancon, 1527597315 : enhancment: removed unused skip optimisation code",
        "e22cd4f - Robyn Ffrancon, 1527590325 : fix: SA indexes associated with correct site-allele paths for allele encapsulated mappings"
    ]
}

With k=5 and 8 threads, quasimapping a fixed fastq against a fixed PRG takes 1 hr 45 mins. The output showed 67 million reads processed.

With k=7 and 8 threads, it is still running 15 hours after starting; the output shows it has processed 37 million reads (the last printout was 2 hours ago).

Machine: ebi7-017

Command: gramtools quasimap --gram-directory results/gramk7 --reads fastq/out.fq.gz --max-threads=8 2>> error_qmapk7_thread8 1>> output_qmapk7_thread8

pwd: /tmp/benchmarking

iqbal-lab commented 6 years ago

This process eventually finished after 25 hours

k7 (8 threads) results:

Count all reads: 67115296
Count skipped reads: 630240
Count mapped reads: 185418

Timer report (seconds):
Load data: 38.42
Quasimap: 512413
Total elapsed time: 512452

For comparison, k5 (8 threads) results:

Count all reads: 67115296
Count skipped reads: 630240
Count mapped reads: 76199

Timer report (seconds):
Load data: 37.32
Quasimap: 29117.6

The difference in the number of mapped reads surprised me a bit.
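
For reference, the slowdown and the mapped-read ratio between those two runs work out roughly as follows (a throwaway Python check using only the numbers quoted above; it is not gramtools code):

```python
# Figures copied from the k=5 and k=7 (8-thread) runs reported above.
k5_quasimap_s, k7_quasimap_s = 29117.6, 512413
k5_mapped, k7_mapped = 76199, 185418

print(f"k=7 quasimap took {k7_quasimap_s / k5_quasimap_s:.1f}x longer than k=5")  # ~17.6x
print(f"k=7 mapped {k7_mapped / k5_mapped:.2f}x as many reads as k=5")            # ~2.43x
```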

iqbal-lab commented 6 years ago

I've started a new run to see if this is reproducible.

iqbal-lab commented 6 years ago

Also seen in https://github.com/iqbal-lab-org/gramtools/issues/117: going from k=5 to k=10, the time taken goes from 1.75 hours to about 7 hours (8 threads). See the bottom tables in the main description of that issue.

iqbal-lab commented 5 years ago

With k11, using the standard Plasmodium PRG

/nfs/research1/zi/projects/gramtools/standard_datasets/pfalciparum/pf3k_release3_cortex_plus_dblmsps/gram_k11/

Quasimap 4.7 million 150bp reads with 1 thread takes 187,000 seconds (40 hours).

Quasimapping with 8 threads takes 45,000 seconds (12.5 hours).

Maybe this can be closed

ffranr commented 5 years ago

@iqbal-lab Was the number of mapped reads consistent between runs?

iqbal-lab commented 5 years ago

Precisely the same!

ffranr commented 5 years ago

@iqbal-lab In a previous comment you showed that the number of mapped reads was erroneously inconsistent between kmer sizes (when using multiple threads?). Do we know if that issue persists?

iqbal-lab commented 5 years ago

I am now running at k=13 and will confirm.

ffranr commented 5 years ago

> With k11, using the standard Plasmodium PRG /nfs/research1/zi/projects/gramtools/standard_datasets/pfalciparum/pf3k_release3_cortex_plus_dblmsps/gram_k11/
> Quasimap 4.7 million 150bp reads with 1 thread takes 187,000 seconds (40 hours).
> Quasimapping with 8 threads takes 45,000 seconds (12.5 hours).
> Maybe this can be closed

@iqbal-lab Where can I find the quasimap output directory for the above please?

iqbal-lab commented 5 years ago

/nfs/research1/zi/zi/analysis/2018/0920_test_gramtools_for_leffler/quasimapk11

bricoletc commented 5 years ago

Tested multi-threading of quasimap by mapping 250,000 reads (so 500,000 with the reverse complements) from (yoda) /nfs/leia/research/iqbal/bletcher/Pf_benchmark/all_reads/original/PG0496-C.bam to the (big) pf3k PRG.

Results: [plot: Threading_Pf, time in seconds per thread count]

So here multi-threading gives no speedup.

All runs produced consistent results for reads mapped:

Count all reads: 500000
Count skipped reads: 190
Count mapped reads: 204169

iqbal-lab commented 5 years ago

Can you give exact command line and lsf command?

bricoletc commented 5 years ago

> Can you give exact command line and lsf command?

bsub -R select[mem>60000] rusage[mem=60000] -M60000 -J threads_4 -n 4 -o /nfs/leia/research/iqbal/bletcher/Pf_benchmark/tests/threads/logs/t4_k8.o \
-e /nfs/leia/research/iqbal/bletcher/Pf_benchmark/tests/threads/logs/t4_k8.e singularity exec /nfs/leia/research/iqbal/bletcher/Singularity_Images/8b46a86_gramtools.img gramtools quasimap \
--gram-dir /nfs/leia/research/iqbal/bletcher/Pf_benchmark/tests/gram_k8 --run-dir /nfs/leia/research/iqbal/bletcher/Pf_benchmark/tests/threads/run_t4 \
--reads /nfs/leia/research/iqbal/bletcher/Pf_benchmark/tests/subsetted_reads/PG0496-C.trim.fq.1.1MSubset.gz --max-threads 4

bricoletc commented 5 years ago

UPDATE: the previous plot shows total CPU time, not elapsed real (wall clock) time.
Looking at wall clock time, we see that multi-threading really is working: [plot: Wall_clock_mthreads, wall clock time per thread count]

Of interest: going from 1 thread to 10 threads, k=8 achieves a 4.1-fold speedup, compared with only a 2.7-fold speedup at k=11.
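
For scale, that corresponds to roughly 41% and 27% parallel efficiency respectively. A quick check (plain Python, not gramtools code):

```python
# Parallel efficiency = observed speedup / number of threads (1 -> 10 threads here).
for kmer, speedup in {"k=8": 4.1, "k=11": 2.7}.items():
    print(f"{kmer}: {speedup / 10:.0%} parallel efficiency on 10 threads")  # 41%, 27%
```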

leoisl commented 5 years ago

Hello!

Nice! Thanks for this! If it is not too much to ask, could you also make an additional plot, please? One showing the true speed-up against the theoretical best speed-up (see e.g. https://stackoverflow.com/questions/26514264/plot-speed-up-curve-vs-number-of-openmp-threads-scalability) would be useful for knowing whether we should try to improve the multithreading or not!
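
For reference, a minimal matplotlib sketch of that kind of plot; the dictionary of wall-clock times would be filled in with the measured values (here it only holds the 1-thread and 8-thread k=11 figures quoted earlier in this thread), and the output filename is just illustrative. This is a sketch, not gramtools code:

```python
import matplotlib.pyplot as plt

# Wall-clock quasimap times (seconds) per thread count; fill in with measured values.
# The two entries below are the 1-thread and 8-thread k=11 figures from this thread.
wall_clock = {1: 187000, 8: 45000}

threads = sorted(wall_clock)
speedup = [wall_clock[threads[0]] / wall_clock[t] for t in threads]

plt.plot(threads, speedup, "o-", label="observed speed-up")
plt.plot(threads, threads, "--", label="ideal (linear) speed-up")
plt.xlabel("threads")
plt.ylabel("speed-up vs. 1 thread")
plt.legend()
plt.savefig("quasimap_speedup.png")
```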

Thanks!

Cheers!