tkf / ThreadsX.jl

Parallelized Base functions
MIT License

Improve README #110

Closed tkf closed 4 years ago

github-actions[bot] commented 4 years ago

Benchmark result

# Judge result

# Benchmark Report for */home/runner/work/ThreadsX.jl/ThreadsX.jl*

## Job Properties

* Time of benchmarks:
    - Target: 28 Jun 2020 - 04:01
    - Baseline: 28 Jun 2020 - 04:07
* Package commits:
    - Target: 511b9d
    - Baseline: a573a0
* Julia commits:
    - Target: 44fa15
    - Baseline: 44fa15
* Julia command flags:
    - Target: None
    - Baseline: None
* Environment variables:
    - Target: `OMP_NUM_THREADS => 1` `JULIA_NUM_THREADS => 2`
    - Baseline: `OMP_NUM_THREADS => 1` `JULIA_NUM_THREADS => 2`

## Results

A ratio greater than `1.0` denotes a possible regression (marked with :x:), while a ratio less than `1.0` denotes a possible improvement (marked with :white_check_mark:). Only significant results - results that indicate possible regressions or improvements - are shown below (thus, an empty table means that all benchmark results remained invariant between builds).

| ID | time ratio | memory ratio |
|---|---|---|
| `["findfirst", "10%", "base"]` | 1.15 (5%) :x: | 1.00 (1%) |
| `["findfirst", "10%", "tx"]` | 0.76 (5%) :white_check_mark: | 1.00 (1%) |
| `["findfirst", "10%", "tx-noterm"]` | 1.02 (5%) | 0.86 (1%) :white_check_mark: |
| `["findfirst", "10%", "tx-seq"]` | 1.30 (5%) :x: | 1.00 (1%) |
| `["findfirst", "20%", "tx"]` | 0.76 (5%) :white_check_mark: | 1.00 (1%) |
| `["findfirst", "20%", "tx-noterm"]` | 0.77 (5%) :white_check_mark: | 1.09 (1%) :x: |
| `["findfirst", "20%", "tx-seq"]` | 1.44 (5%) :x: | 1.00 (1%) |
| `["findfirst", "30%", "base"]` | 0.94 (5%) :white_check_mark: | 1.00 (1%) |
| `["findfirst", "30%", "tx"]` | 0.74 (5%) :white_check_mark: | 1.00 (1%) |
| `["findfirst", "30%", "tx-noterm"]` | 0.74 (5%) :white_check_mark: | 1.00 (1%) |
| `["findfirst", "30%", "tx-seq"]` | 1.50 (5%) :x: | 1.00 (1%) |
| `["findfirst", "40%", "tx"]` | 0.79 (5%) :white_check_mark: | 1.00 (1%) |
| `["findfirst", "40%", "tx-noterm"]` | 0.73 (5%) :white_check_mark: | 1.00 (1%) |
| `["findfirst", "40%", "tx-seq"]` | 1.50 (5%) :x: | 1.00 (1%) |
| `["findfirst", "50%", "tx"]` | 0.74 (5%) :white_check_mark: | 1.00 (1%) |
| `["findfirst", "50%", "tx-noterm"]` | 0.73 (5%) :white_check_mark: | 0.87 (1%) :white_check_mark: |
| `["findfirst", "50%", "tx-seq"]` | 1.50 (5%) :x: | 1.00 (1%) |
| `["foreach_seq", "base", "Matrix"]` | 1.56 (5%) :x: | 1.00 (1%) |
| `["foreach_seq", "base", "Transpose"]` | 1.09 (5%) :x: | 1.00 (1%) |
| `["foreach_seq", "base", "Vector"]` | 1.07 (5%) :x: | 1.00 (1%) |
| `["foreach_seq", "tx", "Matrix"]` | 1.07 (5%) :x: | 1.00 (1%) |
| `["foreach_seq", "tx", "Transpose"]` | 1.15 (5%) :x: | 1.00 (1%) |
| `["foreach_seq", "tx", "Vector"]` | 0.78 (5%) :white_check_mark: | 1.00 (1%) |
| `["foreach_seq_double", "cartesian", "tx", ":simd => false"]` | 1.05 (5%) :x: | 1.00 (1%) |
| `["foreach_seq_double", "linear", "tx", ":simd => :ivdep"]` | 56825.87 (5%) :x: | 1.00 (1%) |
| `["foreach_seq_double", "linear", "tx", ":simd => false"]` | 58676.53 (5%) :x: | 1.00 (1%) |
| `["foreach_seq_double", "linear", "tx", ":simd => true"]` | 56711.38 (5%) :x: | 1.00 (1%) |
| `["foreach_seq_sum_many", ":nvecs => 8", "man"]` | 1.20 (5%) :x: | 1.00 (1%) |
| `["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => :ivdep"]` | 1.19 (5%) :x: | 1.00 (1%) |
| `["sort", "F64 (wide)", "ThreadsX.MergeSort"]` | 1.06 (5%) :x: | 1.00 (1%) |
| `["sort", "I64 (narrow)", "Base"]` | 1.16 (5%) :x: | 1.00 (1%) |
| `["sort", "I64 (narrow)", "ThreadsX.QuickSort"]` | 1.12 (5%) :x: | 1.00 (1%) |
| `["sort", "I64 (narrow)", "ThreadsX.StableQuickSort"]` | 1.08 (5%) :x: | 1.00 (1%) |
| `["sort", "reversed", "Base"]` | 1.09 (5%) :x: | 1.00 (1%) |

## Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

- `["findfirst", "0%"]`
- `["findfirst", "10%"]`
- `["findfirst", "20%"]`
- `["findfirst", "30%"]`
- `["findfirst", "40%"]`
- `["findfirst", "50%"]`
- `["foreach", "base"]`
- `["foreach", "broadcast"]`
- `["foreach", "tx"]`
- `["foreach_seq", "base"]`
- `["foreach_seq", "tx"]`
- `["foreach_seq_double", "cartesian"]`
- `["foreach_seq_double", "cartesian", "tx"]`
- `["foreach_seq_double", "linear"]`
- `["foreach_seq_double", "linear", "tx"]`
- `["foreach_seq_sum_many", ":nvecs => 8"]`
- `["foreach_seq_sum_many", ":nvecs => 8", "tx"]`
- `["sort", "F64 (narrow)"]`
- `["sort", "F64 (wide)"]`
- `["sort", "I64 (narrow)"]`
- `["sort", "I64 (wide)"]`
- `["sort", "reversed"]`
- `["sort", "sorted"]`
- `["unique", "rand(1:10, 1000000)"]`
- `["unique", "rand(1:1000, 1000000)"]`

## Julia versioninfo

### Target

```
Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.3.0-1028-azure #29~18.04.1-Ubuntu SMP Fri Jun 5 14:32:34 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz:
              speed         user         nice          sys         idle          irq
       #1  2095 MHz      43878 s          0 s       2579 s     122082 s          0 s
       #2  2095 MHz      65684 s          0 s       3428 s     100687 s          0 s
  Memory: 6.764884948730469 GB (1971.859375 MB free)
  Uptime: 1713.0 sec
  Load Avg:  1.27197265625  1.287109375  0.8310546875
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)
```

### Baseline

```
Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.3.0-1028-azure #29~18.04.1-Ubuntu SMP Fri Jun 5 14:32:34 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz:
              speed         user         nice          sys         idle          irq
       #1  2095 MHz      68321 s          0 s       3434 s     132980 s          0 s
       #2  2095 MHz      89355 s          0 s       3973 s     112625 s          0 s
  Memory: 6.764884948730469 GB (2375.41015625 MB free)
  Uptime: 2077.0 sec
  Load Avg:  1.287109375  1.3330078125  1.0068359375
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)
```

---

# Target result

# Benchmark Report for */home/runner/work/ThreadsX.jl/ThreadsX.jl*

## Job Properties

* Time of benchmark: 28 Jun 2020 - 4:1
* Package commit: 511b9d
* Julia commit: 44fa15
* Julia command flags: None
* Environment variables: `OMP_NUM_THREADS => 1` `JULIA_NUM_THREADS => 2`

## Results

Below is a table of this job's results, obtained by running the benchmarks. The values listed in the `ID` column have the structure `[parent_group, child_group, ..., key]`, and can be used to index into the BaseBenchmarks suite to retrieve the corresponding benchmarks. The percentages accompanying time and memory values in the below table are noise tolerances. The "true" time/memory value for a given benchmark is expected to fall within this percentage of the reported value. An empty cell means that the value was zero.

| ID | time | GC time | memory | allocations |
|---|---:|---:|---:|---:|
| `["findfirst", "0%", "base"]` | 3.000 ns (5%) | | | |
| `["findfirst", "0%", "tx"]` | 26.802 μs (5%) | | 11.95 KiB (1%) | 218 |
| `["findfirst", "0%", "tx-noterm"]` | 24.601 μs (5%) | | 11.97 KiB (1%) | 218 |
| `["findfirst", "0%", "tx-seq"]` | 242.804 ns (5%) | | 544 bytes (1%) | 14 |
| `["findfirst", "10%", "base"]` | 116.808 μs (5%) | | | |
| `["findfirst", "10%", "tx"]` | 76.606 μs (5%) | | 14.36 KiB (1%) | 266 |
| `["findfirst", "10%", "tx-noterm"]` | 220.115 μs (5%) | | 28.22 KiB (1%) | 516 |
| `["findfirst", "10%", "tx-seq"]` | 102.007 μs (5%) | | 560 bytes (1%) | 15 |
| `["findfirst", "20%", "base"]` | 233.616 μs (5%) | | | |
| `["findfirst", "20%", "tx"]` | 142.410 μs (5%) | | 21.34 KiB (1%) | 394 |
| `["findfirst", "20%", "tx-noterm"]` | 208.514 μs (5%) | | 28.30 KiB (1%) | 521 |
| `["findfirst", "20%", "tx-seq"]` | 226.615 μs (5%) | | 560 bytes (1%) | 15 |
| `["findfirst", "30%", "base"]` | 331.023 μs (5%) | | | |
| `["findfirst", "30%", "tx"]` | 197.614 μs (5%) | | 28.27 KiB (1%) | 520 |
| `["findfirst", "30%", "tx-noterm"]` | 203.214 μs (5%) | | 28.30 KiB (1%) | 521 |
| `["findfirst", "30%", "tx-seq"]` | 351.125 μs (5%) | | 560 bytes (1%) | 15 |
| `["findfirst", "40%", "base"]` | 463.332 μs (5%) | | | |
| `["findfirst", "40%", "tx"]` | 278.319 μs (5%) | | 35.33 KiB (1%) | 652 |
| `["findfirst", "40%", "tx-noterm"]` | 268.018 μs (5%) | | 35.31 KiB (1%) | 650 |
| `["findfirst", "40%", "tx-seq"]` | 467.832 μs (5%) | | 560 bytes (1%) | 15 |
| `["findfirst", "50%", "base"]` | 584.041 μs (5%) | | | |
| `["findfirst", "50%", "tx"]` | 309.822 μs (5%) | | 37.70 KiB (1%) | 698 |
| `["findfirst", "50%", "tx-noterm"]` | 354.524 μs (5%) | | 46.94 KiB (1%) | 867 |
| `["findfirst", "50%", "tx-seq"]` | 584.841 μs (5%) | | 560 bytes (1%) | 15 |
| `["foreach", "base", "A .= B .+ B'"]` | 307.686 ms (5%) | 27.325 ms | 305.18 MiB (1%) | 16000002 |
| `["foreach", "base", "A .= B .+ C"]` | 250.088 ms (5%) | 27.902 ms | 305.18 MiB (1%) | 16000001 |
| `["foreach", "broadcast", "A .= B .+ B'"]` | 9.224 ms (5%) | | | |
| `["foreach", "broadcast", "A .= B .+ C"]` | 6.778 ms (5%) | | | |
| `["foreach", "tx", "A .= B .+ B'"]` | 4.408 ms (5%) | | 25.94 KiB (1%) | 360 |
| `["foreach", "tx", "A .= B .+ C"]` | 3.466 ms (5%) | | 12.77 KiB (1%) | 125 |
| `["foreach_seq", "base", "Matrix"]` | 1.162 ms (5%) | | | |
| `["foreach_seq", "base", "Transpose"]` | 2.469 ms (5%) | | | |
| `["foreach_seq", "base", "Vector"]` | 1.114 ms (5%) | | | |
| `["foreach_seq", "tx", "Matrix"]` | 839.551 μs (5%) | | | |
| `["foreach_seq", "tx", "Transpose"]` | 1.267 ms (5%) | | 16 bytes (1%) | 1 |
| `["foreach_seq", "tx", "Vector"]` | 862.053 μs (5%) | | | |
| `["foreach_seq_double", "cartesian", "man"]` | 24.401 μs (5%) | | | |
| `["foreach_seq_double", "cartesian", "tx", ":simd => :ivdep"]` | 24.401 μs (5%) | | | |
| `["foreach_seq_double", "cartesian", "tx", ":simd => false"]` | 25.202 μs (5%) | | | |
| `["foreach_seq_double", "cartesian", "tx", ":simd => true"]` | 24.801 μs (5%) | | | |
| `["foreach_seq_double", "linear", "man"]` | 61.245 ns (5%) | | | |
| `["foreach_seq_double", "linear", "tx", ":simd => :ivdep"]` | 56.826 ns (5%) | | | |
| `["foreach_seq_double", "linear", "tx", ":simd => false"]` | 58.677 ns (5%) | | | |
| `["foreach_seq_double", "linear", "tx", ":simd => true"]` | 56.711 ns (5%) | | | |
| `["foreach_seq_sum_many", ":nvecs => 8", "man"]` | 1.200 μs (5%) | | | |
| `["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => :ivdep"]` | 1.190 μs (5%) | | | |
| `["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => false"]` | 3.075 μs (5%) | | | |
| `["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => true"]` | 3.075 μs (5%) | | | |
| `["sort", "F64 (narrow)", "Base"]` | 3.376 ms (5%) | | | |
| `["sort", "F64 (narrow)", "ThreadsX.MergeSort"]` | 3.380 ms (5%) | | 1.19 MiB (1%) | 535 |
| `["sort", "F64 (narrow)", "ThreadsX.QuickSort"]` | 698.145 μs (5%) | | 965.09 KiB (1%) | 1225 |
| `["sort", "F64 (narrow)", "ThreadsX.StableQuickSort"]` | 699.045 μs (5%) | | 1.02 MiB (1%) | 1247 |
| `["sort", "F64 (wide)", "Base"]` | 8.409 ms (5%) | | | |
| `["sort", "F64 (wide)", "ThreadsX.MergeSort"]` | 6.686 ms (5%) | | 1.19 MiB (1%) | 563 |
| `["sort", "F64 (wide)", "ThreadsX.QuickSort"]` | 4.428 ms (5%) | | 1.01 MiB (1%) | 2149 |
| `["sort", "F64 (wide)", "ThreadsX.StableQuickSort"]` | 5.268 ms (5%) | | 1.39 MiB (1%) | 2195 |
| `["sort", "I64 (narrow)", "Base"]` | 149.109 μs (5%) | | 160 bytes (1%) | 1 |
| `["sort", "I64 (narrow)", "ThreadsX.MergeSort"]` | 123.107 μs (5%) | | 864 bytes (1%) | 17 |
| `["sort", "I64 (narrow)", "ThreadsX.QuickSort"]` | 136.609 μs (5%) | | 864 bytes (1%) | 17 |
| `["sort", "I64 (narrow)", "ThreadsX.StableQuickSort"]` | 134.209 μs (5%) | | 864 bytes (1%) | 17 |
| `["sort", "I64 (wide)", "Base"]` | 8.638 ms (5%) | | | |
| `["sort", "I64 (wide)", "ThreadsX.MergeSort"]` | 5.856 ms (5%) | | 1.19 MiB (1%) | 554 |
| `["sort", "I64 (wide)", "ThreadsX.QuickSort"]` | 4.082 ms (5%) | | 1.01 MiB (1%) | 2236 |
| `["sort", "I64 (wide)", "ThreadsX.StableQuickSort"]` | 4.983 ms (5%) | | 1.40 MiB (1%) | 2273 |
| `["sort", "reversed", "Base"]` | 1.025 ms (5%) | | | |
| `["sort", "reversed", "ThreadsX.MergeSort"]` | 1.474 ms (5%) | | 1.18 MiB (1%) | 435 |
| `["sort", "reversed", "ThreadsX.QuickSort"]` | 1.082 ms (5%) | | 998.77 KiB (1%) | 1872 |
| `["sort", "reversed", "ThreadsX.StableQuickSort"]` | 1.594 ms (5%) | | 1.36 MiB (1%) | 1902 |
| `["sort", "sorted", "Base"]` | 1.009 ms (5%) | | | |
| `["sort", "sorted", "ThreadsX.MergeSort"]` | 1.096 ms (5%) | | 1.18 MiB (1%) | 431 |
| `["sort", "sorted", "ThreadsX.QuickSort"]` | 1.026 ms (5%) | | 998.75 KiB (1%) | 1871 |
| `["sort", "sorted", "ThreadsX.StableQuickSort"]` | 1.298 ms (5%) | | 1.36 MiB (1%) | 1902 |
| `["unique", "rand(1:10, 1000000)", "base"]` | 10.545 ms (5%) | | 832 bytes (1%) | 8 |
| `["unique", "rand(1:10, 1000000)", "tx"]` | 5.995 ms (5%) | | 50.98 KiB (1%) | 882 |
| `["unique", "rand(1:1000, 1000000)", "base"]` | 9.550 ms (5%) | | 65.95 KiB (1%) | 27 |
| `["unique", "rand(1:1000, 1000000)", "tx"]` | 6.034 ms (5%) | | 1.07 MiB (1%) | 1186 |

## Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

- `["findfirst", "0%"]`
- `["findfirst", "10%"]`
- `["findfirst", "20%"]`
- `["findfirst", "30%"]`
- `["findfirst", "40%"]`
- `["findfirst", "50%"]`
- `["foreach", "base"]`
- `["foreach", "broadcast"]`
- `["foreach", "tx"]`
- `["foreach_seq", "base"]`
- `["foreach_seq", "tx"]`
- `["foreach_seq_double", "cartesian"]`
- `["foreach_seq_double", "cartesian", "tx"]`
- `["foreach_seq_double", "linear"]`
- `["foreach_seq_double", "linear", "tx"]`
- `["foreach_seq_sum_many", ":nvecs => 8"]`
- `["foreach_seq_sum_many", ":nvecs => 8", "tx"]`
- `["sort", "F64 (narrow)"]`
- `["sort", "F64 (wide)"]`
- `["sort", "I64 (narrow)"]`
- `["sort", "I64 (wide)"]`
- `["sort", "reversed"]`
- `["sort", "sorted"]`
- `["unique", "rand(1:10, 1000000)"]`
- `["unique", "rand(1:1000, 1000000)"]`

## Julia versioninfo

```
Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.3.0-1028-azure #29~18.04.1-Ubuntu SMP Fri Jun 5 14:32:34 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz:
              speed         user         nice          sys         idle          irq
       #1  2095 MHz      43878 s          0 s       2579 s     122082 s          0 s
       #2  2095 MHz      65684 s          0 s       3428 s     100687 s          0 s
  Memory: 6.764884948730469 GB (1971.859375 MB free)
  Uptime: 1713.0 sec
  Load Avg:  1.27197265625  1.287109375  0.8310546875
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)
```

---

# Baseline result

# Benchmark Report for */home/runner/work/ThreadsX.jl/ThreadsX.jl*

## Job Properties

* Time of benchmark: 28 Jun 2020 - 4:7
* Package commit: a573a0
* Julia commit: 44fa15
* Julia command flags: None
* Environment variables: `OMP_NUM_THREADS => 1` `JULIA_NUM_THREADS => 2`

## Results

Below is a table of this job's results, obtained by running the benchmarks. The values listed in the `ID` column have the structure `[parent_group, child_group, ..., key]`, and can be used to index into the BaseBenchmarks suite to retrieve the corresponding benchmarks. The percentages accompanying time and memory values in the below table are noise tolerances. The "true" time/memory value for a given benchmark is expected to fall within this percentage of the reported value. An empty cell means that the value was zero.

| ID | time | GC time | memory | allocations |
|---|---:|---:|---:|---:|
| `["findfirst", "0%", "base"]` | 3.000 ns (5%) | | | |
| `["findfirst", "0%", "tx"]` | 26.001 μs (5%) | | 11.97 KiB (1%) | 219 |
| `["findfirst", "0%", "tx-noterm"]` | 23.902 μs (5%) | | 11.97 KiB (1%) | 218 |
| `["findfirst", "0%", "tx-seq"]` | 242.802 ns (5%) | | 544 bytes (1%) | 14 |
| `["findfirst", "10%", "base"]` | 101.806 μs (5%) | | | |
| `["findfirst", "10%", "tx"]` | 101.106 μs (5%) | | 14.36 KiB (1%) | 266 |
| `["findfirst", "10%", "tx-noterm"]` | 216.312 μs (5%) | | 32.95 KiB (1%) | 609 |
| `["findfirst", "10%", "tx-seq"]` | 78.404 μs (5%) | | 560 bytes (1%) | 15 |
| `["findfirst", "20%", "base"]` | 233.714 μs (5%) | | | |
| `["findfirst", "20%", "tx"]` | 188.412 μs (5%) | | 21.33 KiB (1%) | 393 |
| `["findfirst", "20%", "tx-noterm"]` | 269.116 μs (5%) | | 25.95 KiB (1%) | 477 |
| `["findfirst", "20%", "tx-seq"]` | 157.309 μs (5%) | | 560 bytes (1%) | 15 |
| `["findfirst", "30%", "base"]` | 350.421 μs (5%) | | | |
| `["findfirst", "30%", "tx"]` | 267.216 μs (5%) | | 28.27 KiB (1%) | 520 |
| `["findfirst", "30%", "tx-noterm"]` | 274.817 μs (5%) | | 28.30 KiB (1%) | 521 |
| `["findfirst", "30%", "tx-seq"]` | 234.614 μs (5%) | | 560 bytes (1%) | 15 |
| `["findfirst", "40%", "base"]` | 467.227 μs (5%) | | | |
| `["findfirst", "40%", "tx"]` | 353.021 μs (5%) | | 35.31 KiB (1%) | 651 |
| `["findfirst", "40%", "tx-noterm"]` | 365.022 μs (5%) | | 35.30 KiB (1%) | 649 |
| `["findfirst", "40%", "tx-seq"]` | 312.718 μs (5%) | | 560 bytes (1%) | 15 |
| `["findfirst", "50%", "base"]` | 584.435 μs (5%) | | | |
| `["findfirst", "50%", "tx"]` | 416.825 μs (5%) | | 37.70 KiB (1%) | 698 |
| `["findfirst", "50%", "tx-noterm"]` | 485.429 μs (5%) | | 53.86 KiB (1%) | 992 |
| `["findfirst", "50%", "tx-seq"]` | 390.423 μs (5%) | | 560 bytes (1%) | 15 |
| `["foreach", "base", "A .= B .+ B'"]` | 317.565 ms (5%) | 35.622 ms | 305.18 MiB (1%) | 16000002 |
| `["foreach", "base", "A .= B .+ C"]` | 256.495 ms (5%) | 35.639 ms | 305.18 MiB (1%) | 16000001 |
| `["foreach", "broadcast", "A .= B .+ B'"]` | 8.791 ms (5%) | | | |
| `["foreach", "broadcast", "A .= B .+ C"]` | 6.897 ms (5%) | | | |
| `["foreach", "tx", "A .= B .+ B'"]` | 4.490 ms (5%) | | 25.94 KiB (1%) | 360 |
| `["foreach", "tx", "A .= B .+ C"]` | 3.403 ms (5%) | | 12.75 KiB (1%) | 124 |
| `["foreach_seq", "base", "Matrix"]` | 746.581 μs (5%) | | | |
| `["foreach_seq", "base", "Transpose"]` | 2.266 ms (5%) | | | |
| `["foreach_seq", "base", "Vector"]` | 1.036 ms (5%) | | | |
| `["foreach_seq", "tx", "Matrix"]` | 784.484 μs (5%) | | | |
| `["foreach_seq", "tx", "Transpose"]` | 1.100 ms (5%) | | 16 bytes (1%) | 1 |
| `["foreach_seq", "tx", "Vector"]` | 1.109 ms (5%) | | | |
| `["foreach_seq_double", "cartesian", "man"]` | 24.402 μs (5%) | | | |
| `["foreach_seq_double", "cartesian", "tx", ":simd => :ivdep"]` | 23.403 μs (5%) | | | |
| `["foreach_seq_double", "cartesian", "tx", ":simd => false"]` | 23.903 μs (5%) | | | |
| `["foreach_seq_double", "cartesian", "tx", ":simd => true"]` | 23.802 μs (5%) | | | |
| `["foreach_seq_double", "linear", "man"]` | 58.399 ns (5%) | | | |
| `["foreach_seq_double", "linear", "tx", ":simd => :ivdep"]` | 0.001 ns (5%) | | | |
| `["foreach_seq_double", "linear", "tx", ":simd => false"]` | 0.001 ns (5%) | | | |
| `["foreach_seq_double", "linear", "tx", ":simd => true"]` | 0.001 ns (5%) | | | |
| `["foreach_seq_sum_many", ":nvecs => 8", "man"]` | 1.000 μs (5%) | | | |
| `["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => :ivdep"]` | 1.000 μs (5%) | | | |
| `["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => false"]` | 3.000 μs (5%) | | | |
| `["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => true"]` | 3.000 μs (5%) | | | |
| `["sort", "F64 (narrow)", "Base"]` | 3.397 ms (5%) | | | |
| `["sort", "F64 (narrow)", "ThreadsX.MergeSort"]` | 3.273 ms (5%) | | 1.19 MiB (1%) | 534 |
| `["sort", "F64 (narrow)", "ThreadsX.QuickSort"]` | 671.739 μs (5%) | | 965.06 KiB (1%) | 1223 |
| `["sort", "F64 (narrow)", "ThreadsX.StableQuickSort"]` | 678.839 μs (5%) | | 1.02 MiB (1%) | 1248 |
| `["sort", "F64 (wide)", "Base"]` | 8.294 ms (5%) | | | |
| `["sort", "F64 (wide)", "ThreadsX.MergeSort"]` | 6.283 ms (5%) | | 1.19 MiB (1%) | 563 |
| `["sort", "F64 (wide)", "ThreadsX.QuickSort"]` | 4.415 ms (5%) | | 1.01 MiB (1%) | 2147 |
| `["sort", "F64 (wide)", "ThreadsX.StableQuickSort"]` | 5.133 ms (5%) | | 1.39 MiB (1%) | 2197 |
| `["sort", "I64 (narrow)", "Base"]` | 128.207 μs (5%) | | 160 bytes (1%) | 1 |
| `["sort", "I64 (narrow)", "ThreadsX.MergeSort"]` | 123.607 μs (5%) | | 864 bytes (1%) | 17 |
| `["sort", "I64 (narrow)", "ThreadsX.QuickSort"]` | 121.808 μs (5%) | | 864 bytes (1%) | 17 |
| `["sort", "I64 (narrow)", "ThreadsX.StableQuickSort"]` | 124.307 μs (5%) | | 864 bytes (1%) | 17 |
| `["sort", "I64 (wide)", "Base"]` | 8.404 ms (5%) | | | |
| `["sort", "I64 (wide)", "ThreadsX.MergeSort"]` | 5.632 ms (5%) | | 1.19 MiB (1%) | 553 |
| `["sort", "I64 (wide)", "ThreadsX.QuickSort"]` | 3.978 ms (5%) | | 1.01 MiB (1%) | 2236 |
| `["sort", "I64 (wide)", "ThreadsX.StableQuickSort"]` | 4.848 ms (5%) | | 1.40 MiB (1%) | 2270 |
| `["sort", "reversed", "Base"]` | 939.754 μs (5%) | | | |
| `["sort", "reversed", "ThreadsX.MergeSort"]` | 1.455 ms (5%) | | 1.18 MiB (1%) | 435 |
| `["sort", "reversed", "ThreadsX.QuickSort"]` | 1.104 ms (5%) | | 998.73 KiB (1%) | 1870 |
| `["sort", "reversed", "ThreadsX.StableQuickSort"]` | 1.553 ms (5%) | | 1.36 MiB (1%) | 1903 |
| `["sort", "sorted", "Base"]` | 1.008 ms (5%) | | | |
| `["sort", "sorted", "ThreadsX.MergeSort"]` | 1.089 ms (5%) | | 1.18 MiB (1%) | 431 |
| `["sort", "sorted", "ThreadsX.QuickSort"]` | 1.052 ms (5%) | | 998.75 KiB (1%) | 1871 |
| `["sort", "sorted", "ThreadsX.StableQuickSort"]` | 1.289 ms (5%) | | 1.36 MiB (1%) | 1902 |
| `["unique", "rand(1:10, 1000000)", "base"]` | 10.534 ms (5%) | | 832 bytes (1%) | 8 |
| `["unique", "rand(1:10, 1000000)", "tx"]` | 5.770 ms (5%) | | 50.98 KiB (1%) | 882 |
| `["unique", "rand(1:1000, 1000000)", "base"]` | 9.461 ms (5%) | | 65.95 KiB (1%) | 27 |
| `["unique", "rand(1:1000, 1000000)", "tx"]` | 6.083 ms (5%) | | 1.07 MiB (1%) | 1186 |

## Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

- `["findfirst", "0%"]`
- `["findfirst", "10%"]`
- `["findfirst", "20%"]`
- `["findfirst", "30%"]`
- `["findfirst", "40%"]`
- `["findfirst", "50%"]`
- `["foreach", "base"]`
- `["foreach", "broadcast"]`
- `["foreach", "tx"]`
- `["foreach_seq", "base"]`
- `["foreach_seq", "tx"]`
- `["foreach_seq_double", "cartesian"]`
- `["foreach_seq_double", "cartesian", "tx"]`
- `["foreach_seq_double", "linear"]`
- `["foreach_seq_double", "linear", "tx"]`
- `["foreach_seq_sum_many", ":nvecs => 8"]`
- `["foreach_seq_sum_many", ":nvecs => 8", "tx"]`
- `["sort", "F64 (narrow)"]`
- `["sort", "F64 (wide)"]`
- `["sort", "I64 (narrow)"]`
- `["sort", "I64 (wide)"]`
- `["sort", "reversed"]`
- `["sort", "sorted"]`
- `["unique", "rand(1:10, 1000000)"]`
- `["unique", "rand(1:1000, 1000000)"]`

## Julia versioninfo

```
Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.3.0-1028-azure #29~18.04.1-Ubuntu SMP Fri Jun 5 14:32:34 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz:
              speed         user         nice          sys         idle          irq
       #1  2095 MHz      68321 s          0 s       3434 s     132980 s          0 s
       #2  2095 MHz      89355 s          0 s       3973 s     112625 s          0 s
  Memory: 6.764884948730469 GB (2375.41015625 MB free)
  Uptime: 2077.0 sec
  Load Avg:  1.287109375  1.3330078125  1.0068359375
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)
```

---

# Runtime information

| Runtime Info | |
|:--|:--|
| BLAS #threads | 2 |
| `BLAS.vendor()` | `openblas64` |
| `Sys.CPU_THREADS` | 2 |

`lscpu` output:

    Architecture:        x86_64
    CPU op-mode(s):      32-bit, 64-bit
    Byte Order:          Little Endian
    CPU(s):              2
    On-line CPU(s) list: 0,1
    Thread(s) per core:  1
    Core(s) per socket:  2
    Socket(s):           1
    NUMA node(s):        1
    Vendor ID:           GenuineIntel
    CPU family:          6
    Model:               85
    Model name:          Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
    Stepping:            4
    CPU MHz:             2095.231
    BogoMIPS:            4190.46
    Hypervisor vendor:   Microsoft
    Virtualization type: full
    L1d cache:           32K
    L1i cache:           32K
    L2 cache:            1024K
    L3 cache:            36608K
    NUMA node0 CPU(s):   0,1
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt avx512cd avx512bw avx512vl xsaveopt xsavec xsaves md_clear

| Cpu Property | Value |
|:---|:---|
| Brand | Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz |
| Vendor | :Intel |
| Architecture | :Skylake |
| Model | Family: 0x06, Model: 0x55, Stepping: 0x04, Type: 0x00 |
| Cores | 2 physical cores, 2 logical cores (on executing CPU) |
| | No Hyperthreading detected |
| Clock Frequencies | Not supported by CPU |
| Data Cache | Level 1:3 : (32, 1024, 36608) kbytes |
| | 64 byte cache line size |
| Address Size | 48 bits virtual, 44 bits physical |
| SIMD | 512 bit = 64 byte max. SIMD vector size |
| Time Stamp Counter | TSC is accessible via `rdtsc` |
| | TSC increased at every clock cycle (non-invariant TSC) |
| Perf. Monitoring | Performance Monitoring Counters (PMC) are not supported |
| Hypervisor | Yes, Microsoft |

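
A note on reading the judge table in the report above: each time ratio appears to be the target time divided by the baseline time, and a row is flagged once the ratio leaves the 5% tolerance band around `1.0`. The sketch below is plain Python arithmetic, not the benchmark tooling's API; the names `time_ratio` and `verdict` are hypothetical, and the symmetric tolerance rule is an assumption inferred from the table. The timings are copied from the target and baseline tables in this comment.

```python
def time_ratio(target_ns: float, baseline_ns: float) -> float:
    """Target time divided by baseline time (assumed meaning of 'time ratio')."""
    return target_ns / baseline_ns


def verdict(ratio: float, tolerance: float = 0.05) -> str:
    """Classify a ratio against a symmetric band around 1.0 (assumed rule)."""
    if ratio > 1 + tolerance:
        return "regression"
    if ratio < 1 - tolerance:
        return "improvement"
    return "invariant"


# Timings from the target and baseline tables, converted to nanoseconds.
findfirst_10_tx = time_ratio(76_606.0, 101_106.0)  # 76.606 μs vs 101.106 μs
linear_ivdep = time_ratio(56.826, 0.001)           # 56.826 ns vs 0.001 ns

print(f"{findfirst_10_tx:.2f}", verdict(findfirst_10_tx))
print(f"{linear_ivdep:.0f}", verdict(linear_ivdep))
```

The first line reproduces the reported `0.76` for `["findfirst", "10%", "tx"]`. The second lands near the reported `56825.87` for `["foreach_seq_double", "linear", "tx", ":simd => :ivdep"]` (the small difference comes from rounding in the displayed timings). Since the baseline timing there is 0.001 ns, that ratio probably reflects an artifact of the baseline measurement rather than a real slowdown.
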
tkf commented 4 years ago

@mergifyio rebase

mergify[bot] commented 4 years ago

Command rebase: success

Branch has been successfully rebased

codecov[bot] commented 4 years ago

Codecov Report

Merging #110 into master will not change coverage. The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #110   +/-   ##
=======================================
  Coverage   78.88%   78.88%           
=======================================
  Files           8        8           
  Lines         412      412           
=======================================
  Hits          325      325           
  Misses         87       87           

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update ab83e05...af0625e.

github-actions[bot] commented 4 years ago
Benchmark result # Judge result # Benchmark Report for */home/runner/work/ThreadsX.jl/ThreadsX.jl* ## Job Properties * Time of benchmarks: - Target: 28 Jun 2020 - 10:03 - Baseline: 28 Jun 2020 - 10:08 * Package commits: - Target: 51c447 - Baseline: 3dc611 * Julia commits: - Target: 44fa15 - Baseline: 44fa15 * Julia command flags: - Target: None - Baseline: None * Environment variables: - Target: `OMP_NUM_THREADS => 1` `JULIA_NUM_THREADS => 2` - Baseline: `OMP_NUM_THREADS => 1` `JULIA_NUM_THREADS => 2` ## Results A ratio greater than `1.0` denotes a possible regression (marked with :x:), while a ratio less than `1.0` denotes a possible improvement (marked with :white_check_mark:). Only significant results - results that indicate possible regressions or improvements - are shown below (thus, an empty table means that all benchmark results remained invariant between builds). | ID | time ratio | memory ratio | |--------------------------------------------------------------------|------------------------------|------------------------------| | `["findfirst", "0%", "tx-noterm"]` | 1.16 (5%) :x: | 1.00 (1%) | | `["findfirst", "0%", "tx-seq"]` | 1.15 (5%) :x: | 1.00 (1%) | | `["findfirst", "10%", "base"]` | 1.15 (5%) :x: | 1.00 (1%) | | `["findfirst", "10%", "tx"]` | 0.76 (5%) :white_check_mark: | 1.00 (1%) | | `["findfirst", "10%", "tx-noterm"]` | 0.81 (5%) :white_check_mark: | 1.00 (1%) | | `["findfirst", "10%", "tx-seq"]` | 1.72 (5%) :x: | 1.00 (1%) | | `["findfirst", "20%", "base"]` | 1.32 (5%) :x: | 1.00 (1%) | | `["findfirst", "20%", "tx"]` | 0.87 (5%) :white_check_mark: | 1.00 (1%) | | `["findfirst", "20%", "tx-noterm"]` | 0.76 (5%) :white_check_mark: | 1.17 (1%) :x: | | `["findfirst", "20%", "tx-seq"]` | 1.49 (5%) :x: | 1.00 (1%) | | `["findfirst", "30%", "base"]` | 1.37 (5%) :x: | 1.00 (1%) | | `["findfirst", "30%", "tx"]` | 0.77 (5%) :white_check_mark: | 1.00 (1%) | | `["findfirst", "30%", "tx-noterm"]` | 0.76 (5%) :white_check_mark: | 1.00 (1%) | | `["findfirst", 
"30%", "tx-seq"]` | 1.49 (5%) :x: | 1.00 (1%) | | `["findfirst", "40%", "base"]` | 1.60 (5%) :x: | 1.00 (1%) | | `["findfirst", "40%", "tx"]` | 0.75 (5%) :white_check_mark: | 1.00 (1%) | | `["findfirst", "40%", "tx-noterm"]` | 0.79 (5%) :white_check_mark: | 1.00 (1%) | | `["findfirst", "40%", "tx-seq"]` | 1.49 (5%) :x: | 1.00 (1%) | | `["findfirst", "50%", "base"]` | 1.50 (5%) :x: | 1.00 (1%) | | `["findfirst", "50%", "tx"]` | 0.78 (5%) :white_check_mark: | 1.00 (1%) | | `["findfirst", "50%", "tx-noterm"]` | 0.75 (5%) :white_check_mark: | 0.96 (1%) :white_check_mark: | | `["findfirst", "50%", "tx-seq"]` | 1.50 (5%) :x: | 1.00 (1%) | | `["foreach", "base", "A .= B .+ B'"]` | 0.92 (5%) :white_check_mark: | 1.00 (1%) | | `["foreach", "base", "A .= B .+ C"]` | 0.93 (5%) :white_check_mark: | 1.00 (1%) | | `["foreach_seq_double", "cartesian", "tx", ":simd => true"]` | 0.93 (5%) :white_check_mark: | 1.00 (1%) | | `["foreach_seq_double", "linear", "tx", ":simd => :ivdep"]` | 43669.71 (5%) :x: | 1.00 (1%) | | `["foreach_seq_double", "linear", "tx", ":simd => false"]` | 45189.46 (5%) :x: | 1.00 (1%) | | `["foreach_seq_double", "linear", "tx", ":simd => true"]` | 43701.22 (5%) :x: | 1.00 (1%) | | `["foreach_seq_sum_many", ":nvecs => 8", "man"]` | 1.47 (5%) :x: | 1.00 (1%) | | `["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => :ivdep"]` | 1.50 (5%) :x: | 1.00 (1%) | | `["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => false"]` | 1.16 (5%) :x: | 1.00 (1%) | | `["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => true"]` | 1.34 (5%) :x: | 1.00 (1%) | | `["sort", "F64 (narrow)", "ThreadsX.MergeSort"]` | 1.10 (5%) :x: | 1.00 (1%) | | `["sort", "I64 (narrow)", "ThreadsX.MergeSort"]` | 1.07 (5%) :x: | 1.00 (1%) | | `["sort", "I64 (narrow)", "ThreadsX.QuickSort"]` | 1.07 (5%) :x: | 1.00 (1%) | | `["sort", "I64 (wide)", "ThreadsX.QuickSort"]` | 1.06 (5%) :x: | 1.00 (1%) | | `["unique", "rand(1:1000, 1000000)", "tx"]` | 1.07 (5%) :x: | 1.00 (1%) | ## Benchmark Group List 
Here's a list of all the benchmark groups executed by this job: - `["findfirst", "0%"]` - `["findfirst", "10%"]` - `["findfirst", "20%"]` - `["findfirst", "30%"]` - `["findfirst", "40%"]` - `["findfirst", "50%"]` - `["foreach", "base"]` - `["foreach", "broadcast"]` - `["foreach", "tx"]` - `["foreach_seq", "base"]` - `["foreach_seq", "tx"]` - `["foreach_seq_double", "cartesian"]` - `["foreach_seq_double", "cartesian", "tx"]` - `["foreach_seq_double", "linear"]` - `["foreach_seq_double", "linear", "tx"]` - `["foreach_seq_sum_many", ":nvecs => 8"]` - `["foreach_seq_sum_many", ":nvecs => 8", "tx"]` - `["sort", "F64 (narrow)"]` - `["sort", "F64 (wide)"]` - `["sort", "I64 (narrow)"]` - `["sort", "I64 (wide)"]` - `["sort", "reversed"]` - `["sort", "sorted"]` - `["unique", "rand(1:10, 1000000)"]` - `["unique", "rand(1:1000, 1000000)"]` ## Julia versioninfo ### Target ``` Julia Version 1.4.2 Commit 44fa15b150* (2020-05-23 18:35 UTC) Platform Info: OS: Linux (x86_64-pc-linux-gnu) Ubuntu 18.04.4 LTS uname: Linux 5.3.0-1028-azure #29~18.04.1-Ubuntu SMP Fri Jun 5 14:32:34 UTC 2020 x86_64 x86_64 CPU: Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz: speed user nice sys idle irq #1 2095 MHz 41537 s 0 s 2662 s 38193 s 0 s #2 2095 MHz 63441 s 0 s 2878 s 17072 s 0 s Memory: 6.764884948730469 GB (2122.91796875 MB free) Uptime: 850.0 sec Load Avg: 1.2861328125 1.33349609375 0.9052734375 WORD_SIZE: 64 LIBM: libopenlibm LLVM: libLLVM-8.0.1 (ORCJIT, skylake) ``` ### Baseline ``` Julia Version 1.4.2 Commit 44fa15b150* (2020-05-23 18:35 UTC) Platform Info: OS: Linux (x86_64-pc-linux-gnu) Ubuntu 18.04.4 LTS uname: Linux 5.3.0-1028-azure #29~18.04.1-Ubuntu SMP Fri Jun 5 14:32:34 UTC 2020 x86_64 x86_64 CPU: Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz: speed user nice sys idle irq #1 2095 MHz 68997 s 0 s 3532 s 43910 s 0 s #2 2095 MHz 81617 s 0 s 3440 s 32417 s 0 s Memory: 6.764884948730469 GB (2396.69140625 MB free) Uptime: 1193.0 sec Load Avg: 1.25634765625 1.3369140625 1.04443359375 
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)
```

---

# Target result

# Benchmark Report for */home/runner/work/ThreadsX.jl/ThreadsX.jl*

## Job Properties

* Time of benchmark: 28 Jun 2020 - 10:3
* Package commit: 51c447
* Julia commit: 44fa15
* Julia command flags: None
* Environment variables: `OMP_NUM_THREADS => 1` `JULIA_NUM_THREADS => 2`

## Results

Below is a table of this job's results, obtained by running the benchmarks. The values listed in the `ID` column have the structure `[parent_group, child_group, ..., key]`, and can be used to index into the BaseBenchmarks suite to retrieve the corresponding benchmarks. The percentages accompanying time and memory values in the below table are noise tolerances. The "true" time/memory value for a given benchmark is expected to fall within this percentage of the reported value. An empty cell means that the value was zero.

| ID | time | GC time | memory | allocations |
|---|---:|---:|---:|---:|
| `["findfirst", "0%", "base"]` | 2.600 ns (5%) | | | |
| `["findfirst", "0%", "tx"]` | 24.602 μs (5%) | | 11.95 KiB (1%) | 218 |
| `["findfirst", "0%", "tx-noterm"]` | 21.602 μs (5%) | | 11.97 KiB (1%) | 218 |
| `["findfirst", "0%", "tx-seq"]` | 208.307 ns (5%) | | 544 bytes (1%) | 14 |
| `["findfirst", "10%", "base"]` | 67.906 μs (5%) | | | |
| `["findfirst", "10%", "tx"]` | 67.005 μs (5%) | | 14.36 KiB (1%) | 266 |
| `["findfirst", "10%", "tx-noterm"]` | 190.715 μs (5%) | | 35.16 KiB (1%) | 645 |
| `["findfirst", "10%", "tx-seq"]` | 102.008 μs (5%) | | 560 bytes (1%) | 15 |
| `["findfirst", "20%", "base"]` | 157.212 μs (5%) | | | |
| `["findfirst", "20%", "tx"]` | 120.210 μs (5%) | | 21.34 KiB (1%) | 394 |
| `["findfirst", "20%", "tx-noterm"]` | 185.814 μs (5%) | | 32.97 KiB (1%) | 607 |
| `["findfirst", "20%", "tx-seq"]` | 175.813 μs (5%) | | 560 bytes (1%) | 15 |
| `["findfirst", "30%", "base"]` | 280.628 μs (5%) | | | |
| `["findfirst", "30%", "tx"]` | 173.516 μs (5%) | | 28.27 KiB (1%) | 520 |
| `["findfirst", "30%", "tx-noterm"]` | 193.918 μs (5%) | | 28.31 KiB (1%) | 522 |
| `["findfirst", "30%", "tx-seq"]` | 263.425 μs (5%) | | 560 bytes (1%) | 15 |
| `["findfirst", "40%", "base"]` | 378.732 μs (5%) | | | |
| `["findfirst", "40%", "tx"]` | 227.719 μs (5%) | | 35.28 KiB (1%) | 649 |
| `["findfirst", "40%", "tx-noterm"]` | 249.720 μs (5%) | | 35.31 KiB (1%) | 650 |
| `["findfirst", "40%", "tx-seq"]` | 350.929 μs (5%) | | 560 bytes (1%) | 15 |
| `["findfirst", "50%", "base"]` | 440.541 μs (5%) | | | |
| `["findfirst", "50%", "tx"]` | 268.524 μs (5%) | | 37.70 KiB (1%) | 698 |
| `["findfirst", "50%", "tx-noterm"]` | 294.426 μs (5%) | | 51.56 KiB (1%) | 950 |
| `["findfirst", "50%", "tx-seq"]` | 438.540 μs (5%) | | 560 bytes (1%) | 15 |
| `["foreach", "base", "A .= B .+ B'"]` | 263.682 ms (5%) | 22.173 ms | 305.18 MiB (1%) | 16000002 |
| `["foreach", "base", "A .= B .+ C"]` | 200.133 ms (5%) | 21.525 ms | 305.18 MiB (1%) | 16000001 |
| `["foreach", "broadcast", "A .= B .+ B'"]` | 7.487 ms (5%) | | | |
| `["foreach", "broadcast", "A .= B .+ C"]` | 6.285 ms (5%) | | | |
| `["foreach", "tx", "A .= B .+ B'"]` | 4.059 ms (5%) | | 25.94 KiB (1%) | 360 |
| `["foreach", "tx", "A .= B .+ C"]` | 3.272 ms (5%) | | 12.73 KiB (1%) | 123 |
| `["foreach_seq", "base", "Matrix"]` | 625.834 μs (5%) | | | |
| `["foreach_seq", "base", "Transpose"]` | 1.790 ms (5%) | | | |
| `["foreach_seq", "base", "Vector"]` | 625.833 μs (5%) | | | |
| `["foreach_seq", "tx", "Matrix"]` | 629.234 μs (5%) | | | |
| `["foreach_seq", "tx", "Transpose"]` | 1.030 ms (5%) | | 16 bytes (1%) | 1 |
| `["foreach_seq", "tx", "Vector"]` | 621.833 μs (5%) | | | |
| `["foreach_seq_double", "cartesian", "man"]` | 17.101 μs (5%) | | | |
| `["foreach_seq_double", "cartesian", "tx", ":simd => :ivdep"]` | 17.201 μs (5%) | | | |
| `["foreach_seq_double", "cartesian", "tx", ":simd => false"]` | 18.501 μs (5%) | | | |
| `["foreach_seq_double", "cartesian", "tx", ":simd => true"]` | 17.101 μs (5%) | | | |
| `["foreach_seq_double", "linear", "man"]` | 43.701 ns (5%) | | | |
| `["foreach_seq_double", "linear", "tx", ":simd => :ivdep"]` | 43.670 ns (5%) | | | |
| `["foreach_seq_double", "linear", "tx", ":simd => false"]` | 45.189 ns (5%) | | | |
| `["foreach_seq_double", "linear", "tx", ":simd => true"]` | 43.701 ns (5%) | | | |
| `["foreach_seq_sum_many", ":nvecs => 8", "man"]` | 1.030 μs (5%) | | | |
| `["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => :ivdep"]` | 1.200 μs (5%) | | | |
| `["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => false"]` | 2.678 μs (5%) | | | |
| `["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => true"]` | 3.078 μs (5%) | | | |
| `["sort", "F64 (narrow)", "Base"]` | 2.128 ms (5%) | | | |
| `["sort", "F64 (narrow)", "ThreadsX.MergeSort"]` | 2.646 ms (5%) | | 1.19 MiB (1%) | 534 |
| `["sort", "F64 (narrow)", "ThreadsX.QuickSort"]` | 566.735 μs (5%) | | 965.13 KiB (1%) | 1227 |
| `["sort", "F64 (narrow)", "ThreadsX.StableQuickSort"]` | 598.135 μs (5%) | | 1.02 MiB (1%) | 1247 |
| `["sort", "F64 (wide)", "Base"]` | 5.675 ms (5%) | | | |
| `["sort", "F64 (wide)", "ThreadsX.MergeSort"]` | 5.104 ms (5%) | | 1.19 MiB (1%) | 564 |
| `["sort", "F64 (wide)", "ThreadsX.QuickSort"]` | 3.444 ms (5%) | | 1.01 MiB (1%) | 2147 |
| `["sort", "F64 (wide)", "ThreadsX.StableQuickSort"]` | 3.891 ms (5%) | | 1.39 MiB (1%) | 2196 |
| `["sort", "I64 (narrow)", "Base"]` | 112.607 μs (5%) | | 160 bytes (1%) | 1 |
| `["sort", "I64 (narrow)", "ThreadsX.MergeSort"]` | 112.607 μs (5%) | | 864 bytes (1%) | 17 |
| `["sort", "I64 (narrow)", "ThreadsX.QuickSort"]` | 109.207 μs (5%) | | 864 bytes (1%) | 17 |
| `["sort", "I64 (narrow)", "ThreadsX.StableQuickSort"]` | 110.907 μs (5%) | | 864 bytes (1%) | 17 |
| `["sort", "I64 (wide)", "Base"]` | 5.494 ms (5%) | | | |
| `["sort", "I64 (wide)", "ThreadsX.MergeSort"]` | 4.292 ms (5%) | | 1.19 MiB (1%) | 554 |
| `["sort", "I64 (wide)", "ThreadsX.QuickSort"]` | 3.258 ms (5%) | | 1.01 MiB (1%) | 2237 |
| `["sort", "I64 (wide)", "ThreadsX.StableQuickSort"]` | 3.636 ms (5%) | | 1.40 MiB (1%) | 2271 |
| `["sort", "reversed", "Base"]` | 610.136 μs (5%) | | | |
| `["sort", "reversed", "ThreadsX.MergeSort"]` | 1.125 ms (5%) | | 1.18 MiB (1%) | 435 |
| `["sort", "reversed", "ThreadsX.QuickSort"]` | 935.855 μs (5%) | | 998.77 KiB (1%) | 1872 |
| `["sort", "reversed", "ThreadsX.StableQuickSort"]` | 1.270 ms (5%) | | 1.36 MiB (1%) | 1902 |
| `["sort", "sorted", "Base"]` | 564.733 μs (5%) | | | |
| `["sort", "sorted", "ThreadsX.MergeSort"]` | 819.646 μs (5%) | | 1.18 MiB (1%) | 431 |
| `["sort", "sorted", "ThreadsX.QuickSort"]` | 895.551 μs (5%) | | 998.75 KiB (1%) | 1871 |
| `["sort", "sorted", "ThreadsX.StableQuickSort"]` | 1.005 ms (5%) | | 1.36 MiB (1%) | 1900 |
| `["unique", "rand(1:10, 1000000)", "base"]` | 8.014 ms (5%) | | 832 bytes (1%) | 8 |
| `["unique", "rand(1:10, 1000000)", "tx"]` | 4.739 ms (5%) | | 50.98 KiB (1%) | 882 |
| `["unique", "rand(1:1000, 1000000)", "base"]` | 8.248 ms (5%) | | 65.95 KiB (1%) | 27 |
| `["unique", "rand(1:1000, 1000000)", "tx"]` | 5.260 ms (5%) | | 1.07 MiB (1%) | 1186 |

## Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

- `["findfirst", "0%"]`
- `["findfirst", "10%"]`
- `["findfirst", "20%"]`
- `["findfirst", "30%"]`
- `["findfirst", "40%"]`
- `["findfirst", "50%"]`
- `["foreach", "base"]`
- `["foreach", "broadcast"]`
- `["foreach", "tx"]`
- `["foreach_seq", "base"]`
- `["foreach_seq", "tx"]`
- `["foreach_seq_double", "cartesian"]`
- `["foreach_seq_double", "cartesian", "tx"]`
- `["foreach_seq_double", "linear"]`
- `["foreach_seq_double", "linear", "tx"]`
- `["foreach_seq_sum_many", ":nvecs => 8"]`
- `["foreach_seq_sum_many", ":nvecs => 8", "tx"]`
- `["sort", "F64 (narrow)"]`
- `["sort", "F64 (wide)"]`
- `["sort", "I64 (narrow)"]`
- `["sort", "I64 (wide)"]`
- `["sort", "reversed"]`
- `["sort", "sorted"]`
- `["unique", "rand(1:10, 1000000)"]`
- `["unique", "rand(1:1000, 1000000)"]`

## Julia versioninfo

```
Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.3.0-1028-azure #29~18.04.1-Ubuntu SMP Fri Jun 5 14:32:34 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz:
          speed      user   nice     sys     idle   irq
    #1  2095 MHz  41537 s    0 s  2662 s  38193 s   0 s
    #2  2095 MHz  63441 s    0 s  2878 s  17072 s   0 s
  Memory: 6.764884948730469 GB (2122.91796875 MB free)
  Uptime: 850.0 sec
  Load Avg: 1.2861328125 1.33349609375 0.9052734375
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)
```

---

# Baseline result

# Benchmark Report for */home/runner/work/ThreadsX.jl/ThreadsX.jl*

## Job Properties

* Time of benchmark: 28 Jun 2020 - 10:8
* Package commit: 3dc611
* Julia commit: 44fa15
* Julia command flags: None
* Environment variables: `OMP_NUM_THREADS => 1` `JULIA_NUM_THREADS => 2`

## Results

Below is a table of this job's results, obtained by running the benchmarks. The values listed in the `ID` column have the structure `[parent_group, child_group, ..., key]`, and can be used to index into the BaseBenchmarks suite to retrieve the corresponding benchmarks. The percentages accompanying time and memory values in the below table are noise tolerances. The "true" time/memory value for a given benchmark is expected to fall within this percentage of the reported value. An empty cell means that the value was zero.
| ID | time | GC time | memory | allocations |
|---|---:|---:|---:|---:|
| `["findfirst", "0%", "base"]` | 2.600 ns (5%) | | | |
| `["findfirst", "0%", "tx"]` | 23.901 μs (5%) | | 11.95 KiB (1%) | 218 |
| `["findfirst", "0%", "tx-noterm"]` | 18.601 μs (5%) | | 11.97 KiB (1%) | 218 |
| `["findfirst", "0%", "tx-seq"]` | 181.580 ns (5%) | | 544 bytes (1%) | 14 |
| `["findfirst", "10%", "base"]` | 59.303 μs (5%) | | | |
| `["findfirst", "10%", "tx"]` | 88.404 μs (5%) | | 14.36 KiB (1%) | 266 |
| `["findfirst", "10%", "tx-noterm"]` | 234.711 μs (5%) | | 35.20 KiB (1%) | 648 |
| `["findfirst", "10%", "tx-seq"]` | 59.203 μs (5%) | | 560 bytes (1%) | 15 |
| `["findfirst", "20%", "base"]` | 118.906 μs (5%) | | | |
| `["findfirst", "20%", "tx"]` | 138.907 μs (5%) | | 21.33 KiB (1%) | 393 |
| `["findfirst", "20%", "tx-noterm"]` | 244.312 μs (5%) | | 28.30 KiB (1%) | 521 |
| `["findfirst", "20%", "tx-seq"]` | 118.305 μs (5%) | | 560 bytes (1%) | 15 |
| `["findfirst", "30%", "base"]` | 205.111 μs (5%) | | | |
| `["findfirst", "30%", "tx"]` | 224.912 μs (5%) | | 28.27 KiB (1%) | 520 |
| `["findfirst", "30%", "tx-noterm"]` | 255.313 μs (5%) | | 28.31 KiB (1%) | 522 |
| `["findfirst", "30%", "tx-seq"]` | 176.309 μs (5%) | | 560 bytes (1%) | 15 |
| `["findfirst", "40%", "base"]` | 236.112 μs (5%) | | | |
| `["findfirst", "40%", "tx"]` | 305.615 μs (5%) | | 35.31 KiB (1%) | 651 |
| `["findfirst", "40%", "tx-noterm"]` | 315.916 μs (5%) | | 35.34 KiB (1%) | 652 |
| `["findfirst", "40%", "tx-seq"]` | 235.011 μs (5%) | | 560 bytes (1%) | 15 |
| `["findfirst", "50%", "base"]` | 294.615 μs (5%) | | | |
| `["findfirst", "50%", "tx"]` | 342.417 μs (5%) | | 37.72 KiB (1%) | 699 |
| `["findfirst", "50%", "tx-noterm"]` | 394.820 μs (5%) | | 53.91 KiB (1%) | 993 |
| `["findfirst", "50%", "tx-seq"]` | 293.115 μs (5%) | | 560 bytes (1%) | 15 |
| `["foreach", "base", "A .= B .+ B'"]` | 285.799 ms (5%) | 28.610 ms | 305.18 MiB (1%) | 16000002 |
| `["foreach", "base", "A .= B .+ C"]` | 215.480 ms (5%) | 29.859 ms | 305.18 MiB (1%) | 16000001 |
| `["foreach", "broadcast", "A .= B .+ B'"]` | 7.229 ms (5%) | | | |
| `["foreach", "broadcast", "A .= B .+ C"]` | 6.307 ms (5%) | | | |
| `["foreach", "tx", "A .= B .+ B'"]` | 3.890 ms (5%) | | 25.94 KiB (1%) | 360 |
| `["foreach", "tx", "A .= B .+ C"]` | 3.277 ms (5%) | | 12.77 KiB (1%) | 125 |
| `["foreach_seq", "base", "Matrix"]` | 625.757 μs (5%) | | | |
| `["foreach_seq", "base", "Transpose"]` | 1.764 ms (5%) | | | |
| `["foreach_seq", "base", "Vector"]` | 625.756 μs (5%) | | | |
| `["foreach_seq", "tx", "Matrix"]` | 629.156 μs (5%) | | | |
| `["foreach_seq", "tx", "Transpose"]` | 1.020 ms (5%) | | 16 bytes (1%) | 1 |
| `["foreach_seq", "tx", "Vector"]` | 625.655 μs (5%) | | | |
| `["foreach_seq_double", "cartesian", "man"]` | 17.401 μs (5%) | | | |
| `["foreach_seq_double", "cartesian", "tx", ":simd => :ivdep"]` | 17.801 μs (5%) | | | |
| `["foreach_seq_double", "cartesian", "tx", ":simd => false"]` | 18.001 μs (5%) | | | |
| `["foreach_seq_double", "cartesian", "tx", ":simd => true"]` | 18.301 μs (5%) | | | |
| `["foreach_seq_double", "linear", "man"]` | 44.313 ns (5%) | | | |
| `["foreach_seq_double", "linear", "tx", ":simd => :ivdep"]` | 0.001 ns (5%) | | | |
| `["foreach_seq_double", "linear", "tx", ":simd => false"]` | 0.001 ns (5%) | | | |
| `["foreach_seq_double", "linear", "tx", ":simd => true"]` | 0.001 ns (5%) | | | |
| `["foreach_seq_sum_many", ":nvecs => 8", "man"]` | 700.000 ns (5%) | | | |
| `["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => :ivdep"]` | 800.000 ns (5%) | | | |
| `["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => false"]` | 2.300 μs (5%) | | | |
| `["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => true"]` | 2.300 μs (5%) | | | |
| `["sort", "F64 (narrow)", "Base"]` | 2.126 ms (5%) | | | |
| `["sort", "F64 (narrow)", "ThreadsX.MergeSort"]` | 2.401 ms (5%) | | 1.19 MiB (1%) | 534 |
| `["sort", "F64 (narrow)", "ThreadsX.QuickSort"]` | 564.560 μs (5%) | | 965.13 KiB (1%) | 1227 |
| `["sort", "F64 (narrow)", "ThreadsX.StableQuickSort"]` | 574.660 μs (5%) | | 1.02 MiB (1%) | 1246 |
| `["sort", "F64 (wide)", "Base"]` | 5.406 ms (5%) | | | |
| `["sort", "F64 (wide)", "ThreadsX.MergeSort"]` | 5.035 ms (5%) | | 1.19 MiB (1%) | 563 |
| `["sort", "F64 (wide)", "ThreadsX.QuickSort"]` | 3.409 ms (5%) | | 1.01 MiB (1%) | 2148 |
| `["sort", "F64 (wide)", "ThreadsX.StableQuickSort"]` | 4.088 ms (5%) | | 1.39 MiB (1%) | 2197 |
| `["sort", "I64 (narrow)", "Base"]` | 114.012 μs (5%) | | 160 bytes (1%) | 1 |
| `["sort", "I64 (narrow)", "ThreadsX.MergeSort"]` | 105.411 μs (5%) | | 864 bytes (1%) | 17 |
| `["sort", "I64 (narrow)", "ThreadsX.QuickSort"]` | 102.411 μs (5%) | | 864 bytes (1%) | 17 |
| `["sort", "I64 (narrow)", "ThreadsX.StableQuickSort"]` | 107.812 μs (5%) | | 864 bytes (1%) | 17 |
| `["sort", "I64 (wide)", "Base"]` | 5.714 ms (5%) | | | |
| `["sort", "I64 (wide)", "ThreadsX.MergeSort"]` | 4.222 ms (5%) | | 1.19 MiB (1%) | 554 |
| `["sort", "I64 (wide)", "ThreadsX.QuickSort"]` | 3.078 ms (5%) | | 1.01 MiB (1%) | 2236 |
| `["sort", "I64 (wide)", "ThreadsX.StableQuickSort"]` | 3.699 ms (5%) | | 1.40 MiB (1%) | 2271 |
| `["sort", "reversed", "Base"]` | 593.461 μs (5%) | | | |
| `["sort", "reversed", "ThreadsX.MergeSort"]` | 1.132 ms (5%) | | 1.18 MiB (1%) | 435 |
| `["sort", "reversed", "ThreadsX.QuickSort"]` | 910.892 μs (5%) | | 998.73 KiB (1%) | 1870 |
| `["sort", "reversed", "ThreadsX.StableQuickSort"]` | 1.268 ms (5%) | | 1.36 MiB (1%) | 1903 |
| `["sort", "sorted", "Base"]` | 564.256 μs (5%) | | | |
| `["sort", "sorted", "ThreadsX.MergeSort"]` | 819.580 μs (5%) | | 1.18 MiB (1%) | 432 |
| `["sort", "sorted", "ThreadsX.QuickSort"]` | 887.987 μs (5%) | | 998.75 KiB (1%) | 1871 |
| `["sort", "sorted", "ThreadsX.StableQuickSort"]` | 993.995 μs (5%) | | 1.36 MiB (1%) | 1904 |
| `["unique", "rand(1:10, 1000000)", "base"]` | 8.416 ms (5%) | | 832 bytes (1%) | 8 |
| `["unique", "rand(1:10, 1000000)", "tx"]` | 4.745 ms (5%) | | 50.98 KiB (1%) | 882 |
| `["unique", "rand(1:1000, 1000000)", "base"]` | 8.488 ms (5%) | | 65.95 KiB (1%) | 27 |
| `["unique", "rand(1:1000, 1000000)", "tx"]` | 4.925 ms (5%) | | 1.07 MiB (1%) | 1186 |

## Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

- `["findfirst", "0%"]`
- `["findfirst", "10%"]`
- `["findfirst", "20%"]`
- `["findfirst", "30%"]`
- `["findfirst", "40%"]`
- `["findfirst", "50%"]`
- `["foreach", "base"]`
- `["foreach", "broadcast"]`
- `["foreach", "tx"]`
- `["foreach_seq", "base"]`
- `["foreach_seq", "tx"]`
- `["foreach_seq_double", "cartesian"]`
- `["foreach_seq_double", "cartesian", "tx"]`
- `["foreach_seq_double", "linear"]`
- `["foreach_seq_double", "linear", "tx"]`
- `["foreach_seq_sum_many", ":nvecs => 8"]`
- `["foreach_seq_sum_many", ":nvecs => 8", "tx"]`
- `["sort", "F64 (narrow)"]`
- `["sort", "F64 (wide)"]`
- `["sort", "I64 (narrow)"]`
- `["sort", "I64 (wide)"]`
- `["sort", "reversed"]`
- `["sort", "sorted"]`
- `["unique", "rand(1:10, 1000000)"]`
- `["unique", "rand(1:1000, 1000000)"]`

## Julia versioninfo

```
Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.3.0-1028-azure #29~18.04.1-Ubuntu SMP Fri Jun 5 14:32:34 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz:
          speed      user   nice     sys     idle   irq
    #1  2095 MHz  68997 s    0 s  3532 s  43910 s   0 s
    #2  2095 MHz  81617 s    0 s  3440 s  32417 s   0 s
  Memory: 6.764884948730469 GB (2396.69140625 MB free)
  Uptime: 1193.0 sec
  Load Avg: 1.25634765625 1.3369140625 1.04443359375
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)
```

---

# Runtime information

| Runtime Info | |
|:--|:--|
| BLAS #threads | 2 |
| `BLAS.vendor()` | `openblas64` |
| `Sys.CPU_THREADS` | 2 |

`lscpu` output:

    Architecture:        x86_64
    CPU op-mode(s):      32-bit, 64-bit
    Byte Order:          Little Endian
    CPU(s):              2
    On-line CPU(s) list: 0,1
    Thread(s) per core:  1
    Core(s) per socket:  2
    Socket(s):           1
    NUMA node(s):        1
    Vendor ID:           GenuineIntel
    CPU family:          6
    Model:               85
    Model name:          Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
    Stepping:            4
    CPU MHz:             2095.232
    BogoMIPS:            4190.46
    Hypervisor vendor:   Microsoft
    Virtualization type: full
    L1d cache:           32K
    L1i cache:           32K
    L2 cache:            1024K
    L3 cache:            36608K
    NUMA node0 CPU(s):   0,1
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt avx512cd avx512bw avx512vl xsaveopt xsavec xsaves

| Cpu Property | Value |
|:------------------ |:------------------------------------------------------- |
| Brand | Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz |
| Vendor | :Intel |
| Architecture | :Skylake |
| Model | Family: 0x06, Model: 0x55, Stepping: 0x04, Type: 0x00 |
| Cores | 2 physical cores, 2 logical cores (on executing CPU) |
| | No Hyperthreading detected |
| Clock Frequencies | Not supported by CPU |
| Data Cache | Level 1:3 : (32, 1024, 36608) kbytes |
| | 64 byte cache line size |
| Address Size | 48 bits virtual, 44 bits physical |
| SIMD | 512 bit = 64 byte max. SIMD vector size |
| Time Stamp Counter | TSC is accessible via `rdtsc` |
| | TSC increased at every clock cycle (non-invariant TSC) |
| Perf. Monitoring | Performance Monitoring Counters (PMC) are not supported |
| Hypervisor | Yes, Microsoft |
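
The judge ratios in the report above can be reproduced by hand from the target and baseline tables. The sketch below is not the benchmark tool's own code and is written in Python only for illustration. The `judge` helper and its threshold rule (more than 5% above 1.0 is a regression, more than 5% below is an improvement) are assumptions chosen to match the marks shown in the judge table. The times are copied from the two result tables of this comment.

```python
def judge(target, baseline, tolerance=0.05):
    """Return (ratio, verdict) for a target/baseline pair of timings.

    Assumed rule, inferred from the marks in the judge table:
    ratio > 1 + tolerance -> regression, ratio < 1 - tolerance -> improvement.
    """
    ratio = target / baseline
    if ratio > 1.0 + tolerance:
        verdict = "regression"
    elif ratio < 1.0 - tolerance:
        verdict = "improvement"
    else:
        verdict = "invariant"
    return ratio, verdict


# (target, baseline) times copied from the tables above; each pair uses one unit.
samples = {
    ("findfirst", "10%", "tx"): (67.005, 88.404),        # microseconds
    ("findfirst", "40%", "base"): (378.732, 236.112),    # microseconds
    ("sort", "sorted", "Base"): (564.733, 564.256),      # microseconds
    ("foreach_seq_double", "linear", "tx", ":simd => true"): (43.701, 0.001),  # nanoseconds
}

for key, (target, baseline) in samples.items():
    ratio, verdict = judge(target, baseline)
    print(key, f"{ratio:.2f}", verdict)
```

The last sample shows where the five-digit ratios come from: the baseline time is reported as 0.001 ns, which suggests the benchmarked loop was optimized away in that run, so dividing a roughly 44 ns target by it gives a ratio near 43701. Ratios recomputed from the displayed values can differ slightly from the reported ones (43701.22 in the judge table) because the table entries are rounded.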
github-actions[bot] commented 4 years ago
Benchmark result

# Judge result

# Benchmark Report for */home/runner/work/ThreadsX.jl/ThreadsX.jl*

## Job Properties

* Time of benchmarks:
    - Target: 28 Jun 2020 - 10:11
    - Baseline: 28 Jun 2020 - 10:17
* Package commits:
    - Target: 23ac98
    - Baseline: ab83e0
* Julia commits:
    - Target: 44fa15
    - Baseline: 44fa15
* Julia command flags:
    - Target: None
    - Baseline: None
* Environment variables:
    - Target: `OMP_NUM_THREADS => 1` `JULIA_NUM_THREADS => 2`
    - Baseline: `OMP_NUM_THREADS => 1` `JULIA_NUM_THREADS => 2`

## Results

A ratio greater than `1.0` denotes a possible regression (marked with :x:), while a ratio less than `1.0` denotes a possible improvement (marked with :white_check_mark:). Only significant results - results that indicate possible regressions or improvements - are shown below (thus, an empty table means that all benchmark results remained invariant between builds).

| ID | time ratio | memory ratio |
|---|---|---|
| `["findfirst", "0%", "base"]` | 1.09 (5%) :x: | 1.00 (1%) |
| `["findfirst", "10%", "tx-noterm"]` | 0.97 (5%) | 1.39 (1%) :x: |
| `["findfirst", "30%", "tx"]` | 0.91 (5%) :white_check_mark: | 1.00 (1%) |
| `["findfirst", "50%", "tx"]` | 0.91 (5%) :white_check_mark: | 1.00 (1%) |
| `["findfirst", "50%", "tx-noterm"]` | 0.98 (5%) | 1.43 (1%) :x: |
| `["foreach_seq_double", "cartesian", "man"]` | 1.14 (5%) :x: | 1.00 (1%) |
| `["foreach_seq_sum_many", ":nvecs => 8", "man"]` | 1.36 (5%) :x: | 1.00 (1%) |
| `["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => :ivdep"]` | 1.28 (5%) :x: | 1.00 (1%) |
| `["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => false"]` | 1.10 (5%) :x: | 1.00 (1%) |
| `["unique", "rand(1:1000, 1000000)", "base"]` | 0.93 (5%) :white_check_mark: | 1.00 (1%) |

## Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

- `["findfirst", "0%"]`
- `["findfirst", "10%"]`
- `["findfirst", "20%"]`
- `["findfirst", "30%"]`
- `["findfirst", "40%"]`
- `["findfirst", "50%"]`
- `["foreach", "base"]`
- `["foreach", "broadcast"]`
- `["foreach", "tx"]`
- `["foreach_seq", "base"]`
- `["foreach_seq", "tx"]`
- `["foreach_seq_double", "cartesian"]`
- `["foreach_seq_double", "cartesian", "tx"]`
- `["foreach_seq_double", "linear"]`
- `["foreach_seq_double", "linear", "tx"]`
- `["foreach_seq_sum_many", ":nvecs => 8"]`
- `["foreach_seq_sum_many", ":nvecs => 8", "tx"]`
- `["sort", "F64 (narrow)"]`
- `["sort", "F64 (wide)"]`
- `["sort", "I64 (narrow)"]`
- `["sort", "I64 (wide)"]`
- `["sort", "reversed"]`
- `["sort", "sorted"]`
- `["unique", "rand(1:10, 1000000)"]`
- `["unique", "rand(1:1000, 1000000)"]`

## Julia versioninfo

### Target

```
Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.3.0-1028-azure #29~18.04.1-Ubuntu SMP Fri Jun 5 14:32:34 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz:
          speed      user   nice     sys     idle   irq
    #1  2394 MHz  53244 s    0 s  2564 s  34465 s   0 s
    #2  2394 MHz  58194 s    0 s  3121 s  29230 s   0 s
  Memory: 6.764884948730469 GB (2010.53515625 MB free)
  Uptime: 926.0 sec
  Load Avg: 1.29833984375 1.3583984375 0.94775390625
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, haswell)
```

### Baseline

```
Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.3.0-1028-azure #29~18.04.1-Ubuntu SMP Fri Jun 5 14:32:34 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz:
          speed      user   nice     sys     idle   irq
    #1  2394 MHz  74139 s    0 s  3236 s  48646 s   0 s
    #2  2394 MHz  84590 s    0 s  3885 s  37756 s   0 s
  Memory: 6.764884948730469 GB (2374.5234375 MB free)
  Uptime: 1286.0 sec
  Load Avg: 1.33544921875 1.42431640625 1.11279296875
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, haswell)
```

---

# Target result

# Benchmark Report for */home/runner/work/ThreadsX.jl/ThreadsX.jl*

## Job Properties

* Time of benchmark: 28 Jun 2020 - 10:11
* Package commit: 23ac98
* Julia commit: 44fa15
* Julia command flags: None
* Environment variables: `OMP_NUM_THREADS => 1` `JULIA_NUM_THREADS => 2`

## Results

Below is a table of this job's results, obtained by running the benchmarks. The values listed in the `ID` column have the structure `[parent_group, child_group, ..., key]`, and can be used to index into the BaseBenchmarks suite to retrieve the corresponding benchmarks. The percentages accompanying time and memory values in the below table are noise tolerances. The "true" time/memory value for a given benchmark is expected to fall within this percentage of the reported value. An empty cell means that the value was zero.

| ID | time | GC time | memory | allocations |
|---|---:|---:|---:|---:|
| `["findfirst", "0%", "base"]` | 3.500 ns (5%) | | | |
| `["findfirst", "0%", "tx"]` | 27.101 μs (5%) | | 11.97 KiB (1%) | 219 |
| `["findfirst", "0%", "tx-noterm"]` | 23.001 μs (5%) | | 11.97 KiB (1%) | 218 |
| `["findfirst", "0%", "tx-seq"]` | 256.235 ns (5%) | | 544 bytes (1%) | 14 |
| `["findfirst", "10%", "base"]` | 68.500 μs (5%) | | | |
| `["findfirst", "10%", "tx"]` | 82.001 μs (5%) | | 14.36 KiB (1%) | 266 |
| `["findfirst", "10%", "tx-noterm"]` | 205.200 μs (5%) | | 32.88 KiB (1%) | 601 |
| `["findfirst", "10%", "tx-seq"]` | 68.500 μs (5%) | | 560 bytes (1%) | 15 |
| `["findfirst", "20%", "base"]` | 136.300 μs (5%) | | | |
| `["findfirst", "20%", "tx"]` | 138.601 μs (5%) | | 21.33 KiB (1%) | 393 |
| `["findfirst", "20%", "tx-noterm"]` | 195.800 μs (5%) | | 28.31 KiB (1%) | 522 |
| `["findfirst", "20%", "tx-seq"]` | 136.400 μs (5%) | | 560 bytes (1%) | 15 |
| `["findfirst", "30%", "base"]` | 204.101 μs (5%) | | | |
| `["findfirst", "30%", "tx"]` | 185.100 μs (5%) | | 28.27 KiB (1%) | 520 |
| `["findfirst", "30%", "tx-noterm"]` | 196.001 μs (5%) | | 28.30 KiB (1%) | 521 |
| `["findfirst", "30%", "tx-seq"]` | 204.200 μs (5%) | | 560 bytes (1%) | 15 |
| `["findfirst", "40%", "base"]` | 272.000 μs (5%) | | | |
| `["findfirst", "40%", "tx"]` | 259.601 μs (5%) | | 35.31 KiB (1%) | 651 |
| `["findfirst", "40%", "tx-noterm"]` | 265.401 μs (5%) | | 35.31 KiB (1%) | 650 |
| `["findfirst", "40%", "tx-seq"]` | 272.001 μs (5%) | | 560 bytes (1%) | 15 |
| `["findfirst", "50%", "base"]` | 339.701 μs (5%) | | | |
| `["findfirst", "50%", "tx"]` | 286.201 μs (5%) | | 37.70 KiB (1%) | 698 |
| `["findfirst", "50%", "tx-noterm"]` | 342.401 μs (5%) | | 53.86 KiB (1%) | 992 |
| `["findfirst", "50%", "tx-seq"]` | 339.801 μs (5%) | | 560 bytes (1%) | 15 |
| `["foreach", "base", "A .= B .+ B'"]` | 401.774 ms (5%) | 43.046 ms | 305.18 MiB (1%) | 16000002 |
| `["foreach", "base", "A .= B .+ C"]` | 249.934 ms (5%) | 28.315 ms | 305.18 MiB (1%) | 16000001 |
| `["foreach", "broadcast", "A .= B .+ B'"]` | 18.505 ms (5%) | | | |
| `["foreach", "broadcast", "A .= B .+ C"]` | 7.815 ms (5%) | | | |
| `["foreach", "tx", "A .= B .+ B'"]` | 9.983 ms (5%) | | 25.94 KiB (1%) | 360 |
| `["foreach", "tx", "A .= B .+ C"]` | 6.735 ms (5%) | | 12.75 KiB (1%) | 124 |
| `["foreach_seq", "base", "Matrix"]` | 651.601 μs (5%) | | | |
| `["foreach_seq", "base", "Transpose"]` | 2.311 ms (5%) | | | |
| `["foreach_seq", "base", "Vector"]` | 651.602 μs (5%) | | | |
| `["foreach_seq", "tx", "Matrix"]` | 656.301 μs (5%) | | | |
| `["foreach_seq", "tx", "Transpose"]` | 1.018 ms (5%) | | 16 bytes (1%) | 1 |
| `["foreach_seq", "tx", "Vector"]` | 651.401 μs (5%) | | | |
| `["foreach_seq_double", "cartesian", "man"]` | 26.100 μs (5%) | | | |
| `["foreach_seq_double", "cartesian", "tx", ":simd => :ivdep"]` | 23.200 μs (5%) | | | |
| `["foreach_seq_double", "cartesian", "tx", ":simd => false"]` | 22.900 μs (5%) | | | |
| `["foreach_seq_double", "cartesian", "tx", ":simd => true"]` | 23.100 μs (5%) | | | |
| `["foreach_seq_double", "linear", "man"]` | 100.955 ns (5%) | | | |
| `["foreach_seq_double", "linear", "tx", ":simd => :ivdep"]` | 100.000 ns (5%) | | | |
| `["foreach_seq_double", "linear", "tx", ":simd => false"]` | 100.321 ns (5%) | | | |
| `["foreach_seq_double", "linear", "tx", ":simd => true"]` | 100.968 ns (5%) | | | |
| `["foreach_seq_sum_many", ":nvecs => 8", "man"]` | 2.044 μs (5%) | | | |
| `["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => :ivdep"]` | 2.044 μs (5%) | | | |
| `["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => false"]` | 3.200 μs (5%) | | | |
| `["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => true"]` | 2.987 μs (5%) | | | |
| `["sort", "F64 (narrow)", "Base"]` | 2.523 ms (5%) | | | |
| `["sort", "F64 (narrow)", "ThreadsX.MergeSort"]` | 2.967 ms (5%) | | 1.19 MiB (1%) | 534 |
| `["sort", "F64 (narrow)", "ThreadsX.QuickSort"]` | 1.756 ms (5%) | | 965.11 KiB (1%) | 1226 |
| `["sort", "F64 (narrow)", "ThreadsX.StableQuickSort"]` | 1.756 ms (5%) | | 1.02 MiB (1%) | 1246 |
| `["sort", "F64 (wide)", "Base"]` | 6.126 ms (5%) | | | |
| `["sort", "F64 (wide)", "ThreadsX.MergeSort"]` | 5.507 ms (5%) | | 1.19 MiB (1%) | 563 |
| `["sort", "F64 (wide)", "ThreadsX.QuickSort"]` | 5.326 ms (5%) | | 1.01 MiB (1%) | 2147 |
| `["sort", "F64 (wide)", "ThreadsX.StableQuickSort"]` | 6.247 ms (5%) | | 1.39 MiB (1%) | 2191 |
| `["sort", "I64 (narrow)", "Base"]` | 143.901 μs (5%) | | 160 bytes (1%) | 1 |
| `["sort", "I64 (narrow)", "ThreadsX.MergeSort"]` | 147.800 μs (5%) | | 864 bytes (1%) | 17 |
| `["sort", "I64 (narrow)", "ThreadsX.QuickSort"]` | 150.601 μs (5%) | | 864 bytes (1%) | 17 |
| `["sort", "I64 (narrow)", "ThreadsX.StableQuickSort"]` | 147.201 μs (5%) | | 864 bytes (1%) | 17 |
| `["sort", "I64 (wide)", "Base"]` | 6.197 ms (5%) | | | |
| `["sort", "I64 (wide)", "ThreadsX.MergeSort"]` | 4.724 ms (5%) | | 1.19 MiB (1%) | 554 |
| `["sort", "I64 (wide)", "ThreadsX.QuickSort"]` | 4.483 ms (5%) | | 1.01 MiB (1%) | 2238 |
| `["sort", "I64 (wide)", "ThreadsX.StableQuickSort"]` | 5.272 ms (5%) | | 1.40 MiB (1%) | 2269 |
| `["sort", "reversed", "Base"]` | 769.802 μs (5%) | | | |
| `["sort", "reversed", "ThreadsX.MergeSort"]` | 1.283 ms (5%) | | 1.18 MiB (1%) | 434 |
| `["sort", "reversed", "ThreadsX.QuickSort"]` | 1.223 ms (5%) | | 998.73 KiB (1%) | 1870 |
| `["sort", "reversed", "ThreadsX.StableQuickSort"]` | 1.669 ms (5%) | | 1.36 MiB (1%) | 1904 |
| `["sort", "sorted", "Base"]` | 722.002 μs (5%) | | | |
| `["sort", "sorted", "ThreadsX.MergeSort"]` | 958.902 μs (5%) | | 1.18 MiB (1%) | 430 |
| `["sort", "sorted", "ThreadsX.QuickSort"]` | 1.242 ms (5%) | | 998.78 KiB (1%) | 1873 |
| `["sort", "sorted", "ThreadsX.StableQuickSort"]` | 1.390 ms (5%) | | 1.36 MiB (1%) | 1903 |
| `["unique", "rand(1:10, 1000000)", "base"]` | 10.261 ms (5%) | | 832 bytes (1%) | 8 |
| `["unique", "rand(1:10, 1000000)", "tx"]` | 5.418 ms (5%) | | 50.98 KiB (1%) | 882 |
| `["unique", "rand(1:1000, 1000000)", "base"]` | 9.119 ms (5%) | | 65.95 KiB (1%) | 27 |
| `["unique", "rand(1:1000, 1000000)", "tx"]` | 5.821 ms (5%) | | 1.07 MiB (1%) | 1185 |

## Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

- `["findfirst", "0%"]`
- `["findfirst", "10%"]`
- `["findfirst", "20%"]`
- `["findfirst", "30%"]`
- `["findfirst", "40%"]`
- `["findfirst", "50%"]`
- `["foreach", "base"]`
- `["foreach", "broadcast"]`
- `["foreach", "tx"]`
- `["foreach_seq", "base"]`
- `["foreach_seq", "tx"]`
- `["foreach_seq_double", "cartesian"]`
- `["foreach_seq_double", "cartesian", "tx"]`
- `["foreach_seq_double", "linear"]`
- `["foreach_seq_double", "linear", "tx"]`
- `["foreach_seq_sum_many", ":nvecs => 8"]`
- `["foreach_seq_sum_many", ":nvecs => 8", "tx"]`
- `["sort", "F64 (narrow)"]`
- `["sort", "F64 (wide)"]`
- `["sort", "I64 (narrow)"]`
- `["sort", "I64 (wide)"]`
- `["sort", "reversed"]`
- `["sort", "sorted"]`
- `["unique", "rand(1:10, 1000000)"]`
- `["unique", "rand(1:1000, 1000000)"]`

## Julia versioninfo

```
Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.3.0-1028-azure #29~18.04.1-Ubuntu SMP Fri Jun 5 14:32:34 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz:
          speed      user   nice     sys     idle   irq
    #1  2394 MHz  53244 s    0 s  2564 s  34465 s   0 s
    #2  2394 MHz  58194 s    0 s  3121 s  29230 s   0 s
  Memory: 6.764884948730469 GB (2010.53515625 MB free)
  Uptime: 926.0 sec
  Load Avg: 1.29833984375 1.3583984375 0.94775390625
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, haswell)
```

---

# Baseline result

# Benchmark Report for */home/runner/work/ThreadsX.jl/ThreadsX.jl*

## Job Properties

* Time of benchmark: 28 Jun 2020 - 10:17
* Package commit: ab83e0
* Julia commit: 44fa15
* Julia command flags: None
* Environment variables: `OMP_NUM_THREADS => 1` `JULIA_NUM_THREADS => 2`

## Results

Below is a table of this job's results, obtained by running the benchmarks. The values listed in the `ID` column have the structure `[parent_group, child_group, ..., key]`, and can be used to index into the BaseBenchmarks suite to retrieve the corresponding benchmarks. The percentages accompanying time and memory values in the below table are noise tolerances. The "true" time/memory value for a given benchmark is expected to fall within this percentage of the reported value. An empty cell means that the value was zero.
| ID | time | GC time | memory | allocations | |--------------------------------------------------------------------|----------------:|----------:|----------------:|------------:| | `["findfirst", "0%", "base"]` | 3.200 ns (5%) | | | | | `["findfirst", "0%", "tx"]` | 26.701 μs (5%) | | 11.95 KiB (1%) | 218 | | `["findfirst", "0%", "tx-noterm"]` | 22.800 μs (5%) | | 11.97 KiB (1%) | 218 | | `["findfirst", "0%", "tx-seq"]` | 256.238 ns (5%) | | 544 bytes (1%) | 14 | | `["findfirst", "10%", "base"]` | 68.600 μs (5%) | | | | | `["findfirst", "10%", "tx"]` | 78.300 μs (5%) | | 14.36 KiB (1%) | 266 | | `["findfirst", "10%", "tx-noterm"]` | 211.301 μs (5%) | | 23.66 KiB (1%) | 436 | | `["findfirst", "10%", "tx-seq"]` | 68.700 μs (5%) | | 560 bytes (1%) | 15 | | `["findfirst", "20%", "base"]` | 136.301 μs (5%) | | | | | `["findfirst", "20%", "tx"]` | 140.800 μs (5%) | | 21.33 KiB (1%) | 393 | | `["findfirst", "20%", "tx-noterm"]` | 199.002 μs (5%) | | 28.28 KiB (1%) | 520 | | `["findfirst", "20%", "tx-seq"]` | 136.401 μs (5%) | | 560 bytes (1%) | 15 | | `["findfirst", "30%", "base"]` | 204.101 μs (5%) | | | | | `["findfirst", "30%", "tx"]` | 204.201 μs (5%) | | 28.27 KiB (1%) | 520 | | `["findfirst", "30%", "tx-noterm"]` | 201.901 μs (5%) | | 28.30 KiB (1%) | 521 | | `["findfirst", "30%", "tx-seq"]` | 204.501 μs (5%) | | 560 bytes (1%) | 15 | | `["findfirst", "40%", "base"]` | 272.201 μs (5%) | | | | | `["findfirst", "40%", "tx"]` | 266.301 μs (5%) | | 35.31 KiB (1%) | 651 | | `["findfirst", "40%", "tx-noterm"]` | 253.101 μs (5%) | | 35.33 KiB (1%) | 651 | | `["findfirst", "40%", "tx-seq"]` | 272.101 μs (5%) | | 560 bytes (1%) | 15 | | `["findfirst", "50%", "base"]` | 339.502 μs (5%) | | | | | `["findfirst", "50%", "tx"]` | 314.002 μs (5%) | | 37.69 KiB (1%) | 697 | | `["findfirst", "50%", "tx-noterm"]` | 348.703 μs (5%) | | 37.77 KiB (1%) | 701 | | `["findfirst", "50%", "tx-seq"]` | 340.002 μs (5%) | | 560 bytes (1%) | 15 | | `["foreach", "base", "A .= B .+ B'"]` | 
419.291 ms (5%) | 36.913 ms | 305.18 MiB (1%) | 16000002 | | `["foreach", "base", "A .= B .+ C"]` | 255.709 ms (5%) | 38.286 ms | 305.18 MiB (1%) | 16000001 | | `["foreach", "broadcast", "A .= B .+ B'"]` | 18.922 ms (5%) | | | | | `["foreach", "broadcast", "A .= B .+ C"]` | 7.938 ms (5%) | | | | | `["foreach", "tx", "A .= B .+ B'"]` | 9.841 ms (5%) | | 25.94 KiB (1%) | 360 | | `["foreach", "tx", "A .= B .+ C"]` | 6.746 ms (5%) | | 12.75 KiB (1%) | 124 | | `["foreach_seq", "base", "Matrix"]` | 651.803 μs (5%) | | | | | `["foreach_seq", "base", "Transpose"]` | 2.313 ms (5%) | | | | | `["foreach_seq", "base", "Vector"]` | 651.603 μs (5%) | | | | | `["foreach_seq", "tx", "Matrix"]` | 667.603 μs (5%) | | | | | `["foreach_seq", "tx", "Transpose"]` | 1.006 ms (5%) | | 16 bytes (1%) | 1 | | `["foreach_seq", "tx", "Vector"]` | 651.403 μs (5%) | | | | | `["foreach_seq_double", "cartesian", "man"]` | 22.900 μs (5%) | | | | | `["foreach_seq_double", "cartesian", "tx", ":simd => :ivdep"]` | 23.300 μs (5%) | | | | | `["foreach_seq_double", "cartesian", "tx", ":simd => false"]` | 22.900 μs (5%) | | | | | `["foreach_seq_double", "cartesian", "tx", ":simd => true"]` | 23.300 μs (5%) | | | | | `["foreach_seq_double", "linear", "man"]` | 97.137 ns (5%) | | | | | `["foreach_seq_double", "linear", "tx", ":simd => :ivdep"]` | 100.000 ns (5%) | | | | | `["foreach_seq_double", "linear", "tx", ":simd => false"]` | 100.000 ns (5%) | | | | | `["foreach_seq_double", "linear", "tx", ":simd => true"]` | 100.000 ns (5%) | | | | | `["foreach_seq_sum_many", ":nvecs => 8", "man"]` | 1.500 μs (5%) | | | | | `["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => :ivdep"]` | 1.600 μs (5%) | | | | | `["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => false"]` | 2.900 μs (5%) | | | | | `["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => true"]` | 2.900 μs (5%) | | | | | `["sort", "F64 (narrow)", "Base"]` | 2.525 ms (5%) | | | | | `["sort", "F64 (narrow)", "ThreadsX.MergeSort"]` | 2.864 ms 
(5%) | | 1.19 MiB (1%) | 535 | | `["sort", "F64 (narrow)", "ThreadsX.QuickSort"]` | 1.745 ms (5%) | | 965.09 KiB (1%) | 1225 | | `["sort", "F64 (narrow)", "ThreadsX.StableQuickSort"]` | 1.785 ms (5%) | | 1.02 MiB (1%) | 1246 | | `["sort", "F64 (wide)", "Base"]` | 6.400 ms (5%) | | | | | `["sort", "F64 (wide)", "ThreadsX.MergeSort"]` | 5.416 ms (5%) | | 1.19 MiB (1%) | 563 | | `["sort", "F64 (wide)", "ThreadsX.QuickSort"]` | 5.372 ms (5%) | | 1.01 MiB (1%) | 2145 | | `["sort", "F64 (wide)", "ThreadsX.StableQuickSort"]` | 6.215 ms (5%) | | 1.39 MiB (1%) | 2194 | | `["sort", "I64 (narrow)", "Base"]` | 144.301 μs (5%) | | 160 bytes (1%) | 1 | | `["sort", "I64 (narrow)", "ThreadsX.MergeSort"]` | 145.300 μs (5%) | | 864 bytes (1%) | 17 | | `["sort", "I64 (narrow)", "ThreadsX.QuickSort"]` | 145.201 μs (5%) | | 864 bytes (1%) | 17 | | `["sort", "I64 (narrow)", "ThreadsX.StableQuickSort"]` | 145.401 μs (5%) | | 864 bytes (1%) | 17 | | `["sort", "I64 (wide)", "Base"]` | 6.335 ms (5%) | | | | | `["sort", "I64 (wide)", "ThreadsX.MergeSort"]` | 4.598 ms (5%) | | 1.19 MiB (1%) | 554 | | `["sort", "I64 (wide)", "ThreadsX.QuickSort"]` | 4.527 ms (5%) | | 1.01 MiB (1%) | 2236 | | `["sort", "I64 (wide)", "ThreadsX.StableQuickSort"]` | 5.265 ms (5%) | | 1.40 MiB (1%) | 2271 | | `["sort", "reversed", "Base"]` | 770.504 μs (5%) | | | | | `["sort", "reversed", "ThreadsX.MergeSort"]` | 1.343 ms (5%) | | 1.18 MiB (1%) | 434 | | `["sort", "reversed", "ThreadsX.QuickSort"]` | 1.268 ms (5%) | | 998.72 KiB (1%) | 1869 | | `["sort", "reversed", "ThreadsX.StableQuickSort"]` | 1.721 ms (5%) | | 1.36 MiB (1%) | 1904 | | `["sort", "sorted", "Base"]` | 726.403 μs (5%) | | | | | `["sort", "sorted", "ThreadsX.MergeSort"]` | 956.604 μs (5%) | | 1.18 MiB (1%) | 430 | | `["sort", "sorted", "ThreadsX.QuickSort"]` | 1.282 ms (5%) | | 998.75 KiB (1%) | 1871 | | `["sort", "sorted", "ThreadsX.StableQuickSort"]` | 1.430 ms (5%) | | 1.36 MiB (1%) | 1904 | | `["unique", "rand(1:10, 1000000)", "base"]` | 10.464 
ms (5%) | | 832 bytes (1%) | 8 | | `["unique", "rand(1:10, 1000000)", "tx"]` | 5.531 ms (5%) | | 50.98 KiB (1%) | 882 | | `["unique", "rand(1:1000, 1000000)", "base"]` | 9.825 ms (5%) | | 65.95 KiB (1%) | 27 | | `["unique", "rand(1:1000, 1000000)", "tx"]` | 5.868 ms (5%) | | 1.07 MiB (1%) | 1186 | ## Benchmark Group List Here's a list of all the benchmark groups executed by this job: - `["findfirst", "0%"]` - `["findfirst", "10%"]` - `["findfirst", "20%"]` - `["findfirst", "30%"]` - `["findfirst", "40%"]` - `["findfirst", "50%"]` - `["foreach", "base"]` - `["foreach", "broadcast"]` - `["foreach", "tx"]` - `["foreach_seq", "base"]` - `["foreach_seq", "tx"]` - `["foreach_seq_double", "cartesian"]` - `["foreach_seq_double", "cartesian", "tx"]` - `["foreach_seq_double", "linear"]` - `["foreach_seq_double", "linear", "tx"]` - `["foreach_seq_sum_many", ":nvecs => 8"]` - `["foreach_seq_sum_many", ":nvecs => 8", "tx"]` - `["sort", "F64 (narrow)"]` - `["sort", "F64 (wide)"]` - `["sort", "I64 (narrow)"]` - `["sort", "I64 (wide)"]` - `["sort", "reversed"]` - `["sort", "sorted"]` - `["unique", "rand(1:10, 1000000)"]` - `["unique", "rand(1:1000, 1000000)"]` ## Julia versioninfo ``` Julia Version 1.4.2 Commit 44fa15b150* (2020-05-23 18:35 UTC) Platform Info: OS: Linux (x86_64-pc-linux-gnu) Ubuntu 18.04.4 LTS uname: Linux 5.3.0-1028-azure #29~18.04.1-Ubuntu SMP Fri Jun 5 14:32:34 UTC 2020 x86_64 x86_64 CPU: Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz: speed user nice sys idle irq #1 2394 MHz 74139 s 0 s 3236 s 48646 s 0 s #2 2394 MHz 84590 s 0 s 3885 s 37756 s 0 s Memory: 6.764884948730469 GB (2374.5234375 MB free) Uptime: 1286.0 sec Load Avg: 1.33544921875 1.42431640625 1.11279296875 WORD_SIZE: 64 LIBM: libopenlibm LLVM: libLLVM-8.0.1 (ORCJIT, haswell) ``` --- # Runtime information | Runtime Info | | |:--|:--| | BLAS #threads | 2 | | `BLAS.vendor()` | `openblas64` | | `Sys.CPU_THREADS` | 2 | `lscpu` output: Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little 
Endian CPU(s): 2 On-line CPU(s) list: 0,1 Thread(s) per core: 1 Core(s) per socket: 2 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 63 Model name: Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz Stepping: 2 CPU MHz: 2394.451 BogoMIPS: 4788.90 Hypervisor vendor: Microsoft Virtualization type: full L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 30720K NUMA node0 CPU(s): 0,1 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_single pti fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt md_clear | Cpu Property | Value | |:------------------ |:------------------------------------------------------- | | Brand | Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz | | Vendor | :Intel | | Architecture | :Haswell | | Model | Family: 0x06, Model: 0x3f, Stepping: 0x02, Type: 0x00 | | Cores | 2 physical cores, 2 logical cores (on executing CPU) | | | No Hyperthreading detected | | Clock Frequencies | Not supported by CPU | | Data Cache | Level 1:3 : (32, 256, 30720) kbytes | | | 64 byte cache line size | | Address Size | 48 bits virtual, 44 bits physical | | SIMD | 256 bit = 32 byte max. SIMD vector size | | Time Stamp Counter | TSC is accessible via `rdtsc` | | | TSC increased at every clock cycle (non-invariant TSC) | | Perf. Monitoring | Performance Monitoring Counters (PMC) are not supported | | Hypervisor | Yes, Microsoft |
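The `ID` structure `[parent_group, child_group, ..., key]` described in the report can be illustrated with a small nested suite. This is only a sketch: it assumes BenchmarkTools.jl is installed, and the input `xs` and the benchmarked call are hypothetical stand-ins, not the definitions used by this repository's benchmark suite.

```julia
using BenchmarkTools

# Hypothetical input; the data used by the real suite is not shown in this report.
xs = rand(1:10, 1_000)

# Build a nested suite whose leaf is addressed by the ID
# ["findfirst", "0%", "base"].
suite = BenchmarkGroup()
suite["findfirst"] = BenchmarkGroup()
suite["findfirst"]["0%"] = BenchmarkGroup()
suite["findfirst"]["0%"]["base"] = @benchmarkable findfirst(isequal(1), $xs)

# Index level by level, following the ID from left to right.
bench = suite["findfirst"]["0%"]["base"]

# Run the single benchmark with a one-second budget and print its minimum time.
trial = run(bench; seconds = 1)
println(minimum(trial))
```

Each entry of the Benchmark Group List corresponds to an inner `BenchmarkGroup`, and each table row to one of its leaves.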
github-actions[bot] commented 4 years ago
Benchmark result

# Judge result

# Benchmark Report for */home/runner/work/ThreadsX.jl/ThreadsX.jl*

## Job Properties

* Time of benchmarks:
    - Target: 28 Jun 2020 - 10:15
    - Baseline: 28 Jun 2020 - 10:21
* Package commits:
    - Target: 182a51
    - Baseline: ab83e0
* Julia commits:
    - Target: 44fa15
    - Baseline: 44fa15
* Julia command flags:
    - Target: None
    - Baseline: None
* Environment variables:
    - Target: `OMP_NUM_THREADS => 1` `JULIA_NUM_THREADS => 2`
    - Baseline: `OMP_NUM_THREADS => 1` `JULIA_NUM_THREADS => 2`

## Results

A ratio greater than `1.0` denotes a possible regression (marked with :x:), while a ratio less than `1.0` denotes a possible improvement (marked with :white_check_mark:). Only significant results - results that indicate possible regressions or improvements - are shown below (thus, an empty table means that all benchmark results remained invariant between builds).

| ID | time ratio | memory ratio |
|---|---|---|
| `["findfirst", "0%", "tx"]` | 1.10 (5%) :x: | 1.00 (1%) |
| `["findfirst", "0%", "tx-noterm"]` | 1.06 (5%) :x: | 1.00 (1%) |
| `["findfirst", "20%", "base"]` | 1.07 (5%) :x: | 1.00 (1%) |
| `["findfirst", "20%", "tx"]` | 1.06 (5%) :x: | 1.00 (1%) |
| `["findfirst", "20%", "tx-noterm"]` | 0.90 (5%) :white_check_mark: | 1.00 (1%) |
| `["findfirst", "30%", "tx-noterm"]` | 0.82 (5%) :white_check_mark: | 1.00 (1%) |
| `["findfirst", "40%", "tx"]` | 0.94 (5%) :white_check_mark: | 1.00 (1%) |
| `["findfirst", "50%", "tx-noterm"]` | 0.96 (5%) | 0.81 (1%) :white_check_mark: |
| `["foreach", "broadcast", "A .= B .+ B'"]` | 1.07 (5%) :x: | 1.00 (1%) |
| `["foreach", "tx", "A .= B .+ C"]` | 0.95 (5%) :white_check_mark: | 1.00 (1%) |
| `["foreach_seq_sum_many", ":nvecs => 8", "man"]` | 1.36 (5%) :x: | 1.00 (1%) |
| `["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => :ivdep"]` | 1.08 (5%) :x: | 1.00 (1%) |
| `["sort", "F64 (wide)", "ThreadsX.QuickSort"]` | 1.11 (5%) :x: | 1.00 (1%) |

## Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

- `["findfirst", "0%"]`
- `["findfirst", "10%"]`
- `["findfirst", "20%"]`
- `["findfirst", "30%"]`
- `["findfirst", "40%"]`
- `["findfirst", "50%"]`
- `["foreach", "base"]`
- `["foreach", "broadcast"]`
- `["foreach", "tx"]`
- `["foreach_seq", "base"]`
- `["foreach_seq", "tx"]`
- `["foreach_seq_double", "cartesian"]`
- `["foreach_seq_double", "cartesian", "tx"]`
- `["foreach_seq_double", "linear"]`
- `["foreach_seq_double", "linear", "tx"]`
- `["foreach_seq_sum_many", ":nvecs => 8"]`
- `["foreach_seq_sum_many", ":nvecs => 8", "tx"]`
- `["sort", "F64 (narrow)"]`
- `["sort", "F64 (wide)"]`
- `["sort", "I64 (narrow)"]`
- `["sort", "I64 (wide)"]`
- `["sort", "reversed"]`
- `["sort", "sorted"]`
- `["unique", "rand(1:10, 1000000)"]`
- `["unique", "rand(1:1000, 1000000)"]`

## Julia versioninfo

### Target

```
Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.3.0-1028-azure #29~18.04.1-Ubuntu SMP Fri Jun 5 14:32:34 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz:
              speed         user         nice          sys         idle          irq
       #1  2397 MHz      58121 s          0 s       2680 s      26439 s          0 s
       #2  2397 MHz      52695 s          0 s       2895 s      32213 s          0 s
  Memory: 6.764884948730469 GB (2058.5546875 MB free)
  Uptime: 896.0 sec
  Load Avg:  1.32470703125  1.39794921875  0.95947265625
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, haswell)
```

### Baseline

```
Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.3.0-1028-azure #29~18.04.1-Ubuntu SMP Fri Jun 5 14:32:34 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz:
              speed         user         nice          sys         idle          irq
       #1  2397 MHz      84808 s          0 s       3300 s      34434 s          0 s
       #2  2397 MHz      72929 s          0 s       3652 s      46572 s          0 s
  Memory: 6.764884948730469 GB (2382.38671875 MB free)
  Uptime: 1251.0 sec
  Load Avg:  1.3408203125  1.42822265625  1.111328125
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, haswell)
```

---

# Target result

# Benchmark Report for */home/runner/work/ThreadsX.jl/ThreadsX.jl*

## Job Properties

* Time of benchmark: 28 Jun 2020 - 10:15
* Package commit: 182a51
* Julia commit: 44fa15
* Julia command flags: None
* Environment variables: `OMP_NUM_THREADS => 1` `JULIA_NUM_THREADS => 2`

## Results

Below is a table of this job's results, obtained by running the benchmarks. The values listed in the `ID` column have the structure `[parent_group, child_group, ..., key]`, and can be used to index into the BaseBenchmarks suite to retrieve the corresponding benchmarks. The percentages accompanying time and memory values in the below table are noise tolerances. The "true" time/memory value for a given benchmark is expected to fall within this percentage of the reported value. An empty cell means that the value was zero.

| ID | time | GC time | memory | allocations |
|---|---:|---:|---:|---:|
| `["findfirst", "0%", "base"]` | 3.200 ns (5%) | | | |
| `["findfirst", "0%", "tx"]` | 25.501 μs (5%) | | 11.95 KiB (1%) | 218 |
| `["findfirst", "0%", "tx-noterm"]` | 22.501 μs (5%) | | 11.97 KiB (1%) | 218 |
| `["findfirst", "0%", "tx-seq"]` | 245.631 ns (5%) | | 544 bytes (1%) | 14 |
| `["findfirst", "10%", "base"]` | 68.502 μs (5%) | | | |
| `["findfirst", "10%", "tx"]` | 78.102 μs (5%) | | 14.36 KiB (1%) | 266 |
| `["findfirst", "10%", "tx-noterm"]` | 194.505 μs (5%) | | 32.91 KiB (1%) | 603 |
| `["findfirst", "10%", "tx-seq"]` | 68.602 μs (5%) | | 560 bytes (1%) | 15 |
| `["findfirst", "20%", "base"]` | 145.503 μs (5%) | | | |
| `["findfirst", "20%", "tx"]` | 142.304 μs (5%) | | 21.34 KiB (1%) | 394 |
| `["findfirst", "20%", "tx-noterm"]` | 181.705 μs (5%) | | 28.25 KiB (1%) | 520 |
| `["findfirst", "20%", "tx-seq"]` | 136.303 μs (5%) | | 560 bytes (1%) | 15 |
| `["findfirst", "30%", "base"]` | 203.906 μs (5%) | | | |
| `["findfirst", "30%", "tx"]` | 183.705 μs (5%) | | 28.27 KiB (1%) | 520 |
| `["findfirst", "30%", "tx-noterm"]` | 187.906 μs (5%) | | 28.30 KiB (1%) | 521 |
| `["findfirst", "30%", "tx-seq"]` | 203.906 μs (5%) | | 560 bytes (1%) | 15 |
| `["findfirst", "40%", "base"]` | 271.607 μs (5%) | | | |
| `["findfirst", "40%", "tx"]` | 251.307 μs (5%) | | 35.31 KiB (1%) | 651 |
| `["findfirst", "40%", "tx-noterm"]` | 254.506 μs (5%) | | 35.30 KiB (1%) | 649 |
| `["findfirst", "40%", "tx-seq"]` | 271.907 μs (5%) | | 560 bytes (1%) | 15 |
| `["findfirst", "50%", "base"]` | 339.409 μs (5%) | | | |
| `["findfirst", "50%", "tx"]` | 288.608 μs (5%) | | 37.69 KiB (1%) | 697 |
| `["findfirst", "50%", "tx-noterm"]` | 311.509 μs (5%) | | 40.05 KiB (1%) | 743 |
| `["findfirst", "50%", "tx-seq"]` | 339.509 μs (5%) | | 560 bytes (1%) | 15 |
| `["foreach", "base", "A .= B .+ B'"]` | 402.915 ms (5%) | 27.943 ms | 305.18 MiB (1%) | 16000002 |
| `["foreach", "base", "A .= B .+ C"]` | 236.311 ms (5%) | 40.127 ms | 305.18 MiB (1%) | 16000001 |
| `["foreach", "broadcast", "A .= B .+ B'"]` | 19.650 ms (5%) | | | |
| `["foreach", "broadcast", "A .= B .+ C"]` | 7.788 ms (5%) | | | |
| `["foreach", "tx", "A .= B .+ B'"]` | 9.669 ms (5%) | | 25.94 KiB (1%) | 360 |
| `["foreach", "tx", "A .= B .+ C"]` | 6.316 ms (5%) | | 12.75 KiB (1%) | 124 |
| `["foreach_seq", "base", "Matrix"]` | 650.611 μs (5%) | | | |
| `["foreach_seq", "base", "Transpose"]` | 2.207 ms (5%) | | | |
| `["foreach_seq", "base", "Vector"]` | 650.012 μs (5%) | | | |
| `["foreach_seq", "tx", "Matrix"]` | 655.112 μs (5%) | | | |
| `["foreach_seq", "tx", "Transpose"]` | 1.012 ms (5%) | | 16 bytes (1%) | 1 |
| `["foreach_seq", "tx", "Vector"]` | 650.111 μs (5%) | | | |
| `["foreach_seq_double", "cartesian", "man"]` | 23.100 μs (5%) | | | |
| `["foreach_seq_double", "cartesian", "tx", ":simd => :ivdep"]` | 23.100 μs (5%) | | | |
| `["foreach_seq_double", "cartesian", "tx", ":simd => false"]` | 23.000 μs (5%) | | | |
| `["foreach_seq_double", "cartesian", "tx", ":simd => true"]` | 23.101 μs (5%) | | | |
| `["foreach_seq_double", "linear", "man"]` | 100.849 ns (5%) | | | |
| `["foreach_seq_double", "linear", "tx", ":simd => :ivdep"]` | 99.895 ns (5%) | | | |
| `["foreach_seq_double", "linear", "tx", ":simd => false"]` | 97.121 ns (5%) | | | |
| `["foreach_seq_double", "linear", "tx", ":simd => true"]` | 100.849 ns (5%) | | | |
| `["foreach_seq_sum_many", ":nvecs => 8", "man"]` | 2.044 μs (5%) | | | |
| `["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => :ivdep"]` | 2.044 μs (5%) | | | |
| `["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => false"]` | 2.987 μs (5%) | | | |
| `["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => true"]` | 2.987 μs (5%) | | | |
| `["sort", "F64 (narrow)", "Base"]` | 2.397 ms (5%) | | | |
| `["sort", "F64 (narrow)", "ThreadsX.MergeSort"]` | 2.624 ms (5%) | | 1.19 MiB (1%) | 534 |
| `["sort", "F64 (narrow)", "ThreadsX.QuickSort"]` | 1.578 ms (5%) | | 965.11 KiB (1%) | 1226 |
| `["sort", "F64 (narrow)", "ThreadsX.StableQuickSort"]` | 1.589 ms (5%) | | 1.02 MiB (1%) | 1245 |
| `["sort", "F64 (wide)", "Base"]` | 5.690 ms (5%) | | | |
| `["sort", "F64 (wide)", "ThreadsX.MergeSort"]` | 4.878 ms (5%) | | 1.19 MiB (1%) | 562 |
| `["sort", "F64 (wide)", "ThreadsX.QuickSort"]` | 5.244 ms (5%) | | 1.01 MiB (1%) | 2144 |
| `["sort", "F64 (wide)", "ThreadsX.StableQuickSort"]` | 5.967 ms (5%) | | 1.39 MiB (1%) | 2195 |
| `["sort", "I64 (narrow)", "Base"]` | 143.803 μs (5%) | | 160 bytes (1%) | 1 |
| `["sort", "I64 (narrow)", "ThreadsX.MergeSort"]` | 147.303 μs (5%) | | 864 bytes (1%) | 17 |
| `["sort", "I64 (narrow)", "ThreadsX.QuickSort"]` | 147.603 μs (5%) | | 864 bytes (1%) | 17 |
| `["sort", "I64 (narrow)", "ThreadsX.StableQuickSort"]` | 147.203 μs (5%) | | 864 bytes (1%) | 17 |
| `["sort", "I64 (wide)", "Base"]` | 5.706 ms (5%) | | | |
| `["sort", "I64 (wide)", "ThreadsX.MergeSort"]` | 4.199 ms (5%) | | 1.19 MiB (1%) | 555 |
| `["sort", "I64 (wide)", "ThreadsX.QuickSort"]` | 3.933 ms (5%) | | 1.01 MiB (1%) | 2236 |
| `["sort", "I64 (wide)", "ThreadsX.StableQuickSort"]` | 4.718 ms (5%) | | 1.40 MiB (1%) | 2270 |
| `["sort", "reversed", "Base"]` | 761.515 μs (5%) | | | |
| `["sort", "reversed", "ThreadsX.MergeSort"]` | 1.238 ms (5%) | | 1.18 MiB (1%) | 435 |
| `["sort", "reversed", "ThreadsX.QuickSort"]` | 1.107 ms (5%) | | 998.75 KiB (1%) | 1871 |
| `["sort", "reversed", "ThreadsX.StableQuickSort"]` | 1.535 ms (5%) | | 1.36 MiB (1%) | 1904 |
| `["sort", "sorted", "Base"]` | 714.813 μs (5%) | | | |
| `["sort", "sorted", "ThreadsX.MergeSort"]` | 884.016 μs (5%) | | 1.18 MiB (1%) | 430 |
| `["sort", "sorted", "ThreadsX.QuickSort"]` | 1.127 ms (5%) | | 998.75 KiB (1%) | 1871 |
| `["sort", "sorted", "ThreadsX.StableQuickSort"]` | 1.270 ms (5%) | | 1.36 MiB (1%) | 1902 |
| `["unique", "rand(1:10, 1000000)", "base"]` | 9.194 ms (5%) | | 832 bytes (1%) | 8 |
| `["unique", "rand(1:10, 1000000)", "tx"]` | 4.823 ms (5%) | | 50.98 KiB (1%) | 882 |
| `["unique", "rand(1:1000, 1000000)", "base"]` | 8.475 ms (5%) | | 65.95 KiB (1%) | 27 |
| `["unique", "rand(1:1000, 1000000)", "tx"]` | 5.289 ms (5%) | | 1.07 MiB (1%) | 1186 |

## Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

- `["findfirst", "0%"]`
- `["findfirst", "10%"]`
- `["findfirst", "20%"]`
- `["findfirst", "30%"]`
- `["findfirst", "40%"]`
- `["findfirst", "50%"]`
- `["foreach", "base"]`
- `["foreach", "broadcast"]`
- `["foreach", "tx"]`
- `["foreach_seq", "base"]`
- `["foreach_seq", "tx"]`
- `["foreach_seq_double", "cartesian"]`
- `["foreach_seq_double", "cartesian", "tx"]`
- `["foreach_seq_double", "linear"]`
- `["foreach_seq_double", "linear", "tx"]`
- `["foreach_seq_sum_many", ":nvecs => 8"]`
- `["foreach_seq_sum_many", ":nvecs => 8", "tx"]`
- `["sort", "F64 (narrow)"]`
- `["sort", "F64 (wide)"]`
- `["sort", "I64 (narrow)"]`
- `["sort", "I64 (wide)"]`
- `["sort", "reversed"]`
- `["sort", "sorted"]`
- `["unique", "rand(1:10, 1000000)"]`
- `["unique", "rand(1:1000, 1000000)"]`

## Julia versioninfo

```
Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.3.0-1028-azure #29~18.04.1-Ubuntu SMP Fri Jun 5 14:32:34 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz:
              speed         user         nice          sys         idle          irq
       #1  2397 MHz      58121 s          0 s       2680 s      26439 s          0 s
       #2  2397 MHz      52695 s          0 s       2895 s      32213 s          0 s
  Memory: 6.764884948730469 GB (2058.5546875 MB free)
  Uptime: 896.0 sec
  Load Avg:  1.32470703125  1.39794921875  0.95947265625
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, haswell)
```

---

# Baseline result

# Benchmark Report for */home/runner/work/ThreadsX.jl/ThreadsX.jl*

## Job Properties

* Time of benchmark: 28 Jun 2020 - 10:21
* Package commit: ab83e0
* Julia commit: 44fa15
* Julia command flags: None
* Environment variables: `OMP_NUM_THREADS => 1` `JULIA_NUM_THREADS => 2`

## Results

Below is a table of this job's results, obtained by running the benchmarks. The values listed in the `ID` column have the structure `[parent_group, child_group, ..., key]`, and can be used to index into the BaseBenchmarks suite to retrieve the corresponding benchmarks. The percentages accompanying time and memory values in the below table are noise tolerances. The "true" time/memory value for a given benchmark is expected to fall within this percentage of the reported value. An empty cell means that the value was zero.

| ID | time | GC time | memory | allocations |
|---|---:|---:|---:|---:|
| `["findfirst", "0%", "base"]` | 3.200 ns (5%) | | | |
| `["findfirst", "0%", "tx"]` | 23.102 μs (5%) | | 11.95 KiB (1%) | 218 |
| `["findfirst", "0%", "tx-noterm"]` | 21.201 μs (5%) | | 11.97 KiB (1%) | 218 |
| `["findfirst", "0%", "tx-seq"]` | 235.037 ns (5%) | | 544 bytes (1%) | 14 |
| `["findfirst", "10%", "base"]` | 68.304 μs (5%) | | | |
| `["findfirst", "10%", "tx"]` | 76.705 μs (5%) | | 14.36 KiB (1%) | 266 |
| `["findfirst", "10%", "tx-noterm"]` | 202.911 μs (5%) | | 32.88 KiB (1%) | 601 |
| `["findfirst", "10%", "tx-seq"]` | 68.503 μs (5%) | | 560 bytes (1%) | 15 |
| `["findfirst", "20%", "base"]` | 135.707 μs (5%) | | | |
| `["findfirst", "20%", "tx"]` | 134.507 μs (5%) | | 21.33 KiB (1%) | 393 |
| `["findfirst", "20%", "tx-noterm"]` | 201.611 μs (5%) | | 28.30 KiB (1%) | 521 |
| `["findfirst", "20%", "tx-seq"]` | 136.307 μs (5%) | | 560 bytes (1%) | 15 |
| `["findfirst", "30%", "base"]` | 203.513 μs (5%) | | | |
| `["findfirst", "30%", "tx"]` | 183.711 μs (5%) | | 28.27 KiB (1%) | 520 |
| `["findfirst", "30%", "tx-noterm"]` | 228.014 μs (5%) | | 28.30 KiB (1%) | 521 |
| `["findfirst", "30%", "tx-seq"]` | 203.913 μs (5%) | | 560 bytes (1%) | 15 |
| `["findfirst", "40%", "base"]` | 271.616 μs (5%) | | | |
| `["findfirst", "40%", "tx"]` | 267.416 μs (5%) | | 35.28 KiB (1%) | 649 |
| `["findfirst", "40%", "tx-noterm"]` | 264.915 μs (5%) | | 35.30 KiB (1%) | 649 |
| `["findfirst", "40%", "tx-seq"]` | 271.716 μs (5%) | | 560 bytes (1%) | 15 |
| `["findfirst", "50%", "base"]` | 339.221 μs (5%) | | | |
| `["findfirst", "50%", "tx"]` | 296.318 μs (5%) | | 37.70 KiB (1%) | 698 |
| `["findfirst", "50%", "tx-noterm"]` | 324.019 μs (5%) | | 49.25 KiB (1%) | 908 |
| `["findfirst", "50%", "tx-seq"]` | 339.520 μs (5%) | | 560 bytes (1%) | 15 |
| `["foreach", "base", "A .= B .+ B'"]` | 416.810 ms (5%) | 38.899 ms | 305.18 MiB (1%) | 16000002 |
| `["foreach", "base", "A .= B .+ C"]` | 246.399 ms (5%) | 36.089 ms | 305.18 MiB (1%) | 16000001 |
| `["foreach", "broadcast", "A .= B .+ B'"]` | 18.448 ms (5%) | | | |
| `["foreach", "broadcast", "A .= B .+ C"]` | 7.792 ms (5%) | | | |
| `["foreach", "tx", "A .= B .+ B'"]` | 9.849 ms (5%) | | 25.94 KiB (1%) | 360 |
| `["foreach", "tx", "A .= B .+ C"]` | 6.659 ms (5%) | | 12.75 KiB (1%) | 124 |
| `["foreach_seq", "base", "Matrix"]` | 650.423 μs (5%) | | | |
| `["foreach_seq", "base", "Transpose"]` | 2.150 ms (5%) | | | |
| `["foreach_seq", "base", "Vector"]` | 650.324 μs (5%) | | | |
| `["foreach_seq", "tx", "Matrix"]` | 654.623 μs (5%) | | | |
| `["foreach_seq", "tx", "Transpose"]` | 964.135 μs (5%) | | 16 bytes (1%) | 1 |
| `["foreach_seq", "tx", "Vector"]` | 650.423 μs (5%) | | | |
| `["foreach_seq_double", "cartesian", "man"]` | 23.101 μs (5%) | | | |
| `["foreach_seq_double", "cartesian", "tx", ":simd => :ivdep"]` | 23.100 μs (5%) | | | |
| `["foreach_seq_double", "cartesian", "tx", ":simd => false"]` | 22.701 μs (5%) | | | |
| `["foreach_seq_double", "cartesian", "tx", ":simd => true"]` | 23.101 μs (5%) | | | |
| `["foreach_seq_double", "linear", "man"]` | 96.928 ns (5%) | | | |
| `["foreach_seq_double", "linear", "tx", ":simd => :ivdep"]` | 100.000 ns (5%) | | | |
| `["foreach_seq_double", "linear", "tx", ":simd => false"]` | 100.000 ns (5%) | | | |
| `["foreach_seq_double", "linear", "tx", ":simd => true"]` | 100.000 ns (5%) | | | |
| `["foreach_seq_sum_many", ":nvecs => 8", "man"]` | 1.500 μs (5%) | | | |
| `["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => :ivdep"]` | 1.900 μs (5%) | | | |
| `["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => false"]` | 2.900 μs (5%) | | | |
| `["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => true"]` | 2.900 μs (5%) | | | |
| `["sort", "F64 (narrow)", "Base"]` | 2.399 ms (5%) | | | |
| `["sort", "F64 (narrow)", "ThreadsX.MergeSort"]` | 2.597 ms (5%) | | 1.19 MiB (1%) | 533 |
| `["sort", "F64 (narrow)", "ThreadsX.QuickSort"]` | 1.566 ms (5%) | | 965.08 KiB (1%) | 1224 |
| `["sort", "F64 (narrow)", "ThreadsX.StableQuickSort"]` | 1.594 ms (5%) | | 1.02 MiB (1%) | 1245 |
| `["sort", "F64 (wide)", "Base"]` | 5.676 ms (5%) | | | |
| `["sort", "F64 (wide)", "ThreadsX.MergeSort"]` | 4.796 ms (5%) | | 1.19 MiB (1%) | 563 |
| `["sort", "F64 (wide)", "ThreadsX.QuickSort"]` | 4.713 ms (5%) | | 1.01 MiB (1%) | 2144 |
| `["sort", "F64 (wide)", "ThreadsX.StableQuickSort"]` | 5.775 ms (5%) | | 1.39 MiB (1%) | 2193 |
| `["sort", "I64 (narrow)", "Base"]` | 143.506 μs (5%) | | 160 bytes (1%) | 1 |
| `["sort", "I64 (narrow)", "ThreadsX.MergeSort"]` | 145.106 μs (5%) | | 864 bytes (1%) | 17 |
| `["sort", "I64 (narrow)", "ThreadsX.QuickSort"]` | 145.106 μs (5%) | | 864 bytes (1%) | 17 |
| `["sort", "I64 (narrow)", "ThreadsX.StableQuickSort"]` | 145.106 μs (5%) | | 864 bytes (1%) | 17 |
| `["sort", "I64 (wide)", "Base"]` | 5.707 ms (5%) | | | |
| `["sort", "I64 (wide)", "ThreadsX.MergeSort"]` | 4.093 ms (5%) | | 1.19 MiB (1%) | 554 |
| `["sort", "I64 (wide)", "ThreadsX.QuickSort"]` | 3.997 ms (5%) | | 1.01 MiB (1%) | 2237 |
| `["sort", "I64 (wide)", "ThreadsX.StableQuickSort"]` | 4.625 ms (5%) | | 1.40 MiB (1%) | 2270 |
| `["sort", "reversed", "Base"]` | 760.631 μs (5%) | | | |
| `["sort", "reversed", "ThreadsX.MergeSort"]` | 1.206 ms (5%) | | 1.18 MiB (1%) | 435 |
| `["sort", "reversed", "ThreadsX.QuickSort"]` | 1.158 ms (5%) | | 998.75 KiB (1%) | 1871 |
| `["sort", "reversed", "ThreadsX.StableQuickSort"]` | 1.563 ms (5%) | | 1.36 MiB (1%) | 1903 |
| `["sort", "sorted", "Base"]` | 714.128 μs (5%) | | | |
| `["sort", "sorted", "ThreadsX.MergeSort"]` | 883.234 μs (5%) | | 1.18 MiB (1%) | 431 |
| `["sort", "sorted", "ThreadsX.QuickSort"]` | 1.140 ms (5%) | | 998.75 KiB (1%) | 1871 |
| `["sort", "sorted", "ThreadsX.StableQuickSort"]` | 1.278 ms (5%) | | 1.36 MiB (1%) | 1904 |
| `["unique", "rand(1:10, 1000000)", "base"]` | 9.354 ms (5%) | | 832 bytes (1%) | 8 |
| `["unique", "rand(1:10, 1000000)", "tx"]` | 4.816 ms (5%) | | 50.98 KiB (1%) | 882 |
| `["unique", "rand(1:1000, 1000000)", "base"]` | 8.667 ms (5%) | | 65.95 KiB (1%) | 27 |
| `["unique", "rand(1:1000, 1000000)", "tx"]` | 5.140 ms (5%) | | 1.07 MiB (1%) | 1186 |

## Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

- `["findfirst", "0%"]`
- `["findfirst", "10%"]`
- `["findfirst", "20%"]`
- `["findfirst", "30%"]`
- `["findfirst", "40%"]`
- `["findfirst", "50%"]`
- `["foreach", "base"]`
- `["foreach", "broadcast"]`
- `["foreach", "tx"]`
- `["foreach_seq", "base"]`
- `["foreach_seq", "tx"]`
- `["foreach_seq_double", "cartesian"]`
- `["foreach_seq_double", "cartesian", "tx"]`
- `["foreach_seq_double", "linear"]`
- `["foreach_seq_double", "linear", "tx"]`
- `["foreach_seq_sum_many", ":nvecs => 8"]`
- `["foreach_seq_sum_many", ":nvecs => 8", "tx"]`
- `["sort", "F64 (narrow)"]`
- `["sort", "F64 (wide)"]`
- `["sort", "I64 (narrow)"]`
- `["sort", "I64 (wide)"]`
- `["sort", "reversed"]`
- `["sort", "sorted"]`
- `["unique", "rand(1:10, 1000000)"]`
- `["unique", "rand(1:1000, 1000000)"]`

## Julia versioninfo

```
Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.3.0-1028-azure #29~18.04.1-Ubuntu SMP Fri Jun 5 14:32:34 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz:
              speed         user         nice          sys         idle          irq
       #1  2397 MHz      84808 s          0 s       3300 s      34434 s          0 s
       #2  2397 MHz      72929 s          0 s       3652 s      46572 s          0 s
  Memory: 6.764884948730469 GB (2382.38671875 MB free)
  Uptime: 1251.0 sec
  Load Avg:  1.3408203125  1.42822265625  1.111328125
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, haswell)
```

---

# Runtime information

| Runtime Info | |
|:--|:--|
| BLAS #threads | 2 |
| `BLAS.vendor()` | `openblas64` |
| `Sys.CPU_THREADS` | 2 |

`lscpu` output:

    Architecture: x86_64
    CPU op-mode(s): 32-bit, 64-bit
    Byte Order: Little Endian
    CPU(s): 2
    On-line CPU(s) list: 0,1
    Thread(s) per core: 1
    Core(s) per socket: 2
    Socket(s): 1
    NUMA node(s): 1
    Vendor ID: GenuineIntel
    CPU family: 6
    Model: 63
    Model name: Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
    Stepping: 2
    CPU MHz: 2397.221
    BogoMIPS: 4794.44
    Hypervisor vendor: Microsoft
    Virtualization type: full
    L1d cache: 32K
    L1i cache: 32K
    L2 cache: 256K
    L3 cache: 30720K
    NUMA node0 CPU(s): 0,1
    Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_single pti fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt md_clear

| Cpu Property | Value |
|:--|:--|
| Brand | Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz |
| Vendor | :Intel |
| Architecture | :Haswell |
| Model | Family: 0x06, Model: 0x3f, Stepping: 0x02, Type: 0x00 |
| Cores | 2 physical cores, 2 logical cores (on executing CPU) |
| | No Hyperthreading detected |
| Clock Frequencies | Not supported by CPU |
| Data Cache | Level 1:3 : (32, 256, 30720) kbytes |
| | 64 byte cache line size |
| Address Size | 48 bits virtual, 44 bits physical |
| SIMD | 256 bit = 32 byte max. SIMD vector size |
| Time Stamp Counter | TSC is accessible via `rdtsc` |
| | TSC increased at every clock cycle (non-invariant TSC) |
| Perf. Monitoring | Performance Monitoring Counters (PMC) are not supported |
| Hypervisor | Yes, Microsoft |
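The time ratios in the judge table can be checked by hand against the target and baseline result tables. The sketch below is plain Julia with no packages. It assumes the ratio is target time divided by baseline time, and that a result counts as significant only when it differs from `1.0` by more than the stated tolerance (5% for time). The helper name `classify` is hypothetical and is not part of the benchmark tooling.

```julia
# Classify a target/baseline ratio against a noise tolerance.
# Assumption: values within `tolerance` of 1.0 are treated as invariant.
function classify(ratio::Real, tolerance::Real)
    if ratio - tolerance > 1.0
        return :regression
    elseif ratio + tolerance < 1.0
        return :improvement
    else
        return :invariant
    end
end

# (target, baseline) times in μs, copied from the two result tables above.
times = [
    "findfirst 0% tx"          => (25.501, 23.102),
    "findfirst 30% tx-noterm"  => (187.906, 228.014),
    "findfirst 50% tx-noterm"  => (311.509, 324.019),
    "foreach_seq_sum_many man" => (2.044, 1.500),
]

for (id, (target, baseline)) in times
    r = target / baseline
    println(rpad(id, 28), round(r; digits = 2), "  ", classify(r, 0.05))
end
```

Under these assumptions the four ratios round to the 1.10, 0.82, 0.96, and 1.36 reported in the judge table, and only the 0.96 entry falls inside the 5% time tolerance, which is consistent with its time cell carrying no marker.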