Open kgryte opened 4 years ago
I'm adding some benchmarks here to help provide additional context and implications of the current work.
Here are the benchmarks for the strided array interface, as currently implemented:
Running benchmark: /Users/kgryte/Coding/projects/personal/stdlib/stdlib/lib/node_modules/@stdlib/strided/math/special/abs/benchmark/benchmark.ndarray.native.js
TAP version 13
# @stdlib/strided/math/special/abs::native:ndarray:len=10
---
iterations: 1000000
elapsed: 0.542590328
rate: 1843011.1050560416
...
ok 1 benchmark finished
# @stdlib/strided/math/special/abs::native:ndarray:len=10
---
iterations: 1000000
elapsed: 0.546170667
rate: 1830929.5251844784
...
ok 2 benchmark finished
# @stdlib/strided/math/special/abs::native:ndarray:len=10
---
iterations: 1000000
elapsed: 0.54911345
rate: 1821117.29370315
...
ok 3 benchmark finished
# @stdlib/strided/math/special/abs::native:ndarray:len=100
---
iterations: 1000000
elapsed: 0.825757
rate: 1211010.0186858845
...
ok 4 benchmark finished
# @stdlib/strided/math/special/abs::native:ndarray:len=100
---
iterations: 1000000
elapsed: 0.820846319
rate: 1218254.838759897
...
ok 5 benchmark finished
# @stdlib/strided/math/special/abs::native:ndarray:len=100
---
iterations: 1000000
elapsed: 0.822068907
rate: 1216443.0396100604
...
ok 6 benchmark finished
# @stdlib/strided/math/special/abs::native:ndarray:len=1000
---
iterations: 100000
elapsed: 0.346216651
rate: 288836.4834884848
...
ok 7 benchmark finished
# @stdlib/strided/math/special/abs::native:ndarray:len=1000
---
iterations: 100000
elapsed: 0.339341522
rate: 294688.37002505106
...
ok 8 benchmark finished
# @stdlib/strided/math/special/abs::native:ndarray:len=1000
---
iterations: 100000
elapsed: 0.343990798
rate: 290705.4507894133
...
ok 9 benchmark finished
# @stdlib/strided/math/special/abs::native:ndarray:len=10000
---
iterations: 10000
elapsed: 0.30209105
rate: 33102.602675584065
...
ok 10 benchmark finished
# @stdlib/strided/math/special/abs::native:ndarray:len=10000
---
iterations: 10000
elapsed: 0.301467197
rate: 33171.104848266456
...
ok 11 benchmark finished
# @stdlib/strided/math/special/abs::native:ndarray:len=10000
---
iterations: 10000
elapsed: 0.302045479
rate: 33107.597018527136
...
ok 12 benchmark finished
# @stdlib/strided/math/special/abs::native:ndarray:len=100000
---
iterations: 1000
elapsed: 0.293969017
rate: 3401.7190321795033
...
ok 13 benchmark finished
# @stdlib/strided/math/special/abs::native:ndarray:len=100000
---
iterations: 1000
elapsed: 0.287274552
rate: 3480.9905473283966
...
ok 14 benchmark finished
# @stdlib/strided/math/special/abs::native:ndarray:len=100000
---
iterations: 1000
elapsed: 0.299388098
rate: 3340.1461403452317
...
ok 15 benchmark finished
# @stdlib/strided/math/special/abs::native:ndarray:len=1000000
---
iterations: 100
elapsed: 0.306394113
rate: 326.377028007715
...
ok 16 benchmark finished
# @stdlib/strided/math/special/abs::native:ndarray:len=1000000
---
iterations: 100
elapsed: 0.307819325
rate: 324.86589332882204
...
ok 17 benchmark finished
# @stdlib/strided/math/special/abs::native:ndarray:len=1000000
---
iterations: 100
elapsed: 0.303846911
rate: 329.1131039341058
...
ok 18 benchmark finished
#
1..18
# total 18
# pass 18
#
# ok
Running benchmark: /Users/kgryte/Coding/projects/personal/stdlib/stdlib/lib/node_modules/@stdlib/strided/math/special/abs/benchmark/benchmark.native.js
TAP version 13
# @stdlib/strided/math/special/abs::native:len=10
---
iterations: 1000000
elapsed: 0.405647482
rate: 2465194.644053035
...
ok 1 benchmark finished
# @stdlib/strided/math/special/abs::native:len=10
---
iterations: 1000000
elapsed: 0.412639994
rate: 2423419.9654432912
...
ok 2 benchmark finished
# @stdlib/strided/math/special/abs::native:len=10
---
iterations: 1000000
elapsed: 0.408355417
rate: 2448847.1521855677
...
ok 3 benchmark finished
# @stdlib/strided/math/special/abs::native:len=100
---
iterations: 1000000
elapsed: 0.691814215
rate: 1445474.7796704352
...
ok 4 benchmark finished
# @stdlib/strided/math/special/abs::native:len=100
---
iterations: 1000000
elapsed: 0.679503589
rate: 1471662.5727785523
...
ok 5 benchmark finished
# @stdlib/strided/math/special/abs::native:len=100
---
iterations: 1000000
elapsed: 0.733919673
rate: 1362546.9336614988
...
ok 6 benchmark finished
# @stdlib/strided/math/special/abs::native:len=1000
---
iterations: 100000
elapsed: 0.36152586
rate: 276605.3858498532
...
ok 7 benchmark finished
# @stdlib/strided/math/special/abs::native:len=1000
---
iterations: 100000
elapsed: 0.326910484
rate: 305894.1358393388
...
ok 8 benchmark finished
# @stdlib/strided/math/special/abs::native:len=1000
---
iterations: 100000
elapsed: 0.313741831
rate: 318733.3983526092
...
ok 9 benchmark finished
# @stdlib/strided/math/special/abs::native:len=10000
---
iterations: 10000
elapsed: 0.291659503
rate: 34286.55640272417
...
ok 10 benchmark finished
# @stdlib/strided/math/special/abs::native:len=10000
---
iterations: 10000
elapsed: 0.302755194
rate: 33029.98659702598
...
ok 11 benchmark finished
# @stdlib/strided/math/special/abs::native:len=10000
---
iterations: 10000
elapsed: 0.298205107
rate: 33533.96627107395
...
ok 12 benchmark finished
# @stdlib/strided/math/special/abs::native:len=100000
---
iterations: 1000
elapsed: 0.27920722
rate: 3581.5692731728072
...
ok 13 benchmark finished
# @stdlib/strided/math/special/abs::native:len=100000
---
iterations: 1000
elapsed: 0.288315077
rate: 3468.427702100366
...
ok 14 benchmark finished
# @stdlib/strided/math/special/abs::native:len=100000
---
iterations: 1000
elapsed: 0.297698562
rate: 3359.102554213883
...
ok 15 benchmark finished
# @stdlib/strided/math/special/abs::native:len=1000000
---
iterations: 100
elapsed: 0.304977388
rate: 327.8931617054835
...
ok 16 benchmark finished
# @stdlib/strided/math/special/abs::native:len=1000000
---
iterations: 100
elapsed: 0.298957976
rate: 334.4951733283075
...
ok 17 benchmark finished
# @stdlib/strided/math/special/abs::native:len=1000000
---
iterations: 100
elapsed: 0.304128663
rate: 328.8082057559961
...
ok 18 benchmark finished
#
1..18
# total 18
# pass 18
#
# ok
Running benchmark: /Users/kgryte/Coding/projects/personal/stdlib/stdlib/lib/node_modules/@stdlib/strided/math/special/abs/benchmark/benchmark.js
TAP version 13
# @stdlib/strided/math/special/abs:len=10
---
iterations: 1000000
elapsed: 0.338511632
rate: 2954108.2357843467
...
ok 1 benchmark finished
# @stdlib/strided/math/special/abs:len=10
---
iterations: 1000000
elapsed: 0.327715766
rate: 3051424.7520212377
...
ok 2 benchmark finished
# @stdlib/strided/math/special/abs:len=10
---
iterations: 1000000
elapsed: 0.330496007
rate: 3025755.1644186736
...
ok 3 benchmark finished
# @stdlib/strided/math/special/abs:len=100
---
iterations: 1000000
elapsed: 0.691415536
rate: 1446308.2588297522
...
ok 4 benchmark finished
# @stdlib/strided/math/special/abs:len=100
---
iterations: 1000000
elapsed: 0.678599439
rate: 1473623.381524782
...
ok 5 benchmark finished
# @stdlib/strided/math/special/abs:len=100
---
iterations: 1000000
elapsed: 0.699995137
rate: 1428581.3531301718
...
ok 6 benchmark finished
# @stdlib/strided/math/special/abs:len=1000
---
iterations: 100000
elapsed: 0.445000866
rate: 224718.66380592616
...
ok 7 benchmark finished
# @stdlib/strided/math/special/abs:len=1000
---
iterations: 100000
elapsed: 0.443071861
rate: 225697.02299374863
...
ok 8 benchmark finished
# @stdlib/strided/math/special/abs:len=1000
---
iterations: 100000
elapsed: 0.444303802
rate: 225071.22277562684
...
ok 9 benchmark finished
# @stdlib/strided/math/special/abs:len=10000
---
iterations: 10000
elapsed: 0.772345405
rate: 12947.574926013835
...
ok 10 benchmark finished
# @stdlib/strided/math/special/abs:len=10000
---
iterations: 10000
elapsed: 0.77681634
rate: 12873.05568263407
...
ok 11 benchmark finished
# @stdlib/strided/math/special/abs:len=10000
---
iterations: 10000
elapsed: 0.77653087
rate: 12877.788103903713
...
ok 12 benchmark finished
# @stdlib/strided/math/special/abs:len=100000
---
iterations: 1000
elapsed: 0.83744988
rate: 1194.1013114719176
...
ok 13 benchmark finished
# @stdlib/strided/math/special/abs:len=100000
---
iterations: 1000
elapsed: 0.851412887
rate: 1174.5182804591493
...
ok 14 benchmark finished
# @stdlib/strided/math/special/abs:len=100000
---
iterations: 1000
elapsed: 0.839253739
rate: 1191.5347570468198
...
ok 15 benchmark finished
# @stdlib/strided/math/special/abs:len=1000000
---
iterations: 100
elapsed: 0.870049886
rate: 114.93593828250901
...
ok 16 benchmark finished
# @stdlib/strided/math/special/abs:len=1000000
---
iterations: 100
elapsed: 0.857157195
rate: 116.66471515764387
...
ok 17 benchmark finished
# @stdlib/strided/math/special/abs:len=1000000
---
iterations: 100
elapsed: 0.851565792
rate: 117.4307386927069
...
ok 18 benchmark finished
#
1..18
# total 18
# pass 18
#
# ok
Running benchmark: /Users/kgryte/Coding/projects/personal/stdlib/stdlib/lib/node_modules/@stdlib/strided/math/special/abs/benchmark/benchmark.ndarray.js
TAP version 13
# @stdlib/strided/math/special/abs:ndarray:len=10
---
iterations: 1000000
elapsed: 0.393669522
rate: 2540201.7278848425
...
ok 1 benchmark finished
# @stdlib/strided/math/special/abs:ndarray:len=10
---
iterations: 1000000
elapsed: 0.38124097
rate: 2623012.9463787694
...
ok 2 benchmark finished
# @stdlib/strided/math/special/abs:ndarray:len=10
---
iterations: 1000000
elapsed: 0.374995224
rate: 2666700.6297658873
...
ok 3 benchmark finished
# @stdlib/strided/math/special/abs:ndarray:len=100
---
iterations: 1000000
elapsed: 0.659446001
rate: 1516424.3902966667
...
ok 4 benchmark finished
# @stdlib/strided/math/special/abs:ndarray:len=100
---
iterations: 1000000
elapsed: 0.660899761
rate: 1513088.760218208
...
ok 5 benchmark finished
# @stdlib/strided/math/special/abs:ndarray:len=100
---
iterations: 1000000
elapsed: 0.656215304
rate: 1523890.0920238213
...
ok 6 benchmark finished
# @stdlib/strided/math/special/abs:ndarray:len=1000
---
iterations: 100000
elapsed: 0.380436619
rate: 262855.87402930844
...
ok 7 benchmark finished
# @stdlib/strided/math/special/abs:ndarray:len=1000
---
iterations: 100000
elapsed: 0.36811588
rate: 271653.5890817859
...
ok 8 benchmark finished
# @stdlib/strided/math/special/abs:ndarray:len=1000
---
iterations: 100000
elapsed: 0.372928833
rate: 268147.6763154969
...
ok 9 benchmark finished
# @stdlib/strided/math/special/abs:ndarray:len=10000
---
iterations: 10000
elapsed: 0.753502108
rate: 13271.363004600911
...
ok 10 benchmark finished
# @stdlib/strided/math/special/abs:ndarray:len=10000
---
iterations: 10000
elapsed: 0.736914559
rate: 13570.094223094322
...
ok 11 benchmark finished
# @stdlib/strided/math/special/abs:ndarray:len=10000
---
iterations: 10000
elapsed: 0.740829738
rate: 13498.37821980089
...
ok 12 benchmark finished
# @stdlib/strided/math/special/abs:ndarray:len=100000
---
iterations: 1000
elapsed: 0.792041357
rate: 1262.5603337023726
...
ok 13 benchmark finished
# @stdlib/strided/math/special/abs:ndarray:len=100000
---
iterations: 1000
elapsed: 0.803587885
rate: 1244.418959850297
...
ok 14 benchmark finished
# @stdlib/strided/math/special/abs:ndarray:len=100000
---
iterations: 1000
elapsed: 0.806175121
rate: 1240.425279757548
...
ok 15 benchmark finished
# @stdlib/strided/math/special/abs:ndarray:len=1000000
---
iterations: 100
elapsed: 0.825726727
rate: 121.10544170383865
...
ok 16 benchmark finished
# @stdlib/strided/math/special/abs:ndarray:len=1000000
---
iterations: 100
elapsed: 0.822104266
rate: 121.63907199576556
...
ok 17 benchmark finished
# @stdlib/strided/math/special/abs:ndarray:len=1000000
---
iterations: 100
elapsed: 0.822530554
rate: 121.57603084006531
...
ok 18 benchmark finished
#
1..18
# total 18
# pass 18
#
# ok
and here are the benchmarks for dabs
, a strided array interface only supporting double-precision floating-point numbers.
Running benchmark: /Users/kgryte/Coding/projects/personal/stdlib/stdlib/lib/node_modules/@stdlib/strided/math/special/dabs/benchmark/benchmark.ndarray.native.js
TAP version 13
# @stdlib/strided/math/special/dabs::native:ndarray:len=10
---
iterations: 1000000
elapsed: 0.435212237
rate: 2297729.5098437225
...
ok 1 benchmark finished
# @stdlib/strided/math/special/dabs::native:ndarray:len=10
---
iterations: 1000000
elapsed: 0.437873682
rate: 2283763.6540119806
...
ok 2 benchmark finished
# @stdlib/strided/math/special/dabs::native:ndarray:len=10
---
iterations: 1000000
elapsed: 0.441242788
rate: 2266325.993751993
...
ok 3 benchmark finished
# @stdlib/strided/math/special/dabs::native:ndarray:len=100
---
iterations: 1000000
elapsed: 0.688217091
rate: 1453029.8841416016
...
ok 4 benchmark finished
# @stdlib/strided/math/special/dabs::native:ndarray:len=100
---
iterations: 1000000
elapsed: 0.698768618
rate: 1431088.8815559258
...
ok 5 benchmark finished
# @stdlib/strided/math/special/dabs::native:ndarray:len=100
---
iterations: 1000000
elapsed: 0.675800943
rate: 1479725.6653132548
...
ok 6 benchmark finished
# @stdlib/strided/math/special/dabs::native:ndarray:len=1000
---
iterations: 100000
elapsed: 0.311408527
rate: 321121.5857297318
...
ok 7 benchmark finished
# @stdlib/strided/math/special/dabs::native:ndarray:len=1000
---
iterations: 100000
elapsed: 0.310411062
rate: 322153.4675848633
...
ok 8 benchmark finished
# @stdlib/strided/math/special/dabs::native:ndarray:len=1000
---
iterations: 100000
elapsed: 0.308695609
rate: 323943.7072783242
...
ok 9 benchmark finished
# @stdlib/strided/math/special/dabs::native:ndarray:len=10000
---
iterations: 10000
elapsed: 0.269504939
rate: 37105.07138423908
...
ok 10 benchmark finished
# @stdlib/strided/math/special/dabs::native:ndarray:len=10000
---
iterations: 10000
elapsed: 0.278106815
rate: 35957.40722858589
...
ok 11 benchmark finished
# @stdlib/strided/math/special/dabs::native:ndarray:len=10000
---
iterations: 10000
elapsed: 0.268065463
rate: 37304.32069870933
...
ok 12 benchmark finished
# @stdlib/strided/math/special/dabs::native:ndarray:len=100000
---
iterations: 1000
elapsed: 0.265874271
rate: 3761.17627417961
...
ok 13 benchmark finished
# @stdlib/strided/math/special/dabs::native:ndarray:len=100000
---
iterations: 1000
elapsed: 0.264879564
rate: 3775.300687221004
...
ok 14 benchmark finished
# @stdlib/strided/math/special/dabs::native:ndarray:len=100000
---
iterations: 1000
elapsed: 0.262809734
rate: 3805.0341012102695
...
ok 15 benchmark finished
# @stdlib/strided/math/special/dabs::native:ndarray:len=1000000
---
iterations: 100
elapsed: 0.282986746
rate: 353.3734403235973
...
ok 16 benchmark finished
# @stdlib/strided/math/special/dabs::native:ndarray:len=1000000
---
iterations: 100
elapsed: 0.278570983
rate: 358.9749331501623
...
ok 17 benchmark finished
# @stdlib/strided/math/special/dabs::native:ndarray:len=1000000
---
iterations: 100
elapsed: 0.284860537
rate: 351.0489766436128
...
ok 18 benchmark finished
#
1..18
# total 18
# pass 18
#
# ok
Running benchmark: /Users/kgryte/Coding/projects/personal/stdlib/stdlib/lib/node_modules/@stdlib/strided/math/special/dabs/benchmark/benchmark.native.js
TAP version 13
# @stdlib/strided/math/special/dabs::native:len=10
---
iterations: 1000000
elapsed: 0.303859275
rate: 3290997.1235862393
...
ok 1 benchmark finished
# @stdlib/strided/math/special/dabs::native:len=10
---
iterations: 1000000
elapsed: 0.306635692
rate: 3261198.960491527
...
ok 2 benchmark finished
# @stdlib/strided/math/special/dabs::native:len=10
---
iterations: 1000000
elapsed: 0.303345013
rate: 3296576.364022836
...
ok 3 benchmark finished
# @stdlib/strided/math/special/dabs::native:len=100
---
iterations: 1000000
elapsed: 0.551985124
rate: 1811643.0253652995
...
ok 4 benchmark finished
# @stdlib/strided/math/special/dabs::native:len=100
---
iterations: 1000000
elapsed: 0.55578898
rate: 1799244.0224345578
...
ok 5 benchmark finished
# @stdlib/strided/math/special/dabs::native:len=100
---
iterations: 1000000
elapsed: 0.61533631
rate: 1625127.5664197356
...
ok 6 benchmark finished
# @stdlib/strided/math/special/dabs::native:len=1000
---
iterations: 100000
elapsed: 0.288136087
rate: 347058.2287736836
...
ok 7 benchmark finished
# @stdlib/strided/math/special/dabs::native:len=1000
---
iterations: 100000
elapsed: 0.293719636
rate: 340460.7242533829
...
ok 8 benchmark finished
# @stdlib/strided/math/special/dabs::native:len=1000
---
iterations: 100000
elapsed: 0.310028647
rate: 322550.8383423678
...
ok 9 benchmark finished
# @stdlib/strided/math/special/dabs::native:len=10000
---
iterations: 10000
elapsed: 0.291104736
rate: 34351.8973184964
...
ok 10 benchmark finished
# @stdlib/strided/math/special/dabs::native:len=10000
---
iterations: 10000
elapsed: 0.270915162
rate: 36911.92447914746
...
ok 11 benchmark finished
# @stdlib/strided/math/special/dabs::native:len=10000
---
iterations: 10000
elapsed: 0.26407803
rate: 37867.59542245903
...
ok 12 benchmark finished
# @stdlib/strided/math/special/dabs::native:len=100000
---
iterations: 1000
elapsed: 0.26354886
rate: 3794.3628365533436
...
ok 13 benchmark finished
# @stdlib/strided/math/special/dabs::native:len=100000
---
iterations: 1000
elapsed: 0.260865537
rate: 3833.392526664034
...
ok 14 benchmark finished
# @stdlib/strided/math/special/dabs::native:len=100000
---
iterations: 1000
elapsed: 0.265353175
rate: 3768.562407440574
...
ok 15 benchmark finished
# @stdlib/strided/math/special/dabs::native:len=1000000
---
iterations: 100
elapsed: 0.279323951
rate: 358.0072515872439
...
ok 16 benchmark finished
# @stdlib/strided/math/special/dabs::native:len=1000000
---
iterations: 100
elapsed: 0.289404916
rate: 345.5366321420746
...
ok 17 benchmark finished
# @stdlib/strided/math/special/dabs::native:len=1000000
---
iterations: 100
elapsed: 0.281906876
rate: 354.72706951638884
...
ok 18 benchmark finished
#
1..18
# total 18
# pass 18
#
# ok
Running benchmark: /Users/kgryte/Coding/projects/personal/stdlib/stdlib/lib/node_modules/@stdlib/strided/math/special/dabs/benchmark/benchmark.js
TAP version 13
# @stdlib/strided/math/special/dabs:len=10
---
iterations: 10000000
elapsed: 0.302783518
rate: 33026896.794296447
...
ok 1 benchmark finished
# @stdlib/strided/math/special/dabs:len=10
---
iterations: 10000000
elapsed: 0.304417186
rate: 32849656.52366289
...
ok 2 benchmark finished
# @stdlib/strided/math/special/dabs:len=10
---
iterations: 10000000
elapsed: 0.297130217
rate: 33655277.813767426
...
ok 3 benchmark finished
# @stdlib/strided/math/special/dabs:len=100
---
iterations: 1000000
elapsed: 0.189698545
rate: 5271521.718840806
...
ok 4 benchmark finished
# @stdlib/strided/math/special/dabs:len=100
---
iterations: 1000000
elapsed: 0.190088704
rate: 5260701.866850542
...
ok 5 benchmark finished
# @stdlib/strided/math/special/dabs:len=100
---
iterations: 1000000
elapsed: 0.193113801
rate: 5178293.80821933
...
ok 6 benchmark finished
# @stdlib/strided/math/special/dabs:len=1000
---
iterations: 100000
elapsed: 0.191335675
rate: 522641.6871814417
...
ok 7 benchmark finished
# @stdlib/strided/math/special/dabs:len=1000
---
iterations: 100000
elapsed: 0.176457849
rate: 566707.5767199225
...
ok 8 benchmark finished
# @stdlib/strided/math/special/dabs:len=1000
---
iterations: 100000
elapsed: 0.174965862
rate: 571540.064198352
...
ok 9 benchmark finished
# @stdlib/strided/math/special/dabs:len=10000
---
iterations: 10000
elapsed: 0.489760559
rate: 20418.1406939304
...
ok 10 benchmark finished
# @stdlib/strided/math/special/dabs:len=10000
---
iterations: 10000
elapsed: 0.503766985
rate: 19850.44732536413
...
ok 11 benchmark finished
# @stdlib/strided/math/special/dabs:len=10000
---
iterations: 10000
elapsed: 0.580820838
rate: 17217.01314028957
...
ok 12 benchmark finished
# @stdlib/strided/math/special/dabs:len=100000
---
iterations: 1000
elapsed: 0.677884666
rate: 1475.1771948179753
...
ok 13 benchmark finished
# @stdlib/strided/math/special/dabs:len=100000
---
iterations: 1000
elapsed: 0.628820705
rate: 1590.2784244357858
...
ok 14 benchmark finished
# @stdlib/strided/math/special/dabs:len=100000
---
iterations: 1000
elapsed: 0.649244846
rate: 1540.2509641177805
...
ok 15 benchmark finished
# @stdlib/strided/math/special/dabs:len=1000000
---
iterations: 100
elapsed: 0.650647435
rate: 153.69306727536704
...
ok 16 benchmark finished
# @stdlib/strided/math/special/dabs:len=1000000
---
iterations: 100
elapsed: 0.641909981
rate: 155.78508351625086
...
ok 17 benchmark finished
# @stdlib/strided/math/special/dabs:len=1000000
---
iterations: 100
elapsed: 0.676401649
rate: 147.8411534741838
...
ok 18 benchmark finished
#
1..18
# total 18
# pass 18
#
# ok
Running benchmark: /Users/kgryte/Coding/projects/personal/stdlib/stdlib/lib/node_modules/@stdlib/strided/math/special/dabs/benchmark/benchmark.ndarray.js
TAP version 13
# @stdlib/strided/math/special/dabs:ndarray:len=10
---
iterations: 10000000
elapsed: 0.366156019
rate: 27310762.30102884
...
ok 1 benchmark finished
# @stdlib/strided/math/special/dabs:ndarray:len=10
---
iterations: 10000000
elapsed: 0.363239379
rate: 27530054.77415487
...
ok 2 benchmark finished
# @stdlib/strided/math/special/dabs:ndarray:len=10
---
iterations: 10000000
elapsed: 0.364019379
rate: 27471064.940199245
...
ok 3 benchmark finished
# @stdlib/strided/math/special/dabs:ndarray:len=100
---
iterations: 1000000
elapsed: 0.256248388
rate: 3902463.573741584
...
ok 4 benchmark finished
# @stdlib/strided/math/special/dabs:ndarray:len=100
---
iterations: 1000000
elapsed: 0.248516694
rate: 4023874.549047397
...
ok 5 benchmark finished
# @stdlib/strided/math/special/dabs:ndarray:len=100
---
iterations: 1000000
elapsed: 0.254779693
rate: 3924959.592442872
...
ok 6 benchmark finished
# @stdlib/strided/math/special/dabs:ndarray:len=1000
---
iterations: 100000
elapsed: 0.254221689
rate: 393357.46841017966
...
ok 7 benchmark finished
# @stdlib/strided/math/special/dabs:ndarray:len=1000
---
iterations: 100000
elapsed: 0.248581277
rate: 402282.91207949666
...
ok 8 benchmark finished
# @stdlib/strided/math/special/dabs:ndarray:len=1000
---
iterations: 100000
elapsed: 0.245398908
rate: 407499.7758343733
...
ok 9 benchmark finished
# @stdlib/strided/math/special/dabs:ndarray:len=10000
---
iterations: 10000
elapsed: 0.450701244
rate: 22187.646768509916
...
ok 10 benchmark finished
# @stdlib/strided/math/special/dabs:ndarray:len=10000
---
iterations: 10000
elapsed: 0.433642286
rate: 23060.481698502994
...
ok 11 benchmark finished
# @stdlib/strided/math/special/dabs:ndarray:len=10000
---
iterations: 10000
elapsed: 0.451250753
rate: 22160.62784054789
...
ok 12 benchmark finished
# @stdlib/strided/math/special/dabs:ndarray:len=100000
---
iterations: 1000
elapsed: 0.689415555
rate: 1450.5039707147312
...
ok 13 benchmark finished
# @stdlib/strided/math/special/dabs:ndarray:len=100000
---
iterations: 1000
elapsed: 0.75929212
rate: 1317.0161702718578
...
ok 14 benchmark finished
# @stdlib/strided/math/special/dabs:ndarray:len=100000
---
iterations: 1000
elapsed: 0.749427405
rate: 1334.3520577553472
...
ok 15 benchmark finished
# @stdlib/strided/math/special/dabs:ndarray:len=1000000
---
iterations: 100
elapsed: 0.7488877
rate: 133.5313692560313
...
ok 16 benchmark finished
# @stdlib/strided/math/special/dabs:ndarray:len=1000000
---
iterations: 100
elapsed: 0.735611737
rate: 135.9412784899597
...
ok 17 benchmark finished
# @stdlib/strided/math/special/dabs:ndarray:len=1000000
---
iterations: 100
elapsed: 0.729081344
rate: 137.1589066473219
...
ok 18 benchmark finished
#
1..18
# total 18
# pass 18
#
# ok
A couple of observations:
N <= 100
), as the relative overhead of calling into C is far greater.The first thing that strikes me is that data.js
seems to be the central feature of this implementation, but it's not readily apparent to me what it actually means (the grouping, in particular), despite having worked through it just a couple days ago.
It would be nice if the meaning could be communicated somehow so that newcomers can understand the motivation, maintenance, and extension more easily.
Ah, I see now that I failed to view it together with https://github.com/stdlib-js/stdlib/blob/7fb13b1cdcb55a60bfa1a343932793439c716a85/lib/node_modules/%40stdlib/strided/math/special/abs/lib/types.json, which seems to contain the information necessary to decode it.
@rreusser data.js
mirrors the C API, which can be found here. In short, "data" refers to whatever data should be provided to an "apply" function during invocation. Here, the data is comprised of callbacks. However, the general concept is more broadly applicable to any sort of function data, such as meta data, hashes, additional parameters, callbacks, etc. In C, data
is provided as an opaque pointer.
And yeah, the "grouping" was my way of grouping related functionality (e.g., these callbacks are for the int32
interfaces, these are for the uint16
interfaces, etc).
I should state that the strided array function data (i.e., types.json
, data.js
, abs.functions.js
, etc) do not need to be split over multiple files; they can be included in the same file, as in C. I chose to split into separate files partly for convenience and partly to allow for reuse of such data in other contexts (e.g., data.js
to be reused for both the main interfaces and the ndarray interfaces, types.json
to allow for independent consumption when generating tests and benchmarks, etc).
The second thing of note is that you're right—the number of files does make for a fair bit of maintenance. Correct me if I'm wrong though, but the unit cost of importing one additional strided function interface is quite small, right? All of the js files I see lean on common implementation and, using a bundler with hoisting and whatnot, would compile down to perhaps 1kb on the high side.
Yes, that should be true. The goal would be to abstract away the underlying implementation (i.e., the loops), so bundles should be relatively small. This is unlike in C where we have to rely on macros which then expand into multiple loop implementations. In JS, there would only be one.
The splitting makes sense and I understand the motivation for testing and repurposing. The downside, of course, is that it requires more knowledge, context, and navigation in order to comprehend and work with, e.g. matching up lines across multiple files. If you delete a line from data.json
, you have to track down every other parallel list and ensure they are all synchronized.
Yes, that is correct. abs
is a bit of an extreme case, given the number of accepted input and output types. For sin
, for example, the number of accepted type combinations would be significantly smaller, as the only accepted output types would be float32
and float64
. This is true of other transcendental functions. Part of my motivation for using abs
was to see how imposing the maintenance burden would be for a more "extreme" example.
...but you're right that there is fairly significant room for developer error when writing these interfaces (e.g., matching elements across the various meta data arrays).
Yeah, it's not a strong feeling. I always consider the elm lang thing, "Make impossible states impossible." That is, if inconsistent input (parallel calling signature lists of different lengths) is possible, then maybe it's not a matter of validation but a sign the input format is suboptimal. Following this to the logical end, input would look like (very hand-wavily speaking):
module.exports = [
{ args: [ 'uint8' ], fcn: absl },
{ args: [ 'float32' ], fcn: absf },
...
]
Which probably is too far. Or just
module.exports = [
[ [ 'uint8' ], absl ],
[ [ 'float32' ], absf ],
]
or even just collocating with
module.exports = [
[ 'uint8', 'float32' ],
[ absl, absf ]
]
or some other 'compressed' representation. Whether this is actually any better or more preferable is definitely not something I'm willing to commit to. I'd only offer that it might be possible to structure the input so that it's not just clearer but so that it's possible to make inconsistent input impossible without having to validate anything.
Finally, random example: cpow( complexArray, complexPower )
. I assume but don't fully understand the implementation well enough to see, that there's some allowance for a second argument which could be a constant for which "Number" is not sufficient? Or more likely, would this be a case for broadcasting and the complex power would be implicitly converted into an element-wise operation?
Re: compressed form. I suppose one design goal toward which I biased the implementation was to make the JS implementation mirror the C implementation. In C, we need separate lists due to types. In JS (to state the obvious), we are not bound by such restrictions.
This stated, the versions you provided above would also be reasonable. And for that matter, could easily be supported via utility functions which take one of the above forms and then map to the current form. But this would mean that there'd be more than one way to do the same thing which may or may not be desirable.
Re: broadcasting. This is not explicitly supported in the sense that you cannot simply specify number
. The reasons have to do with how this would work in the C layer where it is not clear how a JS number should be cast. NumPy has logic for trying to determine the minimum type which can hold a scalar constant and then types that value accordingly. From there, the scalar constant is allowed to participate in type promotion.
Here, I've chosen not to allow scalar constants to be provided in the current implementation to avoid the added complexity and ambiguous design decisions. Instead, in order to broadcast, a user simply provides a strided array having a "stride" of 0
. This ensures that, during the loop, the same array element is always accessed during each loop iteration. This also means that a user is responsible for explicitly typing a value to broadcast (e.g., new Int32Array( [ 3 ] )
or new Float64Array( [3.0 ] )
, etc).
Yeah, honestly, I think the way you designed it is the best option. It's straightforward, probably not that common to work with, and can at least be managed with some simple validations. 😄
At the JS layer, I did include validation. See here. But this does not provide the sort of locality which would make identifying bugs easier.
Re: how common. Well, if we decide to go with the current implementation, it will be rolled out to at least 200
packages in stdlib
. 😅
We may want some way for a user to query what types are supported by a given strided API. What would this look like? In NumPy, you can do np.sin.types which returns a list of supported input types.
Introspection is always nice. The design suggests this would be a pretty straightforward addition, no?
One thing to note is that I’ve not included support for automatic casting and/or detection of overlapping buffers. My sense is that these behaviors, if desired, can be supported by wrapping these "lower-level" interfaces and performing the casting and/or copies there.
Re: overlapping buffers, I can imagine strange use cases with interleaved real/complex, for example, where the input and output may legitimately overlap, at least when treated as a large block, though the individual components may never intersect. Is that indeed the case? It seems better to push that behavior elsewhere. Same for casting. Would something else need to receive the operator's type signature and cast accordingly before passing it off to this interface? Does that relate to the introspection in the first part of this question?
Re: introspection. Yes, this would be a straightforward addition. Can hang a method off the dispatch function (see here) quite easily based on the provided types
array. So this would not have to be anything a developer would have to do. And does not affect how things are structured in terms of files, so long as the types
are provided to the dispatch
factory as a list.
Re: overlapping buffers. The more pressing case is, say, sin( 10, x, 1, y, -1 )
, where x
and y
point to the same memory (i.e., we're doing an in-place operation; however, we're doing this by putting the first result in the last element). In this case, once we get to x[5]
, we've already overwritten memory, thus producing unexpected results. In NumPy world, we'd detect this condition and perform a copy of x
before proceeding with computation.
My thinking is that this should be pushed elsewhere, perhaps to the ndarray
abstraction layer, where we're willing to take a greater perf hit for the sake of convenience and where we already have the concern of "views". Here, we assume more ideal conditions.
Re: casting. Agreed. My take is that casting can be something which is added as an abstraction layer above. For the current implementation, we simply assume that, if you don't have acceptable input types, you'll have casted them accordingly before invoking the strided array function. In short, a user is forced to be explicit. Things like casting can also be punted to, e.g., the ndarray
abstraction layer.
Re overlapping: sorry, in particular I was trying to invent a case where trivially "overlapping" use of buffers may not lead to conflicting memory access, so that an incomplete, overzealous check would be problematic. I'm not sure how complex it is to check two strided array patterns for overlapping access. I think there's a chance it might actually be somewhat complicated (lots of LCD/GCD and maybe some lattice sort of thing?) or at least rather expensive to fully and correctly check for intersections.
You are correct! Not straightforward at all. NumPy provides two APIs: may_share_memory
and shares_memory
, with the former just performing bounds checks, and the latter using a much more complicated algo which can be found here.
Concerning the data.js
and types.json
debate, the solution most in line with the C file would be to have them both hanging off of one main export. It sounds like you decided against that due to the need to independently require types.json
. My question is how be much of a problem it would be to require some other unnecessary files in say tests when doing var types = require( '...').types
, as neither performance nor a small bundle size would be the main concern. or would they?
I guess the question is also how often you envision one having to add new type signatures or re-arranging the ones in these files. If not super-rare, then having them colocated might be a good idea I think.
@Planeshifter I think the problem is not so much that the info is split across multiple files, but that the info is split across multiple data structures. Because the data resides in separate lists, the data requires more effort be kept in sync. This stands in contrast to @rreusser's other proposals where function data and types can be found in the same data structure.
Yeah, understood and the arguments put forth made sense to me, but I gathered that preserving a comparable format like in C was outweighing the benefits of having them both in one data structure. I was just inquiring about how big of a benefit is achieved in having them in two separate files. Although I suppose one could also add a one-line comment in data.js
explaining where to find the corresponding types or just amend the existing NOTE
.
Concerning the question of when to call into C becomes worthwhile, you show benchmarks above showing that >100 might be a good cutoff. How much variation do we expect across packages and would it be a viable strategy to set a magic constant for each package as a cutoff for when to start calling into C?
Re: separate files. Yeah, I don't know if having them in the same file actually achieves much, especially if the respective lists are long. It would just create a long file and require developers to scroll up and down to match elements.
I think you are right that, at the very least, we should add/update a comment in data.js
to point to the corresponding types
file.
Re: heuristics. I really don't know. And I don't know how reproducible this is across platforms. If I'd hazard a guess, I'd guess that the heuristic should be more or less the same, but we'd definitely want to run the benchmarks to confirm before hardcoding. I can imagine for more compute intensive functions (say, functions which need to perform internal iteration to converge), the magic constant could be smaller. For things like an identity function, the magic constant could larger. So I don't have a good handle on variability and whether we can just count on copying-and-pasting implementations without confirming the constant's applicability. In a more ideal world, we'd do "burn ins" on install and tailor the heuristic for each implementation to the host platform.
I've updated the OP with some updates.
Re: data.js
. I've added a comment pointing to the types.json
file.
Re: heuristics. I tried adding a heuristic; however, I encountered unexpected results. Namely, if I only benchmarked a single set of types (e.g., float64
input and output arrays), using a heuristic had the intended effect: for shorter arrays, we could match JS performance when invoking the native add-on interface, and, for longer arrays, we could call into C to exceed JS performance. However, once I began providing different typed array data types to the same interface, I found that V8 seemed to de-optimize the interfaces, resulting in worse JS performance for shorter arrays compared to the native interfaces. My hypothesis is that V8 de-optimized due to polymorphism. Accordingly, as the intent of these multiple-dispatch interfaces is to support multiple input array data types, I chose not to include any heuristics. While calling into the native add-ons is not the most performant when the array types are severely constrained, the native add-ons have more consistent performance and don't seem to hit the same perf cliff as the JS implementations do.
At this point, I am reasonably content with the strided array interfaces (e.g., abs). We still have the issue of several files in ./lib
, but not sure we can abstract away any more of the implementation to make one or more files unnecessary. If anyone has any ideas, let me know.
I’ve got a proof-of-concept strided array interface supporting multiple dispatch (i.e., an interface which delegates to a type-aware implementation) for computing the absolute value, with N-API support. See https://github.com/stdlib-js/stdlib/tree/62cb2f0ea67ad09deb990d6e86e236785c848856/lib/node_modules/%40stdlib/strided/math/special/abs.
What does this mean? This is part of an effort to bring something comparable to NumPy’s ufuncs ("universal functions"; a.k.a., element-wise functions).
What would this allow? This would allow us to provided strided interfaces to all of our base mathematical functions (all
~160
of them) and any other applicable numerical API (including distributions). And would allow us to be another step closer to something resembling NumPy capabilities.What is needed? Would be great to get initial design feedback from @Planeshifter and @rreusser (and anyone else for that matter) so we can figure out the best approach for ensuring that strided interfaces can be added to
stdlib
with as little work as possible and for ensuring that we can minimize the possibility of future refactoring. Once we begin rolling out strided array interfaces, it may become significantly harder to make changes due to the need to update many packages. So would be good to have confidence that we’ve nailed down the design now, rather having to do a large refactor later.I am particularly interested in feedback on the JavaScript side of things: https://github.com/stdlib-js/stdlib/tree/62cb2f0ea67ad09deb990d6e86e236785c848856/lib/node_modules/%40stdlib/strided/math/special/abs/lib
…and on the C side of things: https://github.com/stdlib-js/stdlib/tree/62cb2f0ea67ad09deb990d6e86e236785c848856/lib/node_modules/%40stdlib/strided/math/special/abs/src/addon.cpp
On the C side of things, one thing I’d like to achieve is that a strided interface need only create the
StridedFunctionObject
. The add-onnamespace
could be hidden away in a macro/interface which an author would include at the very end, similar to how N-API works for performing module initialization.On the JavaScript side of things, I am not sure how we get around needing several files with source documentation. All the files are fairly short. The only "real" code is that which wraps the native add-on interfaces, which is necessary for calling into C when we’ve been provided generic arrays. Source code documentation is useful for JSDoc documentation. Having the interface types in
types.json
is convenient for dynamically generating tests and benchmarks (i.e., having the ability to require thetypes.json
file and then dynamically creating arrays satisfying the type requirements).We may want some way for a user to query what types are supported by a given strided API. What would this look like? In NumPy, you can do
np.sin.types
which returns a list of supported input types.One thing to note is that I’ve not included support for automatic casting and/or detection of overlapping buffers. My sense is that these behaviors, if desired, can be supported by wrapping these "lower-level" interfaces and performing the casting and/or copies there.
Thoughts on all the above?