kblomdahl / dream-go

Artificial go player based on reinforcement and supervised learning
Apache License 2.0

Investigate SWISH as activation function in cuDNN #51

Open kblomdahl opened 3 years ago

kblomdahl commented 3 years ago

The swish activation function was introduced in cuDNN 8.2. It has been applied very successfully in networks such as MobileNetV3 and EfficientNet, so it is worth investigating locally to see if it yields better performance than rectified linear units.
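For reference, here is a minimal CPU sketch of the two activation functions being compared. It assumes the common definition swish(x) = x * sigmoid(x), i.e. the general form x * sigmoid(beta * x) with beta fixed to 1. The function names are illustrative and are not taken from the dream-go sources.

```rust
/// Logistic sigmoid, 1 / (1 + e^-x).
fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

/// Swish with beta = 1 (an assumption): x * sigmoid(x).
fn swish(x: f32) -> f32 {
    x * sigmoid(x)
}

/// Rectified linear unit: max(0, x).
fn relu(x: f32) -> f32 {
    x.max(0.0)
}

fn main() {
    // Print both activations side by side for a few sample inputs.
    for &x in &[-4.0f32, -1.0, 0.0, 1.0, 4.0] {
        println!("x = {:5.1}  relu = {:7.4}  swish = {:7.4}", x, relu(x), swish(x));
    }
}
```

Unlike relu, swish is smooth and lets small negative values through, while approaching relu for large positive inputs.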

Performance

When plugging the swish activation function into the cudnn_types.rs benchmark (replacing relu), we get some unexpected results that probably indicate that something is broken :)

Further investigation using nvprof indicates that no kernels were launched in the swish examples, so it is not quite a viable candidate yet unless we want to use the backend API.
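A cheap complement to nvprof is a host-side check that the output buffer was actually written. The sketch below is only an illustration: `check_activation`, `Verdict` and the sentinel value are hypothetical names, and it assumes the output has already been copied back from the device and was filled with a known finite sentinel before the call. It makes no cuDNN calls itself.

```rust
/// CPU reference for swish with beta = 1 (an assumption).
fn swish_ref(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

#[derive(Debug, PartialEq)]
enum Verdict {
    /// Output matches the CPU reference within the tolerance.
    Ok,
    /// Output still holds the sentinel everywhere: nothing wrote to it.
    Untouched,
    /// Output was written, but does not match the reference.
    Mismatch,
}

/// Compare `output` (copied back from the device) against the CPU reference
/// for `input`. `sentinel` is the finite value the output buffer was filled
/// with before the activation was run.
fn check_activation(input: &[f32], output: &[f32], sentinel: f32, tol: f32) -> Verdict {
    assert_eq!(input.len(), output.len());

    if !output.is_empty() && output.iter().all(|&y| y == sentinel) {
        return Verdict::Untouched;
    }

    let matches = input
        .iter()
        .zip(output)
        .all(|(&x, &y)| (swish_ref(x) - y).abs() <= tol);

    if matches { Verdict::Ok } else { Verdict::Mismatch }
}

fn main() {
    let sentinel = -12345.0f32;
    let input = [-2.0f32, 0.0, 3.0];

    // Simulated "device" outputs: one correct, one never written.
    let good: Vec<f32> = input.iter().map(|&x| swish_ref(x)).collect();
    let untouched = vec![sentinel; input.len()];

    println!("{:?}", check_activation(&input, &good, sentinel, 1e-3));
    println!("{:?}", check_activation(&input, &untouched, sentinel, 1e-3));
}
```

An `Untouched` verdict would be consistent with the nvprof observation that no kernel ran. The tolerance would need to be loosened for f16 or int8 outputs.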

Relu

test f16_nchw_compute_type_f16 ... bench:      81,782 ns/iter (+/- 17,856)
test f16_nchw_compute_type_f32 ... bench:     153,531 ns/iter (+/- 47,599)
test f16_nhwc_compute_type_f16 ... bench:      97,342 ns/iter (+/- 1,885)
test f16_nhwc_compute_type_f32 ... bench:     120,626 ns/iter (+/- 665)
test i8x32_nhwcvectc           ... bench:      84,497 ns/iter (+/- 1,287)
test i8x32_nhwcvectc_noreorder ... bench:      81,282 ns/iter (+/- 502)

Swish

running 2 tests
test f16_nchw_compute_type_f16 ... bench:      20,640 ns/iter (+/- 711)
test f16_nchw_compute_type_f32 ... bench:      20,627 ns/iter (+/- 181)
test f16_nhwc_compute_type_f16 ... bench:      20,473 ns/iter (+/- 319)
test f16_nhwc_compute_type_f32 ... bench:      20,703 ns/iter (+/- 1,207)
test i8x32_nhwcvectc           ... bench:      20,280 ns/iter (+/- 77)
test i8x32_nhwcvectc_noreorder ... bench:      20,274 ns/iter (+/- 124)
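One way to quantify why the swish numbers look suspicious is to compare how much the timings vary across data types and layouts. The sketch below uses the ns/iter values listed above; `relative_spread` is a hypothetical helper, not part of the benchmark.

```rust
/// Relative spread of a set of timings: (max - min) / min.
fn relative_spread(ns: &[u64]) -> f64 {
    let min = ns.iter().copied().min().expect("at least one timing") as f64;
    let max = ns.iter().copied().max().expect("at least one timing") as f64;
    (max - min) / min
}

fn main() {
    // ns/iter values copied from the benchmark output above.
    let relu = [81_782u64, 153_531, 97_342, 120_626, 84_497, 81_282];
    let swish = [20_640u64, 20_627, 20_473, 20_703, 20_280, 20_274];

    println!("relu spread:  {:.1}%", 100.0 * relative_spread(&relu));
    println!("swish spread: {:.1}%", 100.0 * relative_spread(&swish));
}
```

The relu timings differ by roughly 89% between the fastest and slowest configuration, while the swish timings differ by roughly 2%. A near-constant cost regardless of data type and layout suggests the benchmark is mostly measuring fixed overhead, which fits the nvprof finding that no kernels were launched.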