viktorlott opened 11 months ago
Hey, totally random but kind of interesting. Hopefully not something people already know.
One interesting property of the IEEE 754 floating-point specification is that `x/0` where `x != 0` is equal to infinity, and `x/infinity` is equal to zero. So one could define a function like `x/(1 + 0^x)`, which basically equals `ReLU(x) = max(0, x)`: for x > 0, `0^x` is 0, so the denominator is 1 and you get x back; for x < 0, `0^x` is infinity, so the quotient is 0; and since IEEE 754's `pow(0, 0)` is 1, x = 0 gives 0/2 = 0.
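This is easy to sanity-check numerically. A minimal C sketch, assuming C99/IEEE 754 `pow` semantics (`pow(0.0, x)` returns 0 for x > 0, +infinity for x < 0, and 1 at x = 0):

```c
#include <math.h>
#include <stdio.h>

/* ReLU via the IEEE 754 trick: pow(0, x) is 0 for x > 0,
   +inf for x < 0, and 1 at x == 0, so the quotient becomes
   x/1 = x, x/inf = 0, or 0/2 = 0 respectively. */
static double relu_div(double x) {
    return x / (1.0 + pow(0.0, x));
}

int main(void) {
    const double xs[] = {-2.5, -1.0, 0.0, 3.0};
    for (int i = 0; i < 4; i++) {
        printf("x = %5.2f  div-trick = %5.2f  max = %5.2f\n",
               xs[i], relu_div(xs[i]), fmax(0.0, xs[i]));
    }
    return 0;
}
```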
The interesting part is when you differentiate these two. In the ReLU corner, we basically end up with a piecewise function (most people will know this): 1 for x > 0, 0 for x < 0, and undefined at x = 0.

Differentiating `x/(1 + 0^x)`, on the other hand, is apparently undefined at first glance, but one could intuit that it should be `1/(1 + 0^x)`, based on the ReLU derivative. The difference between these two derivatives is that `1/(1 + 0^x)` makes x = 0 definable (it gives 0.5 there, which is kind of weird... I wonder how L1 regularization would work in that case, if I'm thinking correctly). (Just realized that `x/(x + 0^x)` makes x = 0 give 1.)
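The same kind of check works for the intuited derivative, under the same `pow` assumptions (`relu_grad_div` is just a name for this sketch):

```c
#include <math.h>
#include <stdio.h>

/* The intuited derivative 1/(1 + 0^x): pow(0, x) makes this
   1 for x > 0, 0 for x < 0, and 1/(1 + 1) = 0.5 at x == 0. */
static double relu_grad_div(double x) {
    return 1.0 / (1.0 + pow(0.0, x));
}

int main(void) {
    const double xs[] = {-1.0, 0.0, 1.0};
    for (int i = 0; i < 3; i++)
        printf("x = %4.1f  grad = %.2f\n", xs[i], relu_grad_div(xs[i]));
    return 0;
}
```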
My question: could this technique be "faster" and more energy efficient (on certain architectures) than running regular SIMD instructions on the CPU, given that it would be SIMD-executed on efficient logical FPUs?

I'm guessing the answer will be no, given that there are a lot of smart people working on these acceleration problems, so someone would probably have made a note of it.

The architectures that I'm aware of (e.g. Intel, ARM) have fast SIMD instructions for evaluating max(x, y) on floats, instructions which are much faster than division. Compare the latency and throughput of `_mm256_max_ps` vs `_mm256_div_ps` in the Intel Intrinsics Guide, for example.
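For comparison, the max-based ReLU with AVX intrinsics is a single `vmaxps` per 8 floats, while the division trick would need `_mm256_div_ps` plus some way to evaluate `0^x` per lane, for which x86 SIMD has no dedicated instruction. A sketch of the max version:

```c
#include <immintrin.h>
#include <stdio.h>

/* ReLU on 8 packed floats in one instruction: lane-wise max(0, x). */
static __m256 relu_avx(__m256 v) {
    return _mm256_max_ps(_mm256_setzero_ps(), v);
}

int main(void) {
    float in[8] = {-3.0f, -1.0f, -0.5f, 0.0f, 0.5f, 1.0f, 2.0f, 4.0f};
    float out[8];
    _mm256_storeu_ps(out, relu_avx(_mm256_loadu_ps(in)));
    for (int i = 0; i < 8; i++)
        printf("%g ", out[i]);
    printf("\n");
    return 0;
}
```

(Compile with `-mavx` on GCC or Clang.)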