Open qjia7 opened 3 years ago
@pyu10055 @lina128 @jinjingforever Please help provide some common patterns that you already know. Thanks.
```python
# MobileNetV2-style bottleneck: Conv-BN-ReLU6 -> DWConv-BN-ReLU6 -> Conv-BN -> Add
import tensorflow as tf
from tensorflow.keras.layers import (Activation, Add, BatchNormalization,
                                     Conv2D, DepthwiseConv2D)

def bottleneck_block(x, expand=64, squeeze=16):
    # x must already have `squeeze` channels for the residual Add to work
    m = Conv2D(expand, (1, 1))(x)
    m = BatchNormalization()(m)
    m = Activation(tf.nn.relu6)(m)  # 'relu6' is not a registered string activation
    m = DepthwiseConv2D((3, 3), padding='same')(m)  # 'same' keeps spatial dims for the Add
    m = BatchNormalization()(m)
    m = Activation(tf.nn.relu6)(m)
    m = Conv2D(squeeze, (1, 1))(m)
    m = BatchNormalization()(m)
    return Add()([m, x])
```
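The `Conv2D → BatchNormalization` pairs in this pattern are a classic fusion target: at inference time, BN folds into the preceding conv's weight and bias, so each pair collapses into a single conv. A minimal scalar sketch (made-up values, a 1x1 "conv" on one channel for illustration):

```python
import math

# bn(conv(x)) == conv'(x) with rescaled weight and bias, so a
# Conv2D -> BatchNormalization pair collapses into one conv.

def fold_bn(w, b, gamma, beta, mean, var, eps=1e-3):
    scale = gamma / math.sqrt(var + eps)
    return w * scale, (b - mean) * scale + beta  # folded (weight, bias)

w, b = 2.0, 0.5                                # conv weight and bias
gamma, beta, mean, var = 1.5, 0.1, 0.4, 0.25   # BN parameters
x = 3.0

# unfused: conv, then batch norm
direct = gamma * ((w * x + b) - mean) / math.sqrt(var + 1e-3) + beta
# fused: single conv with folded parameters
fw, fb = fold_bn(w, b, gamma, beta, mean, var)
fused = fw * x + fb
assert math.isclose(direct, fused)
```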
Hi @qjia7,
Apologies for the delayed response. We are revisiting our older feature requests to check which of them have been implemented; please also see the comment above from @pyu10055. May I know whether you are still looking for this feature, or whether you are working on this feature request in TFJS?
If anyone would like to contribute this feature, you are always welcome to do so; please refer to these links: Ref-1, Ref-2. Thank you!
Assign this to me. I will look at this issue sometime this year.
In TFJS, there are already some fused ops, such as fusedConv2d, fusedDepthwiseConv2d, and fusedMatMul, which can greatly improve performance. However, many other patterns that appear frequently in models are not yet fused. We'd like to use this issue to track such patterns and see whether they can be fused in TFJS for better performance.
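The main saving from fusing an op chain like conv+bias+activation is avoiding intermediate buffers and extra kernel launches. A pure-Python sketch of the idea (not the TFJS implementation; a scalar multiply stands in for the conv):

```python
# Unfused: three separate "ops", each materializing an intermediate buffer.
def unfused(x, w, b):
    mul = [v * w for v in x]           # intermediate buffer 1 (the "conv")
    add = [v + b for v in mul]         # intermediate buffer 2 (bias add)
    return [max(v, 0.0) for v in add]  # activation

# Fused: one pass, no intermediates -- the kind of saving that
# fusedConv2d / fusedMatMul give for conv/matmul + bias + activation.
def fused_mul_add_relu(x, w, b):
    return [max(v * w + b, 0.0) for v in x]

x = [-2.0, -0.5, 1.0, 3.0]
assert unfused(x, 2.0, 1.0) == fused_mul_add_relu(x, 2.0, 1.0)
```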
Another reason we raise this issue is that TFJS now includes a WebGPU backend, which is more powerful than WebGL. Because the TFJS WebGPU backend is based on compute shaders rather than fragment shaders, it can read from and write to arbitrary positions, which makes it feasible to fuse almost any combination of ops. Even if a new fused pattern is hard to implement on some backends, those backends can simply break the fused op back down into individual ops and execute them.
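The fallback idea can be sketched as a dispatch step: a backend that implements the fused kernel runs it in one call, while any other backend decomposes the same pattern into its individual ops, so both produce identical results. Names below are illustrative, not TFJS API:

```python
# Hypothetical dispatcher for a fused Add+ReLU pattern.
def run_add_relu(backend, a, b):
    if backend.get("supports_fused_add_relu"):
        # Single fused kernel: one pass over the data.
        return [max(x + y, 0.0) for x, y in zip(a, b)]
    # Fallback: break the fused op into the individual ops.
    s = [x + y for x, y in zip(a, b)]   # Add
    return [max(v, 0.0) for v in s]     # ReLU

webgpu = {"supports_fused_add_relu": True}
webgl = {"supports_fused_add_relu": False}

# Both paths must agree, so fusion stays a pure optimization.
assert (run_add_relu(webgpu, [1.0, -2.0], [1.0, 1.0]) ==
        run_add_relu(webgl, [1.0, -2.0], [1.0, 1.0]))
```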