Hi,

I wanted to run the pretrained frozen .pb models from mobilenetv1 and mobilenetv2 with tract. But for MobilenetV1 I get an unimplemented-operation error on Relu6, and for MobilenetV2 one on FusedBatchNorm. Any plan to support Relu6 or FusedBatchNorm? Would you be willing to point me to where I can add those?
Thanks for your interest in tract!
No big surprise here: TensorFlow support is on a "per-application" basis, as there is no way tract will support TensorFlow entirely.
Adding Relu6 should be trivial. Look at how the macros in core let you define element-wise ops in https://github.com/snipsco/tract/blob/master/core/src/ops/math/mod.rs. You can use these macros to add Relu6 in tensorflow/src/ops/nn/mod.rs for now, and register it as an operator at the top of that same file. Hopefully, that should do the trick.
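For illustration, here is what the op boils down to per element. This is just the math, not tract's actual macro API; check the macros in core/src/ops/math/mod.rs for the real definition style:

```rust
// Relu6 clamps each element to [0, 6]: relu6(x) = min(max(x, 0), 6).
// In tract the op would be generated by an element-wise macro; this
// free function only sketches the per-element computation.
fn relu6(x: f32) -> f32 {
    x.max(0.0).min(6.0)
}

fn main() {
    assert_eq!(relu6(-1.0), 0.0); // negative inputs clamp to 0
    assert_eq!(relu6(3.5), 3.5);  // values in [0, 6] pass through
    assert_eq!(relu6(42.0), 6.0); // large inputs clamp to 6
}
```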
The second error is different; it actually looks like a bug in the graph parsing: it's a node which is not found, not an operator. I can have a look, unless you feel like debugging.
Thanks for the tips!
For MobilenetV1, I added the Relu6, but another error popped up:

Evaluating #16 "MobilenetV1/MobilenetV1/Conv2d_1_depthwise/depthwise": Unimplemented(DepthwiseConv2dNative): unimplemented operation: DepthwiseConv2dNative
Depthwise conv is the main op in both MobilenetV1/2.
For MobilenetV2, it seems there's no FusedBatchNorm defined in core/src/ops/nn/. There are BatchNorm and FixedBatchNorm. Basically, it needs to be added separately as conv2d + bn, I think.
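For context, at inference time FusedBatchNorm reduces to a per-channel affine transform, y = gamma * (x - mean) / sqrt(var + eps) + beta, which is why it can be folded into the surrounding graph. A minimal sketch of that folding (function name hypothetical, not tract code):

```rust
// Fold inference-time batch norm into per-channel (scale, offset):
//   y = gamma * (x - mean) / sqrt(var + eps) + beta
//     = scale * x + offset
// with scale = gamma / sqrt(var + eps) and offset = beta - scale * mean.
fn fold_batch_norm(
    gamma: &[f32], beta: &[f32], mean: &[f32], var: &[f32], eps: f32,
) -> (Vec<f32>, Vec<f32>) {
    let scale: Vec<f32> =
        gamma.iter().zip(var).map(|(g, v)| g / (v + eps).sqrt()).collect();
    let offset: Vec<f32> = beta
        .iter()
        .zip(&scale)
        .zip(mean)
        .map(|((b, s), m)| b - s * m)
        .collect();
    (scale, offset)
}
```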
Aha :) That one will be a bit more challenging than Relu6... We need a separate implementation for dw conv2d, and I agree it would make sense to have it. I would love to add mobilenet 1 and 2 to the supported networks and get one more opportunity to compare to TFLite and whatever.
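For reference, a depthwise conv2d applies one kernel per input channel with no mixing across channels, which is what a dedicated implementation has to compute. A naive, unoptimized sketch ([h, w, c] layout, stride 1, valid padding, channel multiplier 1; names illustrative, not tract's API):

```rust
// Naive depthwise 2D convolution, [h, w, c] layout, stride 1,
// "valid" padding, channel multiplier 1: output channel ch only
// reads input channel ch.
fn depthwise_conv2d(
    input: &[f32],  // shape [h, w, c]
    kernel: &[f32], // shape [kh, kw, c]
    h: usize, w: usize, c: usize,
    kh: usize, kw: usize,
) -> Vec<f32> {
    let (oh, ow) = (h - kh + 1, w - kw + 1);
    let mut output = vec![0.0; oh * ow * c];
    for y in 0..oh {
        for x in 0..ow {
            for ch in 0..c {
                let mut acc = 0.0;
                for ky in 0..kh {
                    for kx in 0..kw {
                        acc += input[((y + ky) * w + (x + kx)) * c + ch]
                            * kernel[(ky * kw + kx) * c + ch];
                    }
                }
                output[(y * ow + x) * c + ch] = acc;
            }
        }
    }
    output
}
```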
I may be able to work on this in a few weeks. I'm a bit deep in rnn right now... But I will help if you feel like giving it a shot in the meantime.
The BatchNorm thing seems easier to deal with (probably a matter of reorganising some operators into core), but I don't think it will bring us much as long as we don't have the DW conv.
Hey @ehsanmok, you may want to give a shot to the mobilenet branch. See #89. I think the network works. The performance is sub-par for now, I need to plug the new depthwise operators on the optimized convolution backends, I will try to do that soon-ish.
(That was for v1; there is still the weird FusedBatchNorm issue with mobilenet v2.)
Also fixed the FusedBatchNorm issue. As I expected, it was not a missing-op thing: MobileNet strangely declares its nodes out of order. First network I've seen like that. So now v1 and v2 are both working correctly, without full optimisation for now.
Hi @kali, great, thanks! I tried the patch and it works :)
I know it's not optimized yet, but V2 is slightly slower than V1 (which shouldn't be the case, AFAIK).
Looking forward to seeing on-par performance with TFLite.
@ehsanmok just wanted to let you know that I merged #92. It plugs the dwconv on the regular convolution backend. Performance is better, but still not at the level I'd like: the backend will need a bit of work to handle the specific kernel sizes and channel counts induced by depthwise convolutions more efficiently. Anyway, if you're using it, it may be worth bumping to the top of the tree.
Don't rush it, there is a bug.
Nailed it.
Thanks for letting me know! I tried the optimized one based on the tf example (+ release mode); it was slower than the unoptimized one. I'm afraid it wasn't a complete benchmark on the aarch64-linux-gnu toolchain, though. I assume, based on the signature, that the optimization pre-allocates stuff, and maybe more?
You tried master, right? Not gemm-for-1-m? That one is not ready for prime time.
You're saying that with tract compiled in release, running the unoptimized network is faster than running the optimized one?
Yes, I got it from master. I didn't notice any optimization benefit using tfd.into_optimized() in the tf MobilenetV2 example. Is there anything else I should be doing?
Here's the snippet I ran, without loading the labels:

// Load the frozen TensorFlow model and pin the input shape.
let mut tfd = ::tract_tensorflow::tensorflow().model_for_path(mobilenet_v2()).unwrap();
tfd.set_input_fact(0, TensorFact::dt_shape(f32::datum_type(), &[1, 224, 224, 3])).unwrap();
// Optimize the graph, then build an execution plan over it.
let tfd = tfd.into_optimized().unwrap();
let plan = SimplePlan::new(&tfd).unwrap();
// Run inference on a preprocessed image.
let input = load_image(input_image());
let outputs = plan.run(tvec![input]).unwrap();
Btw, may I ask how the assemblies in linalg were made and used?
Thanks for the clarification. I'll have a look.
They were made with a lot of love, and they are used, when possible, for direct convolution or for matrix multiplication. The general idea is that convolution is usually translated to an im2col + matmul (and so will use the smm kernels), but for some valid 1D and 2D cases, it is possible to use the direct convolution ("sconv") kernel instead.
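To make the im2col idea concrete: each kernel-sized patch of the input is unrolled into one row of a matrix, so the convolution becomes a single matmul between the filters (viewed as an [n_filters, kh*kw] matrix) and the transposed patch matrix. A minimal single-channel sketch (stride 1, valid padding; names illustrative):

```rust
// im2col for a single-channel 2D input: each row of the result holds
// one kh*kw patch, so conv reduces to a matrix multiplication.
fn im2col(input: &[f32], h: usize, w: usize, kh: usize, kw: usize) -> Vec<f32> {
    let (oh, ow) = (h - kh + 1, w - kw + 1);
    let mut cols = Vec::with_capacity(oh * ow * kh * kw);
    for y in 0..oh {
        for x in 0..ow {
            for ky in 0..kh {
                for kx in 0..kw {
                    cols.push(input[(y + ky) * w + (x + kx)]);
                }
            }
        }
    }
    cols // shape [oh * ow, kh * kw], one patch per row
}
```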
On the gemm-1-m branch, I'm focusing on the two main convolution cases used by mobilenet. There is the depthwise one, and the pointwise one. It is straightforward to translate the pointwise to a simple matmul; the depthwise needs a bit more work...
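To illustrate why the pointwise case is straightforward: a 1x1 convolution touches each pixel independently, so viewing an NHWC input as an [h*w, c_in] matrix makes it exactly a matmul with the [c_in, c_out] weight matrix. A minimal sketch (illustrative, not the branch's actual code):

```rust
// A 1x1 ("pointwise") convolution over NHWC data is a plain matmul:
// view the input as [pixels, c_in] and multiply by [c_in, c_out] weights.
fn pointwise_conv(
    input: &[f32],   // [pixels, c_in], row-major
    weights: &[f32], // [c_in, c_out], row-major
    pixels: usize, c_in: usize, c_out: usize,
) -> Vec<f32> {
    let mut out = vec![0.0; pixels * c_out];
    for p in 0..pixels {
        for ci in 0..c_in {
            let x = input[p * c_in + ci];
            for co in 0..c_out {
                out[p * c_out + co] += x * weights[ci * c_out + co];
            }
        }
    }
    out
}
```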
I observe the same thing on Intel on master: the optimized network is 10% slower than the plain one :) Thanks for noticing this :)
All right, I know what happened on master: the "naive" specific implementation I did for the depthwise TF conv is actually relatively good, better than the generic convolution backend. Hopefully, what I'm doing in optim-dw-conv should tip the scale back in the right direction.
Hey, wanted to share some progress on mobilenet optimisation. These benches were run on a Raspberry Pi 3 / Raspbian.
Hey, I'm going to close this issue. Some more optimisations may come, but they will be part of non-mobilenet-specific things that I have in mind.