tensorflow / tfjs

A WebGL accelerated JavaScript library for training and deploying ML models.
https://js.tensorflow.org
Apache License 2.0

TF Select operators in tflite web runtime #5844

Open josephrocca opened 2 years ago

josephrocca commented 2 years ago


Describe the feature and the current behavior/state. While attempting to convert a TF SavedModel to tflite for use on the web, I got this error:

Some ops are not supported by the native TFLite runtime, you can enable TF kernels fallback using TF Select. See instructions: https://www.tensorflow.org/lite/guide/ops_select 
TF Select ops: AddV2, ArgMax, BatchMatMulV2, Cast, GatherV2, MatMul, Mul, Range, RealDiv, Sigmoid, Softmax, StridedSlice, Transpose

And for another model, these were the missing ops:

AddV2, BatchMatMulV2, Cast, ConcatV2, Conv2D, Fill, GatherV2, MatMul, Mul, RealDiv, Sigmoid, Softmax, StridedSlice, Transpose

The linked docs don't show how to run inference with the tflite web runtime (only Android, iOS, C++ and Python). I take this to imply that it's not currently possible to use the extended set of ops with the tflite web runtime.
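
For reference, the conversion step itself succeeds once TF Select is enabled on the converter, per the linked ops_select guide; the open question in this issue is only whether the resulting model can run in the tflite *web* runtime. A minimal sketch of that conversion, with `saved_model_dir` and `model.tflite` as placeholder paths:

```python
# Sketch of enabling TF Select (flex) ops during conversion, following the
# ops_select guide linked above. Paths are placeholders.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # use native TFLite kernels where possible
    tf.lite.OpsSet.SELECT_TF_OPS,    # fall back to TF kernels for the rest
]
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```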

Will this change the current API? How? Unsure whether this would be an API change or a build step in the tflite web runtime.

Who will benefit with this feature? Presumably the "Select TensorFlow operators" feature exists on other platforms due to developer demand. Web developers like myself would get the same benefits as developers on the other platforms.

josephrocca commented 2 years ago

Hey @jinjingforever, sorry for pinging you - just wondering if there's any chance that TF Select ops would be supported in tfjs-tflite in the short to medium term? I'm still blocked here and would like to try helping if possible. No worries if this is low-priority at the moment!

jinjingforever commented 2 years ago

Hi @josephrocca, sorry for the delay; I should've replied earlier. :) I am not an expert on model conversion, but those ops seem pretty common and should be supported. In tflite web, we just use the default op resolver here. Weird..
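
For context, loading a model with the tfjs-tflite web runtime looks roughly like the sketch below (the model URL and input shape are placeholders). Since the WASM runtime is built with only the default (builtin) op resolver, a model containing TF Select ops would fail at this load/inference step:

```javascript
// Sketch of running a .tflite model in the browser with @tensorflow/tfjs-tflite.
// Model URL and input shape are placeholder values.
import * as tf from '@tensorflow/tfjs';
import * as tflite from '@tensorflow/tfjs-tflite';

async function run() {
  // A model containing TF Select ops fails here, because the bundled WASM
  // runtime only registers the default builtin kernels.
  const model = await tflite.loadTFLiteModel('https://example.com/model.tflite');
  const input = tf.zeros([1, 224, 224, 3]); // placeholder input tensor
  const output = model.predict(input);
  output.print();
}

run();
```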

Anyway, is using a tflite model a hard requirement? If not, maybe you could try the tfjs converter. Converted tfjs models generally have better performance than tflite models.
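
The suggested tfjs-converter route looks roughly like this, assuming the `tensorflowjs` pip package is installed and using placeholder paths:

```shell
# Convert a TF SavedModel to the TF.js graph-model format (placeholder paths).
pip install tensorflowjs

tensorflowjs_converter \
    --input_format=tf_saved_model \
    ./saved_model \
    ./web_model
```

The output directory can then be loaded in the browser with `tf.loadGraphModel`.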

josephrocca commented 2 years ago

@jinjingforever Thanks for your reply! Both the tflite and tfjs web runtimes had missing operators, and I think I ended up only filing an issue for the tflite ones because I thought they would be easier to fix, since they would only need to be "enabled" rather than written (though I might have been wrong there).

In general, though, I always find myself getting blocked by operator support, and as you can imagine, that makes it hard to build momentum on a project. I end up with several half-finished projects waiting for some ops. If TF Select ops could be used on the web, the web tflite runtime would be a very "strong" fallback in case tfjs has missing ops (which, again, it almost always does for my projects).

I will happily submit bug reports for the missing tfjs ops when I find them (here's a new issue), so that even if I've moved on from the project by the time they're added, others in the future will hopefully have one less road-block. But it would be super, super awesome if there were a strong fallback - even if it runs slower and requires a bigger download (in terms of the model and/or the runtime). That way I can get unblocked quickly and worry about polishing performance stuff later.

I want to stress that being blocked by lack of operator support is easily my biggest problem with trying to get models working with tfjs. I think the "just-in-time" approach to implementing new operators (i.e. wait for people to ask for them, then implement) has caused a bunch of people to become disillusioned with ML-on-the-web because of the operator support roadblocks (people doing "professional" work will probably file a bug, wait for the ops, and push through, but people who are trying out tfjs for the first time in e.g. a weekend project may just get frustrated and give up). I think it's a shame that it's rare enough to be pleasantly surprising to me when a model conversion "just works".

But of course you and your team have limited time to implement operators - you aren't able to implement all the operators up-front ("just-in-case"). So I figure that the tflite runtime might be an excellent solution to this dilemma if TF Select ops can be enabled somehow.

I might be preaching to the choir here (i.e. maybe you have similar thoughts), but I figured that I'd just explain why I think this is important. Sorry for the wall of text! 😅

jinjingforever commented 2 years ago

Thank you so much for your insights on this @josephrocca! I agree with your points, especially about "strong fallback".

I will take a closer look at this issue (focusing on the TF Select ops) and see what I can find. (Sorry, I am not exactly sure of the timeline, but this seems like a pretty useful thing to do, so I will try to prioritize it.) Will also discuss this with my team.

Really appreciate it!

jinjingforever commented 2 years ago

BTW @josephrocca do you have a converted tflite model (with TF select) available somewhere I can try? Thanks!

josephrocca commented 2 years ago

@jinjingforever Oh, that's great, thank you for looking into this! Here's an example tflite file that required TF Select operators during conversion from the SavedModel: https://drive.google.com/file/d/1-5kyQlnSYUXB_WXYoJg1mJQSwBblWL10/view?usp=sharing It's quite large (~500 MB) because I was just at a "try to get something working" stage, rather than producing a version that's ready for production.

Here's the Colab notebook I used to generate it in case you needed to play around with the conversion process: https://colab.research.google.com/drive/16doh_OIf-UwqhVmSm6RpLxeCtHcYsEsA?usp=sharing

Here's the specific error that I get if I don't include the TF Select operators:

ConverterError: <unknown>:0: error: loc(callsite(callsite(fused["BitwiseAnd:", "BitwiseAnd@__inference_converted_fun_6923"] at fused["PartitionedCall:", "PartitionedCall@__inference_signature_wrapper_6932"]) at fused["PartitionedCall:", "PartitionedCall"])): 'tf.BitwiseAnd' op is neither a custom op nor a flex op
<unknown>:0: note: loc(fused["PartitionedCall:", "PartitionedCall"]): called from
<unknown>:0: note: loc(callsite(callsite(fused["BitwiseAnd:", "BitwiseAnd@__inference_converted_fun_6923"] at fused["PartitionedCall:", "PartitionedCall@__inference_signature_wrapper_6932"]) at fused["PartitionedCall:", "PartitionedCall"])): Error code: ERROR_NEEDS_FLEX_OPS
<unknown>:0: error: failed while converting: 'main': 
Some ops are not supported by the native TFLite runtime, you can enable TF kernels fallback using TF Select. See instructions: https://www.tensorflow.org/lite/guide/ops_select 
TF Select ops: BitwiseAnd
Details:
    tf.BitwiseAnd(tensor<1x1x1x30xi8>, tensor<1x1x30x30xi8>) -> (tensor<1x1x30x30xi8>) : {device = ""}

This bug report was initially based on a different model (from Pytorch, and through a series of converters to tflite). Let me know if you want me to dig that up. Thanks!!

josephrocca commented 2 years ago

Hey @jinjingforever, sorry to pester - just wondering if there has been any movement here, or if there are other things blocking this that I could try to help with?

jinjingforever commented 2 years ago

Hi @josephrocca, I am sorry... I haven't had time to look into this and have been busy with other projects. I don't have an ETA for this yet. Keeping this open for now.

josephrocca commented 2 years ago

No problem, thanks for the update!

gaikwadrahul8 commented 1 year ago

Hi, @josephrocca

Thank you for opening this issue. Since this issue has been open for a long time, the code/debug information for this issue may not be relevant with the current state of the code base.

The TFJS team is constantly improving the framework by fixing bugs and adding new features. We suggest you try the latest TFJS version with the latest compatible hardware configuration, which could potentially resolve the issue. If you are still facing the issue, please create a new GitHub issue with your latest findings, along with all the debugging information that could help us investigate.

Please follow the release notes to stay up to date with the latest developments happening in the TensorFlow.js space.

Thank you for your support and cooperation.

josephrocca commented 1 year ago

@gaikwadrahul8 This issue is still relevant. TF Select operators are not yet supported in tfjs-tflite as far as I know. See this comment.

gaikwadrahul8 commented 1 year ago

Hi, @josephrocca

Thank you for the confirmation. I see this comment, so I'll keep this issue open and wait for an update from the relevant team. Thank you!