tensorflow / tfjs

A WebGL accelerated JavaScript library for training and deploying ML models.
https://js.tensorflow.org
Apache License 2.0

Cannot use relu6 with webgl #2036

Closed kingsharaman closed 4 years ago

kingsharaman commented 4 years ago

1.2.9

Chrome 76.0.3809.132 (Official Build) (64-bit)

When I try to run model.predict() with the converted DeepLab model with MobileNet backend I get the following error message:

Activation relu6 has not been implemented for the WebGL backend.

This should have been fixed in #2016 but I think you should also modify mapActivationToShaderProgram in https://github.com/tensorflow/tfjs/blob/89f8275e6ae6f3d0315225365c86d3c346df0ccd/tfjs-core/src/backends/webgl/backend_webgl.ts
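For reference, relu6 is just ReLU with its output clamped at 6; a minimal pure-Python sketch of the semantics (not the tfjs shader implementation):

```python
# relu6(x) = min(max(x, 0), 6): ReLU capped at 6, used throughout
# MobileNet because the bounded range is friendly to low-precision inference.
def relu6(x):
    return min(max(x, 0.0), 6.0)

print(relu6(-2.0), relu6(3.0), relu6(10.0))  # 0.0 3.0 6.0
```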

lewis617 commented 4 years ago

When will relu6 be usable in webgl? (translated from Chinese)

annxingyuan commented 4 years ago

@lewis617 relu6 will be available in the next tf-core release. It was added after 1.2.9.

lewis617 commented 4 years ago

> @lewis617 relu6 will be available in the next tf-core release. It was added after 1.2.9.

When will the next version be released?

rthadur commented 4 years ago

It is available now, please check.

wingman-jr-addon commented 4 years ago

Is it also available in tensorflowjs_converter? I've been trying out a keras->tfjs_graph_model conversion to see if I get an inference speed boost, but even after upgrading to 1.2.10 I'm still getting this today: ValueError: Unknown activation function:relu6 (Update: still seeing this with 1.2.10.1 today.)

annxingyuan commented 4 years ago

Hey @pyu10055 - is this expected? I would have thought this would be fixed with 1.2.10.

wingman-jr-addon commented 4 years ago

@annxingyuan @pyu10055 Any further thoughts on this? I had tried it out with 1.2.10.1 and it still was giving the same message. I'd try 1.2.11 but I don't see it on pypi yet.

wingman-jr-addon commented 4 years ago

For what it's worth, I did get this "working" by hacking it directly into my venv's keras/losses.py. I know I had to add this as a custom object to Keras in an older version for general Python usage, but without control of the custom object scope, I figured this would be the fastest way to test it (I also had a custom weighted loss function to plumb in while I was at it, since my model was not stock MobileNetV2, so it wasn't a big deal).

One thing I also ran into while working on this was errors like the following:

File "/home/developer/Desktop/tfjs-convert/venv/lib/python3.6/site-packages/tensorflow/python/eager/context.py", line 865, in remove_function
TypeError: 'NoneType' object is not callable

After a bit of searching, this seemed like #1582. I noticed that requirements.txt seems to be pulling in 1.14.0, so I tried upgrading tensorflow, which bumped it to 2.0.0. After re-hacking relu6 (and my custom weighted loss), conversion succeeded. I'll have to do further testing on the output to see if everything ended up all right.

wingman-jr-addon commented 4 years ago

Update: correctness seems fine and speed is almost 2x! Thanks @annxingyuan @pyu10055 for helping look at this and making improvements.

annxingyuan commented 4 years ago

@wingman-jr-addon Are you also using tfjs-core 1.2.10 and tfjs-converter 1.2.10 on the client side? Or tfjs 1.2.10 (which depends on tfjs-core 1.2.10 and tfjs-converter 1.2.10).

wingman-jr-addon commented 4 years ago

JS client libraries: https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@1.2.10/dist/tf.min.js

Python side:

(venv) developer@developer-VirtualBox:~/Desktop/tfjs-convert$ python --version
Python 3.6.8
(venv) developer@developer-VirtualBox:~/Desktop/tfjs-convert$ pip show tensorflow
Name: tensorflow
Version: 2.0.0
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author: Google Inc.
Author-email: packages@tensorflow.org
License: Apache 2.0
Location: /home/developer/Desktop/tfjs-convert/venv/lib/python3.6/site-packages
Requires: gast, grpcio, keras-applications, astor, termcolor, tensorflow-estimator, absl-py, keras-preprocessing, protobuf, google-pasta, wheel, wrapt, tensorboard, opt-einsum, numpy, six
Required-by: tensorflowjs
(venv) developer@developer-VirtualBox:~/Desktop/tfjs-convert$ pip show tensorflowjs
Name: tensorflowjs
Version: 1.2.10.1
Summary: Python Libraries and Tools for TensorFlow.js
Home-page: https://js.tensorflow.org/
Author: Google LLC
Author-email: opensource@google.com
License: Apache 2.0
Location: /home/developer/Desktop/tfjs-convert/venv/lib/python3.6/site-packages
Requires: h5py, tensorflow-hub, gast, tensorflow, PyInquirer, six, numpy
Required-by:

Only the local installation of tensorflow was modified, in the following way:

(venv) developer@developer-VirtualBox:~/Desktop/tfjs-convert$ vi venv/lib/python3.6/site-packages/tensorflow_core/python/keras/activations.py

@keras_export('keras.activations.relu6')
def relu6(x):
    return K.relu(x, max_value=6)

annxingyuan commented 4 years ago

@wingman-jr-addon try 1.2.11? we just did another release.

wingman-jr-addon commented 4 years ago

@annxingyuan Well, I've been watching for an update but even though GitHub says it's out there, PyPI and pip say it's still on 1.2.10.1. Perhaps the build server needs a poke?

wingman-jr-addon commented 4 years ago

@annxingyuan @pyu10055 Just checking, should PyPI still say 1.2.10.1 or should it be 1.2.11? I just want to make sure there is not an unexpected problem with build infrastructure; I'm not in a rush for the converter updates.

annxingyuan commented 4 years ago

@wingman-jr-addon I'm a little confused - are you saying you converted a model with the tensorflowjs converter 1.2.10.1, but then are seeing "cannot use relu6 with webgl" in the browser? Which versions of tfjs are you using in the browser?

wingman-jr-addon commented 4 years ago

@annxingyuan The issue is on the Python side.

Originally, I did the following at release 1.2.10.

  1. Boot Ubuntu VM.
  2. Setup local venv.
  3. pip3 install tensorflowjs
  4. tensorflowjs_converter --version returns 1.2.10
  5. Tried to convert a MobileNetV2-based model referencing relu6 using the tfjs_graph_model output option. Conversion failed as described above. I stopped.

Then I waited a few days. When I saw 1.2.10.1 was released, I thought that maybe this issue was fixed since it appeared relu6 support was being added across the board. Then I did the following.

  1. pip3 install --upgrade tensorflowjs
  2. tensorflowjs_converter --version returns 1.2.10.1
  3. Tried to convert a MobileNetV2 model again. Conversion failed. Issue did not appear to be fixed. I stopped.

Then I decided to dig into the code and try to figure out why this was not working. After examining this and trying a thing or two, I realized it would probably be fastest for me to modify my venv's local Keras. So I did the following:

  1. Modify keras/activations.py to add in relu6.
  2. Converted MobileNetV2 model with relu6 successfully with the tfjs_graph_model using the locally modified Keras.
  3. Ran the model in the browser using tfjs 1.2.10. I am not sure if relu6 survives the optimizations done when the output format is tfjs_graph_model, so I do not know if the JS side is successfully using relu6 or doing something else. I saw a large speed boost using tfjs_graph_model due to the optimizations though, which is great!
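As an aside, whether relu6 survives the graph optimizations can be checked by listing the op names in the converted model.json. A small sketch; the modelTopology/node/op fields follow the real tfjs graph-model format, but the JSON fragment here is made up for illustration:

```python
import json

# Hypothetical fragment of a converted model.json; real files are much larger.
model_json = '''{"modelTopology": {"node": [
    {"name": "MobilenetV2/Conv/Conv2D", "op": "_FusedConv2D"},
    {"name": "MobilenetV2/Conv/Relu6", "op": "Relu6"}
]}}'''

def list_ops(text):
    """Return the sorted set of op names used in a tfjs graph-model JSON."""
    nodes = json.loads(text)["modelTopology"].get("node", [])
    return sorted({n["op"] for n in nodes})

print(list_ops(model_json))  # ['Relu6', '_FusedConv2D']
```

If relu6 was fused into the convolution during optimization, it would show up only inside a fused op rather than as a standalone Relu6 node.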

After this, we corresponded and you indicated that v1.2.11 is released. So then I did the following:

  1. Checked GitHub. I see the tag here.
  2. pip3 install --upgrade tensorflowjs
  3. I expected this to upgrade to 1.2.11. It did not. tensorflowjs_converter --version still returns 1.2.10.1.
  4. PyPI also still shows 1.2.10.1

So, I had two questions.

  1. While the original issue seemed to report that the JS side did not handle relu6 correctly, my original question to @annxingyuan and the response there made me think that it was also expected to be in tensorflowjs_converter. It does not appear to be there. Is that expected?
  2. Is the fact that I see a tag for tfjs-converter at v1.2.11 but PyPI only shows 1.2.10.1 an indication that the package failed to release when the automated build kicked off? Or is this a manual process and the existence of the tag code does not imply a corresponding Python package release such that tensorflowjs_converter should still be on 1.2.10.1 as I am seeing now?
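For checks like step 3 above, the installed package version can also be queried from Python itself. A sketch using the stdlib importlib.metadata (Python 3.8+; on the 3.6 venv from this thread, pkg_resources.get_distribution would be the equivalent):

```python
# Look up an installed distribution's version without shelling out to pip.
from importlib.metadata import version, PackageNotFoundError

def installed_version(package):
    """Return the installed version string, or None if not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

print(installed_version('surely-not-a-real-package'))  # None
```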

Thank you for your patience if I am misunderstanding the situation or relu6 support.

pyu10055 commented 4 years ago

@wingman-jr-addon Thank you for trying out the different options. The error message is from the tf.keras loader: since the relu6 activation function is a MobileNet custom Python activation function, it needs to be provided to the loader, similar to the following:

model = load_model('mobilenet.h5', custom_objects={
    'relu6': mobilenet.relu6,
    'DepthwiseConv2D': mobilenet.DepthwiseConv2D})
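To make the loader behavior concrete: Keras resolves activation names through a registry, and custom_objects simply extends that registry for one load. A toy sketch of the mechanism (the registry contents and function names here are illustrative, not Keras internals):

```python
# Built-in name -> function table, standing in for Keras's own registry.
BUILTIN_ACTIVATIONS = {'relu': lambda x: max(x, 0.0)}

def get_activation(name, custom_objects=None):
    """Resolve an activation by name, mimicking the Keras loader's lookup."""
    registry = dict(BUILTIN_ACTIVATIONS)
    registry.update(custom_objects or {})
    try:
        return registry[name]
    except KeyError:
        raise ValueError('Unknown activation function:' + name)

# Without custom_objects, 'relu6' fails just like in the converter:
try:
    get_activation('relu6')
except ValueError as e:
    print(e)  # Unknown activation function:relu6

# Supplying it via custom_objects resolves the lookup:
relu6 = get_activation('relu6', {'relu6': lambda x: min(max(x, 0.0), 6.0)})
print(relu6(10.0))  # 6.0
```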

Our converter did not fix the error from the Keras loader; we only added support for fusing the TensorFlow relu6 op into conv2d.

Thank you for catching the version discrepancy of the pip package. We actually did not release 1.2.11, since there have been no changes since 1.2.10.1.

There might be a better way to work around your problem:

This only works if you only need to do inference using TFJS.

wingman-jr-addon commented 4 years ago

Thanks @pyu10055 for the explanation! I'm familiar with the "load_model with custom_objects" approach on the Python side, as I've had to deal with it there. I didn't see a corresponding custom-objects parameter for tensorflowjs_converter (nor am I quite sure how one would add that flexibility in the future), so I just tried my hack. I may give your method a shot in the future, as it sounds cleaner; I haven't played as much with TF's underlying formats as with the Keras .h5 format, so thanks for the suggestion on a better-supported path.