tensorflow / tfjs

A WebGL accelerated JavaScript library for training and deploying ML models.
https://js.tensorflow.org
Apache License 2.0

[tfjs-converter] Error reproducing successful EfficientNet conversion after updates #1726

Closed aptlin closed 4 years ago

aptlin commented 5 years ago

Bug Report

TensorFlow.js version

1.1.2 & 1.2.2.1

Browser version

n/a

Describe the problem or feature request

Numerous attempts (not all reflected in the commits) to tweak the script and convert the EfficientNet checkpoints, exported to tf.SavedModel, using tensorflowjs==1.1.2 failed with the following error:

Traceback (most recent call last):
  File "/Users/sdll/.virtualenvs/tfjs-venv/bin/tensorflowjs_converter", line 6, in <module>
    from tensorflowjs.converters.converter import main
  File "/Users/sdll/.virtualenvs/tfjs-venv/lib/python3.7/site-packages/tensorflowjs/__init__.py", line 21, in <module>
    from tensorflowjs import converters
  File "/Users/sdll/.virtualenvs/tfjs-venv/lib/python3.7/site-packages/tensorflowjs/converters/__init__.py", line 24, in <module>
    from tensorflowjs.converters.tf_saved_model_conversion_v2 import convert_tf_saved_model
  File "/Users/sdll/.virtualenvs/tfjs-venv/lib/python3.7/site-packages/tensorflowjs/converters/tf_saved_model_conversion_v2.py", line 29, in <module>
    from tensorflow.python.framework import convert_to_constants
ImportError: cannot import name 'convert_to_constants' from 'tensorflow.python.framework' (/Users/sdll/.virtualenvs/tfjs-venv/lib/python3.7/site-packages/tensorflow/python/framework/__init__.py)

UPDATE :: July 8, 11am UTC: Installing tensorflowjs==1.2.1 in a clean virtual environment fails due to ERROR: Could not find a version that satisfies the requirement tf-nightly-2.0-preview>=2.0.0.dev20190502 (from tensorflowjs==1.1.2) (from versions: none). Indeed, installing any version of tf-nightly-2.0-preview fails now.

UPDATE :: July 8, 2pm UTC: The cause of the problem above is the removal of the pre-built tf-nightly-2.0-preview package for OS X. Working in a Debian Docker container does not resolve the issue.

Code to reproduce the bug

See the conversion script for EfficientNet.
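For reference, the conversion step itself boils down to something like the following sketch, which calls the same convert_tf_saved_model entry point that the CLI wraps; the directory names are placeholders, and with tensorflowjs==1.1.2 even the import fails, as the traceback above shows:

# Minimal sketch, assuming convert_tf_saved_model takes the SavedModel
# directory and the output directory (both paths here are placeholders).
from tensorflowjs.converters import convert_tf_saved_model

convert_tf_saved_model(
    "efficientnet_saved_model",  # exported tf.SavedModel
    "efficientnet_tfjs",         # output directory for the TF.js GraphModel
)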

Unfortunately, using tensorflow >= 1.14 (and thus tensorflowjs > 1.2.1) does not work, since the converter fails with the following error:

tensorflow.python.framework.errors_impl.NotFoundError: Op type not registered 'swish_f32' in binary running on sdll.

The full traceback is below:

Traceback (most recent call last):
  File "/Users/sdll/.virtualenvs/tfjs-1.2.2.1/bin/tensorflowjs_converter", line 10, in <module>
    sys.exit(main())
  File "/Users/sdll/.virtualenvs/tfjs-1.2.2.1/lib/python3.7/site-packages/tensorflowjs/converters/converter.py", line 556, in main
    strip_debug_ops=FLAGS.strip_debug_ops)
  File "/Users/sdll/.virtualenvs/tfjs-1.2.2.1/lib/python3.7/site-packages/tensorflowjs/converters/tf_saved_model_conversion_v2.py", line 284, in convert_tf_saved_model
    model = load(saved_model_dir, saved_model_tags)
  File "/Users/sdll/.virtualenvs/tfjs-1.2.2.1/lib/python3.7/site-packages/tensorflow/python/saved_model/load.py", line 414, in load
    root = load_v1_in_v2.load(export_dir, tags)
  File "/Users/sdll/.virtualenvs/tfjs-1.2.2.1/lib/python3.7/site-packages/tensorflow/python/saved_model/load_v1_in_v2.py", line 208, in load
    return loader.load(tags=tags)
  File "/Users/sdll/.virtualenvs/tfjs-1.2.2.1/lib/python3.7/site-packages/tensorflow/python/saved_model/load_v1_in_v2.py", line 191, in load
    signature_functions = self._extract_signatures(wrapped, meta_graph_def)
  File "/Users/sdll/.virtualenvs/tfjs-1.2.2.1/lib/python3.7/site-packages/tensorflow/python/saved_model/load_v1_in_v2.py", line 124, in _extract_signatures
    signature_fn = wrapped.prune(feeds=feeds, fetches=fetches)
  File "/Users/sdll/.virtualenvs/tfjs-1.2.2.1/lib/python3.7/site-packages/tensorflow/python/eager/wrap_function.py", line 290, in prune
    sources=flat_feeds + internal_captures)
  File "/Users/sdll/.virtualenvs/tfjs-1.2.2.1/lib/python3.7/site-packages/tensorflow/python/eager/lift_to_graph.py", line 393, in lift_to_graph
    op=op, graph=graph, op_map=op_map)
  File "/Users/sdll/.virtualenvs/tfjs-1.2.2.1/lib/python3.7/site-packages/tensorflow/python/eager/lift_to_graph.py", line 214, in _copy_non_source
    name=op.name)
  File "/Users/sdll/.virtualenvs/tfjs-1.2.2.1/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 464, in create_op
    compute_device=compute_device)
  File "/Users/sdll/.virtualenvs/tfjs-1.2.2.1/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/Users/sdll/.virtualenvs/tfjs-1.2.2.1/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "/Users/sdll/.virtualenvs/tfjs-1.2.2.1/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 2021, in __init__
    op_def = self._graph._get_op_def(node_def.op)
  File "/Users/sdll/.virtualenvs/tfjs-1.2.2.1/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 4005, in _get_op_def
    c_api.TF_GraphGetOpDef(self._c_graph, compat.as_bytes(type), buf)
tensorflow.python.framework.errors_impl.NotFoundError: Op type not registered 'swish_f32' in binary running on sdll. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.

UPDATE :: July 8, 5pm UTC: The converter itself has little to do with the problem when TF 1.14.0 is used, since I cannot load the SavedModel back the way it is done in the converter's source code: it fails with a similar traceback. Loading with tf.saved_model.loader.load works smoothly.
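For concreteness, a minimal sketch of the two loading paths (the directory name is a placeholder):

import tensorflow as tf

export_dir = "efficientnet_saved_model"  # placeholder for the exported SavedModel

# V1-style loading inside a session works smoothly:
with tf.Session(graph=tf.Graph()) as sess:
    tf.saved_model.loader.load(sess, ["serve"], export_dir)

# The V2-style loader used by tf_saved_model_conversion_v2.py fails:
from tensorflow.python.saved_model.load import load
load(export_dir, tags=["serve"])  # NotFoundError: Op type not registered 'swish_f32'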

However, I have successfully converted the EfficientNet checkpoints using tensorflow < 1.14, as you can see in the demo:

EfficientNet demo

The problem is that I cannot reproduce the conversion now, which leaves the PR in limbo.

Related issues

This one has some hacky solutions, but in the context of TF 2.

aptlin commented 5 years ago

@pyu10055, it seems that I am missing something very obvious, since the conversion went smoothly the first time I tried it more than two weeks ago (first converting to tf.keras, then to SavedModel, and only then to TF.js JSON); since then, tensorflowjs has switched to tensorflow==1.14.0. With TF 1.14, I cannot convert the model either by this roundabout route, which I have since simplified, or by exporting the checkpoints to SavedModels straight away. I have also tried installing tf-nightly-2.0-preview==2.0.0.dev20190520, tf-nightly-2.0-preview==2.0.0.dev20190601 and tf-nightly-2.0-preview==2.0.0.dev20190623 together with tensorflow==1.13.1 to reproduce the environment from back then, but to no avail.

Although it did not help me in the end, there is a snippet of the tensorflowjs_converter SavedModel utility which might make testing easier.

Here is the end of the convoluted but working roundabout to convert EfficientNet checkpoints to Saved Models.
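Roughly, the tail end of that export looks like the following sketch, with a stand-in layer where the real script builds the EfficientNet graph and restores the checkpoint (names and paths are hypothetical):

import tensorflow as tf

with tf.Session(graph=tf.Graph()) as sess:
    images = tf.placeholder(tf.float32, [None, 224, 224, 3], name="images")
    # stand-in for the real model-building and Saver().restore(...) calls
    logits = tf.layers.dense(tf.layers.flatten(images), 1000, name="logits")
    sess.run(tf.global_variables_initializer())
    tf.saved_model.simple_save(
        sess,
        "efficientnet_saved_model",
        inputs={"images": images},
        outputs={"logits": logits},
    )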

pyu10055 commented 5 years ago

@sdll Looks like the TensorFlow function swish_f32 is missing from the saved model. There is some discussion of this topic in the tensorflow repo: https://github.com/tensorflow/tensorflow/issues/29574

Is this a Keras model, and did you save it with tf.keras? One suggestion from a colleague is to recreate the SavedModel using TF 1.14; maybe the function was unintentionally removed by another version.

aptlin commented 5 years ago

@pyu10055, the latest version of the script makes no use of tf.keras. I have indeed tried exporting the checkpoint to a SavedModel using TF 1.14, but without success, since the attempt to restore the model in Python using load (from tensorflow.python.saved_model.load import load) fails with the missing-op error. Could you please tell me whether I need to load this op somehow during the export to SavedModel?

aptlin commented 5 years ago

@pyu10055, the checkpoints themselves were generated using TF 1.13, it seems. I wonder whether this might cause problems.

Disabling eager execution, as the linked issue suggested, did not help.
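For reference, the other pattern the NotFoundError message itself hints at (accessing the op before importing the graph, the way tf.contrib ops are lazily registered) would look roughly like this for swish; that tf.nn.swish is what backs the missing swish_f32 function is only an assumption, and whether this actually helps here is unclear:

import tensorflow as tf

# Build a swish op once so the swish_f32 function gets defined before
# the SavedModel graph is imported (assumption, not verified to help).
_ = tf.nn.swish(tf.constant([0.0], dtype=tf.float32))

from tensorflow.python.saved_model.load import load
load("efficientnet_saved_model", tags=["serve"])  # placeholder path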

pyu10055 commented 5 years ago

Can this model be loaded with TF 1.13?

aptlin commented 5 years ago

Yes, I used the linked script to load the checkpoints and export a SavedModel, which at one point I converted to a TF.js GraphModel without any issues. But a later attempt to use the converter failed due to the import error I mentioned above. I have also tried different versions of tensorflowjs, but could not reproduce the successful conversion.

rthadur commented 4 years ago

Closing this due to lack of activity, feel free to reopen. Thank you