ludwig-ai / ludwig

Low-code framework for building custom LLMs, neural networks, and other AI models
http://ludwig.ai
Apache License 2.0

Can't parse SavedModel to use in TensorflowJs #575

Closed: hamletbatista closed this issue 4 years ago

hamletbatista commented 4 years ago

Describe the bug
I'm trying to export a trained model so I can run inference with TensorFlow.js, but the exported .pb doesn't work with the TensorFlow.js converter tool. I get this error:

```
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py:1781: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Traceback (most recent call last):
  File "/usr/local/bin/tensorflowjs_converter", line 8, in <module>
    sys.exit(pip_main())
  File "/usr/local/lib/python3.6/dist-packages/tensorflowjs/converters/converter.py", line 638, in pip_main
    main([' '.join(sys.argv[1:])])
  File "/usr/local/lib/python3.6/dist-packages/tensorflowjs/converters/converter.py", line 642, in main
    convert(argv[0].split(' '))
  File "/usr/local/lib/python3.6/dist-packages/tensorflowjs/converters/converter.py", line 591, in convert
    strip_debug_ops=args.strip_debug_ops)
  File "/usr/local/lib/python3.6/dist-packages/tensorflowjs/converters/tf_saved_model_conversion_v2.py", line 419, in convert_tf_saved_model
    model = load(saved_model_dir, saved_model_tags)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/saved_model/load.py", line 519, in load
    return load_internal(export_dir, tags)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/saved_model/load.py", line 550, in load_internal
    root = load_v1_in_v2.load(export_dir, tags)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/saved_model/load_v1_in_v2.py", line 239, in load
    return loader.load(tags=tags)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/saved_model/load_v1_in_v2.py", line 222, in load
    signature_functions = self._extract_signatures(wrapped, meta_graph_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/saved_model/load_v1_in_v2.py", line 138, in _extract_signatures
    signature_fn = wrapped.prune(feeds=feeds, fetches=fetches)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/wrap_function.py", line 320, in prune
    sources=flat_feeds + self.graph.internal_captures)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/lift_to_graph.py", line 260, in lift_to_graph
    add_sources=add_sources))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/op_selector.py", line 413, in map_subgraph
    % (repr(init_tensor), repr(op), _path_from(op, init_tensor, sources)))
tensorflow.python.ops.op_selector.UnliftableError: A SavedModel signature needs an input for each placeholder the signature's outputs use. An output for signature 'predict' depends on a placeholder which is not an input (i.e. the placeholder is not fed a value).

Unable to lift tensor <tf.Tensor 'Category0/predictions_Category0/predictions_Category0:0' shape=(?,) dtype=int64> because it depends transitively on placeholder <tf.Operation 'is_training' type=Placeholder> via at least one path, e.g.:
Category0/predictions_Category0/predictions_Category0 (ArgMax)
  <- Category0/predictions_Category0/add (Add)
  <- Category0/predictions_Category0/MatMul (MatMul)
  <- concat_combiner/concat_combiner (Identity)
  <- concat_combiner/concat (Identity)
  <- Questions/Questions (Identity)
  <- Questions/dropout/cond/Merge (Merge)
  <- Questions/dropout/cond/dropout/mul_1 (Mul)
  <- Questions/dropout/cond/dropout/Cast (Cast)
  <- Questions/dropout/cond/dropout/GreaterEqual (GreaterEqual)
  <- Questions/dropout/cond/dropout/rate (Const)
  <- Questions/dropout/cond/switch_t (Identity)
  <- Questions/dropout/cond/Switch (Switch)
  <- is_training (Placeholder)
```

To Reproduce
You can follow my steps in this Colab notebook: https://colab.research.google.com/drive/1c1REIK3G5FzwuCxmO8R0xA_0ODDlC57z#scrollTo=vNudSgJAZ7JB

Please provide code, yaml definition file and a sample of data in order to entirely reproduce the issue. Issues that are not reproducible will be ignored.

Everything is in the Colab notebook.

Expected behavior
I am hoping to load the trained model in TensorFlow.js.


Additional context
I tried the ideas from this comment: https://github.com/uber/ludwig/issues/329#issuecomment-548854347

w4nderlust commented 4 years ago

This is not functionality we currently support, so I will mark it as an enhancement. That said, from the log the problem seems to be with the way we save the SavedModel rather than anything tfjs-specific, so we will investigate the issue; fixing that could solve the problem for tfjs too.
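
In the meantime, a rough and untested workaround sketch (the tensor names are assumptions based on the error above and Ludwig's naming scheme; the tag and directory paths are placeholders) would be to reload the exported SavedModel and re-export it with the is_training placeholder exposed as a signature input, so the converter has a feed for it:

```python
# Rough workaround sketch (TF 1.x, untested): reload the Ludwig SavedModel
# and re-export it with `is_training` included in the signature inputs so
# the tfjs converter can lift the graph. The tensor names are assumptions
# taken from the error above; the tag and directory paths are placeholders.
import tensorflow as tf

EXPORT_DIR = "path/to/ludwig/savedmodel"   # original export
FIXED_DIR = "path/to/savedmodel_fixed"     # re-export target

with tf.Session(graph=tf.Graph()) as sess:
    tf.saved_model.loader.load(sess, ["serve"], EXPORT_DIR)
    g = sess.graph
    inputs = {
        "Questions": g.get_tensor_by_name("Questions/Questions_placeholder:0"),
        "is_training": g.get_tensor_by_name("is_training:0"),
    }
    outputs = {
        "Category0_predictions": g.get_tensor_by_name(
            "Category0/predictions_Category0/predictions_Category0:0"),
    }
    tf.saved_model.simple_save(sess, FIXED_DIR, inputs=inputs, outputs=outputs)
```

The converted tfjs graph model would then just need is_training fed as an extra boolean input (false at inference time).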

hamletbatista commented 4 years ago

@w4nderlust thanks

ydudin3-zz commented 4 years ago

Hey @hamletbatista, would you mind providing the model_definition.yaml that is referenced in the Colab notebook? Thanks!

ydudin3-zz commented 4 years ago

Actually never mind, I see it in the template.

hamletbatista commented 4 years ago

@ydudin3 glad to know. Please let me know if you are able to get this to work.

ydudin3-zz commented 4 years ago

@hamletbatista it seems from the logs that output tensors get appended to the input_tensors dict.

For example, printing input_tensors yields:

```
{'Category0': <tf.Tensor 'Category0/Category0_placeholder:0' shape=(?,) dtype=int64>, 'Category2': <tf.Tensor 'Category2/Category2_placeholder:0' shape=(?,) dtype=int64>, 'Questions': <tf.Tensor 'Questions/Questions_placeholder:0' shape=(?, ?) dtype=int32>}
```

It looks like the get_tensors function has these lines in it:

```python
for output_feature in model_definition['output_features']:
    input_tensors[output_feature['name']] = getattr(model, output_feature['name'])
```

Is this intentional? I wonder if that's what's causing the model load failure.
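
If that is the cause, a hypothetical fix on the notebook side (a sketch only; model, model_definition, input_tensors, and sess all come from the notebook / Ludwig internals and are assumptions here) would be to collect the output features in their own dict and put them on the outputs side of the signature:

```python
# Hypothetical rework of the snippet above: keep input_tensors as the input
# feature placeholders only, and route the output features to the outputs
# side of the SavedModel signature. `model`, `model_definition`,
# `input_tensors`, and `sess` are assumed to come from the notebook /
# Ludwig internals.
import tensorflow as tf

output_tensors = {
    output_feature['name']: getattr(model, output_feature['name'])
    for output_feature in model_definition['output_features']
}

tf.saved_model.simple_save(
    sess, 'savedmodel_fixed',
    inputs=input_tensors,    # input feature placeholders only
    outputs=output_tensors,  # prediction tensors
)
```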

w4nderlust commented 4 years ago

That is in the cell after "try it again". Not sure why you are running it that way instead of using model.save_savedmodel().

hamletbatista commented 4 years ago

> That is in the cell after "try it again". Not sure why you are running it that way instead of using model.save_savedmodel().

I tried that first. See cells above.

[screenshot of the notebook cells]

hamletbatista commented 4 years ago

> @hamletbatista it seems from the logs that output tensors get appended to the input_tensors dict.
>
> For example, printing input_tensors yields:
>
> {'Category0': <tf.Tensor 'Category0/Category0_placeholder:0' shape=(?,) dtype=int64>, 'Category2': <tf.Tensor 'Category2/Category2_placeholder:0' shape=(?,) dtype=int64>, 'Questions': <tf.Tensor 'Questions/Questions_placeholder:0' shape=(?, ?) dtype=int32>}
>
> It looks like the get_tensors function has these lines in it: for output_feature in model_definition['output_features']: input_tensors[output_feature['name']] = getattr(model, output_feature['name'])
>
> Is this intentional? I wonder if that's what's causing the model load failure.

I haven't touched this code in a while, but I added links in the comments to where I found suggestions for fixing the issue.

It appears the suggestion came from this comment: https://github.com/uber/ludwig/issues/329#issuecomment-508777581

hamletbatista commented 4 years ago

I took a different route to solve this using Hugging Face's library, but your approach would make the tutorial much simpler to follow.

w4nderlust commented 4 years ago

> I tried that first. See cells above.

Yes, but it is commented out and I don't see errors there. What was wrong with it?

> I took a different route to solve this using Hugging Face's library, but your approach would make the tutorial much simpler to follow.

We are adding support for importing Hugging Face's Transformers library in the next version of Ludwig. I wonder how you are serving it, as even the smaller distilled model is really expensive to use at inference time.

hamletbatista commented 4 years ago

> I tried that first. See cells above.
>
> Yes, but it is commented out and I don't see errors there. What was wrong with it?

There was no error or stack trace. The issue was that the generated file seemed corrupted or incomplete when I tried to load it.

I will give it another try over the weekend.

> I took a different route to solve this using Hugging Face's library, but your approach would make the tutorial much simpler to follow.
>
> We are adding support for importing Hugging Face's Transformers library in the next version of Ludwig. I wonder how you are serving it, as even the smaller distilled model is really expensive to use at inference time.

Yes, I have that problem, but this is mostly a learning exercise to teach marketers, not for production use. Next in my queue is to investigate this research: https://cloudblogs.microsoft.com/opensource/2020/01/21/microsoft-onnx-open-source-optimizations-transformer-inference-gpu-cpu/

It seems to solve that issue.

w4nderlust commented 4 years ago

> There was no error or stack trace. The issue was that the generated file seemed corrupted or incomplete when I tried to load it.

Got it, but you can understand that I need to see the error you were getting about the corrupted file, otherwise it's difficult to figure out what the problem is. Ideally you could provide a minimal, self-contained, reproducible zip containing either data and a Python script (the data can be generated with the data/dataset_synthesizer.py script if you can't share yours), or data + YAML file + the command to run it.

> Yes, I have that problem, but this is mostly a learning exercise to teach marketers, not for production use. Next in my queue is to investigate this research: https://cloudblogs.microsoft.com/opensource/2020/01/21/microsoft-onnx-open-source-optimizations-transformer-inference-gpu-cpu/
>
> It seems to solve that issue.

It was tested on a 3-layer BERT; the latency is much higher on the full model. Still, it's a step forward ;)

Anyway, I thought your use case was fast inference at deployment time, but if your goal is just a demo and you don't need a highly scalable inference pipeline, you can train a model with Ludwig and then serve it with:

ludwig serve --model_path path/to/trained/model

and it will launch a REST API server you can query easily. More info in the User Guide.
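
For example, once the server is up, a query could look something like this (a rough sketch; the default port and the /predict form field names are assumptions based on the User Guide of that era, so adjust them to your model definition):

```python
# Rough sketch of querying the server started by `ludwig serve`.
# The port (8000) and the /predict endpoint with one form field per input
# feature are assumptions based on the User Guide; adjust the feature name
# ("Questions") to match your model definition.
import requests

response = requests.post(
    "http://localhost:8000/predict",
    data={"Questions": "how do I get google to index my pages faster?"},
)
print(response.json())
```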

ifokeev commented 4 years ago

@w4nderlust just FYI: save_savedmodel currently works incorrectly because of wrong placeholder names: https://github.com/uber/ludwig/issues/329#issuecomment-548854347

hamletbatista commented 4 years ago

> There was no error or stack trace. The issue was that the generated file seemed corrupted or incomplete when I tried to load it.
>
> Got it, but you can understand that I need to see the error you were getting about the corrupted file, otherwise it's difficult to figure out what the problem is. Ideally you could provide a minimal, self-contained, reproducible zip containing either data and a Python script (the data can be generated with the data/dataset_synthesizer.py script if you can't share yours), or data + YAML file + the command to run it.

Yes. I will have time over the weekend :)

> Yes, I have that problem, but this is mostly a learning exercise to teach marketers, not for production use. Next in my queue is to investigate this research: https://cloudblogs.microsoft.com/opensource/2020/01/21/microsoft-onnx-open-source-optimizations-transformer-inference-gpu-cpu/ It seems to solve that issue.

> It was tested on a 3-layer BERT; the latency is much higher on the full model. Still, it's a step forward ;)

Interesting. I will see if I can get decent accuracy. Thanks for the insights.

> Anyway, I thought your use case was fast inference at deployment time, but if your goal is just a demo and you don't need a highly scalable inference pipeline, you can train a model with Ludwig and then serve it with:
>
> ludwig serve --model_path path/to/trained/model
>
> and it will launch a REST API server you can query easily. More info in the User Guide.

I need to run the model in JS so I can embed it in Google Sheets and Excel. Fetching from a serving URL would be my fallback option.

Thanks

w4nderlust commented 4 years ago

> @w4nderlust just FYI: save_savedmodel currently works incorrectly because of wrong placeholder names: #329 (comment)

Thanks, yes we are working on it. @ydudin3

w4nderlust commented 4 years ago

The merged PR should have solved the issue. There's also an integration test for SavedModel now that shows how to load and save SavedModels and what kind of preprocessing and postprocessing you need to do in order to map data to tensors and prediction tensors to data: https://github.com/uber/ludwig/blob/master/tests/integration_tests/test_savedmodel.py. Let us know if you have further problems.
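
If it helps while wiring that up, the rough shape of loading the exported SavedModel directly with TensorFlow looks like the sketch below (untested; the tensor names are assumptions based on the feature names earlier in this thread, and the zero array stands in for Ludwig's real preprocessing):

```python
# Minimal sketch (TF 1.x, untested) of running the exported SavedModel
# outside Ludwig. Tensor names are assumptions based on the feature names
# in this thread; real inputs must be preprocessed the same way Ludwig
# does (using train_set_metadata.json to map text to integer ids).
import numpy as np
import tensorflow as tf

with tf.Session(graph=tf.Graph()) as sess:
    tf.saved_model.loader.load(sess, ["serve"], "path/to/savedmodel")
    g = sess.graph
    questions = g.get_tensor_by_name("Questions/Questions_placeholder:0")
    predictions = g.get_tensor_by_name(
        "Category0/predictions_Category0/predictions_Category0:0")

    feed = {questions: np.zeros((1, 16), dtype=np.int32)}  # stand-in batch of token ids
    try:
        # Depending on the Ludwig version, the training flag may still need a value.
        feed[g.get_tensor_by_name("is_training:0")] = False
    except KeyError:
        pass

    print(sess.run(predictions, feed_dict=feed))
```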

hamletbatista commented 4 years ago

Thanks. I will check this out. This was sorely needed!