Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub.
System information
Have I written custom code (as opposed to using a stock example script provided in TensorFlow.js):
OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
TensorFlow.js installed from (npm or script link):
Describe the current behavior
TFDF sets global state when it executes an async model. This global state is used as an input to the SimpleMLLoadModelFromPathWithHandle (which also sets global state) and SimpleMLInferenceOpWithHandle ops.

This is a problem because async models can run concurrently. A model might be architected so that it sets the global state, yields to another task, and then reads the global state back. That is a race condition: if the other task also sets the global state in the meantime, the model reads state it did not write and produces a bad result.
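To make the race concrete, here is a minimal sketch of the interleaving; the global variable and function names are illustrative, not the actual TFDF internals:

```ts
// Hypothetical stand-in for TFDF's mutable global state.
let globalState: string | null = null;

async function runModel(id: string): Promise<string> {
  globalState = id; // set the global state for this model
  await new Promise<void>((resolve) => setTimeout(resolve, 0)); // yield to another task
  return globalState!; // may now observe another model's state
}

// Two async models running at the same time:
Promise.all([runModel('modelA'), runModel('modelB')]).then(([a, b]) => {
  console.log(a, b); // logs "modelB modelB": modelA read modelB's state
});
```

Both runs set the global before either resumes, so the first model ends up reading the second model's state.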
Describe the expected behavior
TFDF models should not rely on mutable global state.
One solution is to register a custom op for each TFDF model that refers to its own state instead of the global state. This could still have issues if model.executeAsync is called a second time before the first run has finished, but since this state only stores the TFDF model so we can run it during inference (and inference does not actually change the model), that would be fine.
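A minimal sketch of the per-model op idea, assuming tf.registerOp from tfjs-converter; the ModelRunner interface, op-naming scheme, and wiring are hypothetical:

```ts
import * as tf from '@tensorflow/tfjs';

// Hypothetical stand-in for the runner currently kept in global state.
interface ModelRunner {
  infer(inputs: tf.Tensor[]): tf.Tensor;
}

let nextOpId = 0;

// Register a uniquely named inference op whose executor closes over this
// model's own runner, so nothing is read from shared global state.
function registerInferenceOpFor(runner: ModelRunner): string {
  const opName = `SimpleMLInferenceOp_${nextOpId++}`;
  tf.registerOp(opName, (node) => runner.infer(node.inputs));
  return opName;
}
```

Each loaded graph would then need its inference node pointed at its unique op name; how that rewiring happens depends on how the GraphModel is constructed, which I haven't checked.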
This approach would let us remove the separate TFDFModel class, which wraps the TFJS GraphModel and just sets the global state before calling it: with each model storing its state in its own custom op, there would no longer be any global state to set.
Another approach is to store the assets and model runner in a constant node or a resource and add it as an input to the SimpleMLLoadModelFromPathWithHandle and SimpleMLInferenceOpWithHandle ops. This is probably better than the first approach since it doesn't require registering custom ops.
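A rough sketch of how the handle could flow, assuming a module-level registry keyed by an integer handle that the load op produces and the inference op consumes as an ordinary tensor input; all names here are hypothetical:

```ts
import * as tf from '@tensorflow/tfjs';

interface ModelRunner {
  infer(inputs: tf.Tensor[]): tf.Tensor;
}

// Append-only registry: concurrent models get distinct handles instead of
// overwriting a single shared slot.
const runners = new Map<number, ModelRunner>();
let nextHandle = 0;

// Roughly what SimpleMLLoadModelFromPathWithHandle could return: a scalar
// handle tensor that flows through the graph like any other input.
function loadRunner(runner: ModelRunner): tf.Tensor {
  const handle = nextHandle++;
  runners.set(handle, runner);
  return tf.scalar(handle, 'int32');
}

// Roughly what SimpleMLInferenceOpWithHandle could do: resolve the runner
// from its handle input rather than from global state.
async function inferWithHandle(
    handleTensor: tf.Tensor, inputs: tf.Tensor[]): Promise<tf.Tensor> {
  const handle = (await handleTensor.data())[0];
  return runners.get(handle)!.infer(inputs);
}
```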
I'd also like to remove loadTFDFModel and just rely on loadGraphModel if possible, but that's probably out of scope here.
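If that pans out, loading a TFDF model might eventually look like loading any other graph model; whether the TFDF ops can be registered ahead of time to make this work is an open question (the URL below is a placeholder):

```ts
import * as tf from '@tensorflow/tfjs';

async function main() {
  // Hypothetical end state: no TFDF-specific wrapper, just a GraphModel.
  const model = await tf.loadGraphModel('https://example.com/model.json');
  const output = await model.executeAsync(tf.zeros([1, 4]));
  console.log(output);
}
```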
Standalone code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate the problem. If possible, please share a link to Colab/CodePen/any notebook.
Other info / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.