tensorflow / tfjs

A WebGL accelerated JavaScript library for training and deploying ML models.
https://js.tensorflow.org
Apache License 2.0

Support for FIFOQueue #1099

Open dsmilkov opened 5 years ago

dsmilkov commented 5 years ago

Currently we fail to convert a TF model if the model contains a FIFOQueue.

We should investigate which models use FIFOQueue and whether queues make sense at inference time, or only during training.

Two paths for graphs with FIFOQueue:

1) Implement a FIFOQueue and make the model stateful, so that executing an "enqueue" or "dequeue" node modifies the state of the graph. This solution is technically feasible, but challenging.

2) In the meantime, another option is to treat this as a UX problem and provide information to the user at conversion time (a rough sketch follows this list). 2A) The simplest solution is to detect that a graph has a "dequeue" op followed by a synchronous subgraph, and warn the user to consider feeding data right after that op (showing that op's name and shape) when calling model.execute(). 2B) Additionally, it would be great to ask the user whether they are ok with the converter dropping any nodes before the "dequeue" op, which will also make the graph smaller.
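To illustrate 2A/2B, here is a rough sketch of what the conversion-time detection could look like; the graph structure and field names are purely illustrative, not the converter's real data model:

```js
// Illustrative sketch only: scan a graph for dequeue ops and warn the
// user where to feed data instead. `graph.nodes` is a hypothetical
// structure with {name, op, shape} entries.
function warnAboutDequeueOps(graph) {
  for (const node of graph.nodes) {
    if (node.op.startsWith('QueueDequeue')) {
      console.warn(
          `Graph contains dequeue op "${node.name}" with shape ` +
          `${JSON.stringify(node.shape)}. Consider feeding data to the ` +
          `node right after it when calling model.execute(), and ` +
          `dropping the nodes before it to shrink the graph.`);
    }
  }
}
```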

GarimaGoyal66 commented 5 years ago

Hi, I am using a model from https://github.com/davidsandberg/facenet/ with the Inception-Resnet-v1 architecture for facial recognition. It uses the FIFOQueueV2 and QueueDequeueUpToV2 ops, which give an Unsupported Ops error when converting to TensorFlow JS/Lite. I converted the GraphDef (.pb) to a TensorFlow.js model using the --skip_op_check=SKIP_OP_CHECK flag, but it failed while loading with "Tensorflow Op is not supported: FIFOQueueV2". Thus I assume that queues are also used at inference time.
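For reference, the failure surfaces at load time roughly like this (paths are placeholders; tf.loadFrozenModel is the frozen-model loading API from that era of tfjs-converter):

```js
import * as tf from '@tensorflow/tfjs';

// Paths are placeholders. Because FIFOQueueV2 was left in the graph
// (the op check was skipped at conversion time), loading throws
// "Tensorflow Op is not supported: FIFOQueueV2".
const model = await tf.loadFrozenModel(
    'web_model/tensorflowjs_model.pb',
    'web_model/weights_manifest.json');
```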

Also, I'm not able to understand alternative 1 that you provided. Can you explain it briefly?

pyu10055 commented 5 years ago

@dsmilkov I think this falls into a much bigger category: supporting the tf.data ops. All the tf.data ops are stateful ops that cannot be removed from the graph, because they depend on a global state. In this case the graph only has FIFOQueueV2 and QueueDequeueUpToV2 ops but no QueueEnqueue op, because the global state can be updated outside of graph execution and is persisted across sessions.

Here is a summary of an approach for supporting the Queue and other tf.data stream types using a global state (a code sketch follows the list):

  1. A global data stream state is created along with the FrozenModel.
  2. When a tf.data stream is created (i.e. by a FIFOQueueV2 op), a resource entry is added to the global state if it does not already exist; otherwise the existing stream handle is returned as the output. The data stream is identified by the op's shared name attribute.
  3. The user can create a data stream with the same name before executing the graph, in order to pre-feed data.
  4. At execution time, the other ops retrieve data from the stream in the global state. Some of these ops will be async, since they stall until data is available (for example the QueueDequeueUpToV2 op).
  5. The user should/can use the tfjs-data version of those data streams.
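A minimal sketch of steps 1-4 (the class, registry, and waiter mechanism are my own illustration, not an existing tfjs API):

```js
// Step 1: global data stream state, created with the FrozenModel.
// Queues are keyed by the op's shared name attribute.
const globalQueues = new Map();

class QueueState {
  constructor() {
    this.items = [];    // queued tensors
    this.waiters = [];  // dequeues stalled waiting for data
  }
  enqueue(tensors) {
    const waiter = this.waiters.shift();
    if (waiter != null) waiter(tensors);
    else this.items.push(tensors);
  }
  // Step 4: async like QueueDequeueUpToV2 -- stalls until data arrives.
  dequeue() {
    if (this.items.length > 0) return Promise.resolve(this.items.shift());
    return new Promise(resolve => this.waiters.push(resolve));
  }
}

// Step 2: executing a FIFOQueueV2 op creates the resource entry if
// needed, otherwise returns the existing stream handle.
function getOrCreateQueue(sharedName) {
  if (!globalQueues.has(sharedName)) {
    globalQueues.set(sharedName, new QueueState());
  }
  return globalQueues.get(sharedName);
}

// Step 3: the user pre-feeds data under the same shared name before
// executing the graph (`inputTensor` is hypothetical).
getOrCreateQueue('input_queue').enqueue([inputTensor]);
```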

Note: This requires that the user understand which tf.data ops are in the model and be able to use the tf.data API. It also adds a package dependency on tfjs-data to tfjs-converter.

Let me know if this makes sense; I can put it into a design doc for further discussion.

dsmilkov commented 5 years ago

I think the fundamental problem is that, during inference in the browser, the data will arrive in a different way than during training. For example, during training it makes sense to feed a dataset/queue of TF.Example records (which also does proto parsing), while during inference in the browser the data might come from taking a snapshot of the webcam.

It's too early to try to offer a solution that closes that gap. As a first step, the converter should detect whenever the graph has tf.data/queue ops, warn the user, and recommend that the user feed data at the earliest node in the graph after those ops when calling model.execute(). This is option 2A from my previous comment.
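Concretely, once the converter points at the earliest post-queue node, inference could look like this (tf.fromPixels is the tfjs 0.x webcam-snapshot API; the node and output names here are hypothetical):

```js
// Snapshot of the webcam as the inference-time data source.
const frame = tf.fromPixels(videoElement);

// Feed directly at the node after the dequeue op, skipping the queue
// subgraph. 'input_after_queue' and 'embeddings' are hypothetical names.
const embeddings = model.execute(
    {'input_after_queue': frame.toFloat().expandDims(0)},
    'embeddings');
```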

gaikwadrahul8 commented 11 months ago

Hi, @dsmilkov

Thank you for opening this issue for tracking purposes. Since this issue has been open for a long time, the code/debug information in it may not be relevant to the current state of the code base.

The TFJS team is constantly improving the framework by fixing bugs and adding new features. We suggest you try the latest TFJS version with the latest compatible hardware configuration, which could potentially resolve the issue. We can keep the issue open if it is still relevant; please confirm whether we need to keep it open.

Thank you for your support and cooperation.