tensorflow / tpu

Reference models and tools for Cloud TPUs.
https://cloud.google.com/tpu/
Apache License 2.0

Prediction fails #122

Open ddkang opened 6 years ago

ddkang commented 6 years ago

https://github.com/tensorflow/tpu/tree/master/models/experimental/resnet_bfloat16

The link above says "To run the same code on CPU/GPU, set the flag --use_tpu=False", but after training on the TPU, evaluation and prediction fail with the following error:

```
InvalidArgumentError (see above for traceback): No OpKernel was registered to support Op 'Conv2D' with these attrs.  Registered devices: [CPU], Registered kernels:
  device='CPU'; T in [DT_DOUBLE]
  device='CPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_HALF]

         [[Node: bfloat16/conv2d/Conv2D = Conv2D[T=DT_BFLOAT16, data_format="NHWC", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 2, 2, 1], use_cudnn_on_gpu=true](bfloat16/Pad, Cast)]]
```
arlied-google commented 6 years ago

This op is using DT_BFLOAT16. DT_BFLOAT16 is supported on TPUs, but is not supported on CPUs or GPUs.

Use the non-BFLOAT16 version of ResNet instead.

ddkang commented 6 years ago

Yes, I understand that DT_BFLOAT16 is not supported on CPUs or GPUs. Are you telling me that once I train a model on the TPU, it can't be used elsewhere?

arlied-google commented 6 years ago

You could do one of two things.

First, you could modify your model so that it performs its computations in BFLOAT16 but converts the variables to FLOAT32 before they are stored in the checkpoint.

Second, you could write a custom app that reads the checkpoint, locates the BFLOAT16 tensors, converts them to FLOAT32, and exports a new checkpoint.
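
A minimal sketch of that second option, assuming a TF 1.x checkpoint; `convert_bfloat16_checkpoint` and the paths are hypothetical names, not part of this repo:

```python
import numpy as np
import tensorflow as tf

def convert_bfloat16_checkpoint(src_ckpt, dst_ckpt):
    """Reads src_ckpt, casts any bfloat16 variables to float32, saves dst_ckpt."""
    reader = tf.train.load_checkpoint(src_ckpt)
    dtype_map = reader.get_variable_to_dtype_map()
    with tf.Graph().as_default():
        for name, dtype in dtype_map.items():
            value = reader.get_tensor(name)  # numpy array from the checkpoint
            if dtype == tf.bfloat16:
                value = value.astype(np.float32)  # widen to float32
            tf.get_variable(name, initializer=value)
        saver = tf.train.Saver()
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            saver.save(sess, dst_ckpt)

# Hypothetical paths for illustration.
convert_bfloat16_checkpoint('/tmp/tpu_model/model.ckpt-112603',
                            '/tmp/cpu_model/model.ckpt-112603')
```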

ddkang commented 6 years ago

As far as I know, this is impossible in TensorFlow. If this is incorrect or outdated, please let me know how to do it.

bignamehyp commented 6 years ago

You can create a variable scope with a custom_getter that casts any bf16 variables to fp32 when they are used.
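
A minimal sketch of that pattern under TF 1.x; `bfloat16_getter` is a hypothetical name. The getter stores variables in fp32 (so the checkpoint stays CPU/GPU-readable) and casts them whenever the graph requests bf16:

```python
import tensorflow as tf

def bfloat16_getter(getter, *args, **kwargs):
    """Stores variables in float32, casting to bfloat16 when requested."""
    requested_dtype = kwargs['dtype']
    if requested_dtype == tf.bfloat16:
        kwargs['dtype'] = tf.float32  # the checkpoint sees a float32 variable
    var = getter(*args, **kwargs)
    if requested_dtype == tf.bfloat16:
        var = tf.cast(var, tf.bfloat16)  # the graph computes in bfloat16
    return var

with tf.variable_scope('model', custom_getter=bfloat16_getter):
    images = tf.placeholder(tf.bfloat16, [None, 224, 224, 3])
    conv = tf.layers.conv2d(images, filters=64, kernel_size=7, strides=2)
```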

anandadalton commented 6 years ago

First you need to build your network entirely inside the bfloat16_scope:

https://github.com/tensorflow/tpu/blob/cef293dea8b0b9567cb779dca1556f6974cfa5cd/models/official/resnet/resnet_main.py#L301-L306

Note that this bfloat16_scope is just wrapping the custom_getter logic mentioned above, and you can check this out by visiting the module where it is declared. You also need to ensure that the output tensor from your network is cast back to fp32.
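
Putting that together, a hedged sketch of the model-building pattern under TF 1.x; the tiny conv-plus-dense body here is just a stand-in for the real ResNet:

```python
import tensorflow as tf

def build_network(features):
    """Computes in bfloat16 but keeps variables and outputs in float32."""
    with tf.contrib.tpu.bfloat16_scope():
        x = tf.cast(features, tf.bfloat16)
        x = tf.layers.conv2d(x, filters=64, kernel_size=7, strides=2)
        x = tf.reduce_mean(x, axis=[1, 2])  # global average pool
        logits = tf.layers.dense(x, units=1000)
    # Cast the network output back to fp32, as noted above.
    return tf.cast(logits, tf.float32)
```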

Then you need to have a place in your code that will call export_saved_model on your estimator:

https://github.com/tensorflow/tpu/blob/cef293dea8b0b9567cb779dca1556f6974cfa5cd/models/official/resnet/resnet_main.py#L662-L668

Note that this call alone isn't enough: export_saved_model requires a serving_input_receiver_fn as an argument.
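
For example (the estimator name and export directory are placeholders; the receiver fn is sketched further below):

```python
# resnet_classifier is the estimator trained above; the path is hypothetical.
resnet_classifier.export_saved_model(
    export_dir_base='/tmp/resnet_export',
    serving_input_receiver_fn=serving_input_receiver_fn)
```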

This is where things get more custom to the model you are running.

You need to supply a function that takes no arguments and, when called, returns a ServingInputReceiver. In the ResNet-50 example, this is done here:

https://github.com/tensorflow/tpu/blob/cef293dea8b0b9567cb779dca1556f6974cfa5cd/models/official/resnet/imagenet_input.py#L28-L44

The two arguments of ServingInputReceiver in this example are:

- features: the preprocessed image tensors that the model will actually consume for prediction.
- receiver_tensors: the placeholders that the server accepts from clients (here, a batch of encoded image strings).

In essence, this ServingInputReceiver is primarily about setting up a preprocessing subgraph: its inputs are features in a format that the server will accept from clients, and its outputs are features that the estimator model can use for prediction.
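
A hedged, self-contained sketch of such a function under TF 1.x; the JPEG decode-and-resize here is a stand-in for the real preprocessing pipeline in imagenet_input.py:

```python
import tensorflow as tf

def serving_input_receiver_fn():
    # What the server accepts from clients: a batch of encoded image strings.
    image_bytes = tf.placeholder(dtype=tf.string, shape=[None],
                                 name='image_bytes')

    def _preprocess(one_image):
        image = tf.image.decode_jpeg(one_image, channels=3)
        image = tf.image.resize_images(image, [224, 224])
        return tf.cast(image, tf.float32)

    # What the model receives: preprocessed float32 feature tensors.
    images = tf.map_fn(_preprocess, image_bytes, dtype=tf.float32)
    return tf.estimator.export.ServingInputReceiver(
        features=images,
        receiver_tensors={'image_bytes': image_bytes})
```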