Closed: ydennisy closed this issue 3 years ago
While I can't comment on the details of performance in your case, here are some general observations:
For a complete example of the Dataset.map() approach, see https://www.tensorflow.org/tutorials/text/solve_glue_tasks_using_bert_on_tpu and ignore the TPU specifics if you are targeting GPUs.
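If it helps while reading the tutorial, here is a minimal sketch of that pattern: preprocessing runs once per batch inside the tf.data pipeline (on the host CPU), and the trainable model starts from the already-tokenized inputs, so only the encoder runs on the GPU. The TF Hub handles, the 128-token sequence length, and the classification head below are only illustrative; substitute the matching preprocessing/encoder pair your model actually uses.

```python
import tensorflow as tf
import tensorflow_hub as hub

# Example handles; substitute the matching preprocessing/encoder pair you use.
preprocess_handle = "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3"
encoder_handle = "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4"

preprocessor = hub.KerasLayer(preprocess_handle)

def make_dataset(sentences, labels, batch_size=32):
    """Tokenize inside the input pipeline so the GPU model never sees string ops."""
    ds = tf.data.Dataset.from_tensor_slices((sentences, labels))
    ds = ds.batch(batch_size)
    ds = ds.map(lambda text, label: (preprocessor(text), label),
                num_parallel_calls=tf.data.AUTOTUNE)
    return ds.prefetch(tf.data.AUTOTUNE)

# The trainable model takes the already-preprocessed int tensors as inputs.
seq_length = 128  # default output length of the preprocessing model
encoder_inputs = {
    name: tf.keras.layers.Input(shape=(seq_length,), dtype=tf.int32, name=name)
    for name in ("input_word_ids", "input_mask", "input_type_ids")
}
encoder = hub.KerasLayer(encoder_handle, trainable=True)
pooled = encoder(encoder_inputs)["pooled_output"]
output = tf.keras.layers.Dense(1, activation="sigmoid")(pooled)
model = tf.keras.Model(encoder_inputs, output)
```

With this split, preprocessing stays in the input pipeline and overlaps with training via prefetch, instead of being placed on the CPU inside every training step of the GPU model.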
All that said, is there a concrete, actionable defect with the TensorFlow Hub models, library, or documentation that needs tracking here? If yes, please provide a complete reproduction and an explanation of what goes wrong. If not, let's close this issue. We recommend StackOverflow (specifically, the tensorflow-hub tag) for general how-to and support questions.
@arnoegw thanks for that - makes perfect sense.
I will close!
Hi,
When running a BERT model as part of a larger model, the preprocessing layer seems to run on the CPU while the main BERT encoder runs on the GPU. This means ops and data bounce back and forth between devices on every step, which is prohibitively slow.
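Roughly, the model is wired up like this (the TF Hub handles below are just examples of the kind of preprocessing/encoder pair I mean, and the classification head is a stand-in):

```python
import tensorflow as tf
import tensorflow_hub as hub

# Example handles; the real model uses an equivalent preprocessing/encoder pair.
preprocess_handle = "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3"
encoder_handle = "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4"

text_input = tf.keras.layers.Input(shape=(), dtype=tf.string)
# The preprocessing model is string/lookup ops with no GPU kernels, so this
# layer is placed on the CPU at every training step.
encoder_inputs = hub.KerasLayer(preprocess_handle)(text_input)
# The encoder itself runs on the GPU, so tensors move between devices each step.
outputs = hub.KerasLayer(encoder_handle, trainable=True)(encoder_inputs)
prediction = tf.keras.layers.Dense(1, activation="sigmoid")(outputs["pooled_output"])
model = tf.keras.Model(text_input, prediction)
```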
Is there a way to force all of these ops to run on the GPU?
Thanks!