ZihengZZH closed this issue 5 years ago
The TF-GPU configuration appears to be broken; I noticed this error message:
Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
/job:localhost/replica:0/task:0/device:CPU:0
/job:localhost/replica:0/task:0/device:XLA_CPU:0].
Reconfiguring the cuDNN + tensorflow-gpu environment could solve the problem.
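The candidate list in the error names only CPU and XLA_CPU devices, which is consistent with TensorFlow not seeing a GPU at all. Below is a minimal sketch of how one might confirm that, assuming Python. The helper names (`candidate_device_types`, `gpu_is_candidate`, `local_device_types`) are hypothetical and not part of TensorFlow or tensor2tensor. The only TensorFlow call used is `device_lib.list_local_devices()` from `tensorflow.python.client`, and it is guarded so the parsing helpers still work without TensorFlow installed.

```python
import re

# Error text copied from the log above (device lines only).
ERROR_TEXT = """Current candidate devices are [
/job:localhost/replica:0/task:0/device:CPU:0
/job:localhost/replica:0/task:0/device:XLA_CPU:0]."""


def candidate_device_types(error_text):
    """Return the set of device types named in a placement error message."""
    return set(re.findall(r"/device:([A-Z_]+):\d+", error_text))


def gpu_is_candidate(error_text):
    """True if any GPU-type device appears among the candidate devices."""
    return any(t in ("GPU", "XLA_GPU") for t in candidate_device_types(error_text))


def local_device_types():
    """Return the device types TensorFlow reports locally, or None if
    TensorFlow is not importable in this environment."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return {d.device_type for d in device_lib.list_local_devices()}
```

If `gpu_is_candidate(ERROR_TEXT)` is false and `local_device_types()` also lacks `"GPU"`, the CUDA/cuDNN and tensorflow-gpu installation is a reasonable first suspect, which matches the fix suggested above.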
Description
... I was conducting a simple experiment on Multi30K EN2DE translation (text-only) using model transformer and hparams_set transformer_base. It went well for some time, but then training suddenly stopped after the first training step and could not proceed.
Environment information
For bugs: reproduction and error logs