amin-nejad closed this issue 3 years ago.
Might be related to tensorflow/tensorflow#32017
Thanks @cantwbr, possibly - will keep an eye on it. But this doesn't even get to the checkpoint.
@amin-nejad: You are right! I think the title of issue tensorflow/tensorflow#32017 is a bit misleading. The execution reported there actually stalls after opening libcublas, just like the execution you reported.
Description
Decoding hangs on "Successfully opened dynamic library libcudnn.so.7". This happens even on a new Azure VM instance with all the requirements freshly installed, using CUDA-10.1 and a Tesla K80 GPU. I reduced batch_size to just 1: decoding then takes a couple of minutes on CPU, but seems to run indefinitely on GPU (at least an hour and a half; I have not waited longer).
Environment information
For bugs: reproduction and error logs