wikibook / tf2

Example code for 《시작하세요! 텐서플로 2.0 프로그래밍》 (Get Started! TensorFlow 2.0 Programming)
https://wikibook.co.kr/tf2/

Error message at ch7 #7.4 #18

Closed · sunkih4 closed this issue 4 years ago

sunkih4 commented 4 years ago

I am trying to execute each step of the example in Chapter 7. At step #7.4 (# 7.4 Train the network and check the results), when I run

model.fit(X, Y, epochs=100, verbose=0)
print(model.predict(X))

by pressing Shift+Enter, the following error message appears. What do I have to do to solve this problem?


InternalError                             Traceback (most recent call last)

in
      1 # 7.4 네트워크 훈련 및 결과 확인
----> 2 model.fit(X, Y, epochs=100, verbose=0)
      3 print(model.predict(X))

C:\Anaconda3\envs\tf2\lib\site-packages\tensorflow_core\python\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
    817         max_queue_size=max_queue_size,
    818         workers=workers,
--> 819         use_multiprocessing=use_multiprocessing)
    820
    821   def evaluate(self,

C:\Anaconda3\envs\tf2\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py in fit(self, model, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
    340                 mode=ModeKeys.TRAIN,
    341                 training_context=training_context,
--> 342                 total_epochs=epochs)
    343             cbks.make_logs(model, epoch_logs, training_result, ModeKeys.TRAIN)
    344

C:\Anaconda3\envs\tf2\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py in run_one_epoch(model, iterator, execution_function, dataset_size, batch_size, strategy, steps_per_epoch, num_samples, mode, training_context, total_epochs)
    126           step=step, mode=mode, size=current_batch_size) as batch_logs:
    127         try:
--> 128           batch_outs = execution_function(iterator)
    129         except (StopIteration, errors.OutOfRangeError):
    130           # TODO(kaftan): File bug about tf function and errors.OutOfRangeError?

C:\Anaconda3\envs\tf2\lib\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py in execution_function(input_fn)
     96       # `numpy` translates Tensors to values in Eager mode.
     97       return nest.map_structure(_non_none_constant_value,
---> 98                                 distributed_function(input_fn))
     99
    100     return execution_function

C:\Anaconda3\envs\tf2\lib\site-packages\tensorflow_core\python\eager\def_function.py in __call__(self, *args, **kwds)
    566         xla_context.Exit()
    567       else:
--> 568         result = self._call(*args, **kwds)
    569
    570     if tracing_count == self._get_tracing_count():

C:\Anaconda3\envs\tf2\lib\site-packages\tensorflow_core\python\eager\def_function.py in _call(self, *args, **kwds)
    630       # Lifting succeeded, so variables are initialized and we can run the
    631       # stateless function.
--> 632       return self._stateless_fn(*args, **kwds)
    633     else:
    634       canon_args, canon_kwds = \

C:\Anaconda3\envs\tf2\lib\site-packages\tensorflow_core\python\eager\function.py in __call__(self, *args, **kwargs)
   2361     with self._lock:
   2362       graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
-> 2363     return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
   2364
   2365   @property

C:\Anaconda3\envs\tf2\lib\site-packages\tensorflow_core\python\eager\function.py in _filtered_call(self, args, kwargs)
   1609         if isinstance(t, (ops.Tensor,
   1610                           resource_variable_ops.BaseResourceVariable))),
-> 1611         self.captured_inputs)
   1612
   1613   def _call_flat(self, args, captured_inputs, cancellation_manager=None):

C:\Anaconda3\envs\tf2\lib\site-packages\tensorflow_core\python\eager\function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
   1690       # No tape is watching; skip to running the function.
   1691       return self._build_call_outputs(self._inference_function.call(
-> 1692           ctx, args, cancellation_manager=cancellation_manager))
   1693     forward_backward = self._select_forward_and_backward_functions(
   1694         args,

C:\Anaconda3\envs\tf2\lib\site-packages\tensorflow_core\python\eager\function.py in call(self, ctx, args, cancellation_manager)
    543               inputs=args,
    544               attrs=("executor_type", executor_type, "config_proto", config),
--> 545               ctx=ctx)
    546     else:
    547       outputs = execute.execute_with_cancellation(

C:\Anaconda3\envs\tf2\lib\site-packages\tensorflow_core\python\eager\execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     65     else:
     66       message = e.message
---> 67     six.raise_from(core._status_to_exception(e.code, message), None)
     68   except TypeError as e:
     69     keras_symbolic_tensors = [

C:\Anaconda3\envs\tf2\lib\site-packages\six.py in raise_from(value, from_value)

InternalError: Blas GEMM launch failed : a.shape=(6, 10), b.shape=(10, 10), m=6, n=10, k=10
     [[{{node sequential/simple_rnn_1/while/body/_1/MatMul_1}}]] [Op:__inference_distributed_function_1313]

Function call stack:
distributed_function
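(For context, below is a minimal self-contained sketch of the kind of stacked-SimpleRNN training call that fails here. The data shapes, layer sizes, optimizer, and loss are illustrative assumptions, not the book's exact example 7.4 code.)

```python
# Hypothetical minimal reproduction of the kind of model trained in example 7.4.
# Shapes and layer sizes are assumptions chosen to mirror the traceback, not the book's code.
import numpy as np
import tensorflow as tf

# Toy sequence data: 6 samples, 4 timesteps, 1 feature (placeholder values).
X = np.random.random((6, 4, 1)).astype(np.float32)
Y = np.random.random((6, 1)).astype(np.float32)

model = tf.keras.Sequential([
    tf.keras.layers.SimpleRNN(units=10, return_sequences=True, input_shape=(4, 1)),
    tf.keras.layers.SimpleRNN(units=10),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')

# This is the call that raises InternalError on the reporter's machine;
# on a correctly configured CUDA/cuDNN setup (or on CPU) it completes normally.
model.fit(X, Y, epochs=100, verbose=0)
print(model.predict(X))
```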
greentec commented 4 years ago

Example 7.4 runs fine in Colab. Can you tell me the version of TensorFlow you are using? You can check it with the command below.

import tensorflow as tf
print(tf.__version__)
sunkih4 commented 4 years ago

Thank you for your comment. The version is 2.1.0 (the Colab version is 2.2.0). Could that be the problem? Actually, when I tried to install 2.2.0, it failed, so I installed 2.1.0 instead.

greentec commented 4 years ago

In Colab, TensorFlow 2.1.0 also runs fine. By the way, I found a post describing a similar problem with the same error message you posted: https://www.reddit.com/r/tensorflow/comments/dxnnq2/i_am_getting_an_error_while_running_the_rnn_lstm/

According to that post, this error can occur when the installed cuDNN version does not match the TensorFlow build. And according to the TensorFlow 2.1.0 release notes, the supported versions are as follows.

The tensorflow pip package is built with CUDA 10.1 and cuDNN 7.6.

I recommend checking the CUDA and cuDNN versions to fix the problem.
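If it helps, here is a small sketch of how one might confirm from Python that TensorFlow was built with CUDA and can actually see the GPU. The installed CUDA toolkit and cuDNN versions themselves are normally checked outside Python (for example with nvcc --version or the cuDNN header), so treat this only as a first sanity check.

```python
# Sanity-check sketch for the GPU setup (TF 2.1-era APIs), not a full diagnosis.
import tensorflow as tf

print(tf.__version__)                          # e.g. 2.1.0
print(tf.test.is_built_with_cuda())            # True if this build expects CUDA
print(tf.config.list_physical_devices('GPU'))  # empty list -> TensorFlow cannot see the GPU

# The installed CUDA toolkit and cuDNN versions are checked outside Python,
# e.g. with `nvcc --version` and the CUDNN_MAJOR/CUDNN_MINOR defines in cudnn.h,
# and should match what the pip package expects (CUDA 10.1 / cuDNN 7.6 for TF 2.1.0).
```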

sunkih4 commented 4 years ago

Thank you very much for your help. Anyway, the problem is solved. I completely redid the installation (specifically CUDA 10.2, cuDNN 7.6.5, and Anaconda 20.07). Now everything works well. Thank you again.