mrdbourke / tensorflow-deep-learning

All course materials for the Zero to Mastery Deep Learning with TensorFlow course.
https://dbourke.link/ZTMTFcourse
MIT License

Food Vision not training #432

Closed by arghanath007 2 years ago

arghanath007 commented 2 years ago

I was working on the Food Vision project and completed it. It ran completely fine yesterday, but when I tried to run it today it gave me this error:

Detected at node 'model/efficientnetv2-b0/stem_conv/Conv2D' defined at (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/usr/local/lib/python3.7/dist-packages/traitlets/config/application.py", line 846, in launch_instance
    app.start()
  File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelapp.py", line 612, in start
    self.io_loop.start()
  File "/usr/local/lib/python3.7/dist-packages/tornado/platform/asyncio.py", line 132, in start
    self.asyncio_loop.run_forever()
  File "/usr/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
    self._run_once()
  File "/usr/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
    handle._run()
  File "/usr/lib/python3.7/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.7/dist-packages/tornado/ioloop.py", line 758, in _run_callback
    ret = callback()
  File "/usr/local/lib/python3.7/dist-packages/tornado/stack_context.py", line 300, in null_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 1233, in inner
    self.run()
  File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 1147, in run
    yielded = self.gen.send(value)
  File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py", line 381, in dispatch_queue
    yield self.process_one()
  File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 346, in wrapper
    runner = Runner(result, future, yielded)
  File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 1080, in __init__
    self.run()
  File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 1147, in run
    yielded = self.gen.send(value)
  File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py", line 365, in process_one
    yield gen.maybe_future(dispatch(*args))
  File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 326, in wrapper
    yielded = next(result)
  File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py", line 268, in dispatch_shell
    yield gen.maybe_future(handler(stream, idents, msg))
  File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 326, in wrapper
    yielded = next(result)
  File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py", line 545, in execute_request
    user_expressions, allow_stdin,
  File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 326, in wrapper
    yielded = next(result)
  File "/usr/local/lib/python3.7/dist-packages/ipykernel/ipkernel.py", line 306, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/usr/local/lib/python3.7/dist-packages/ipykernel/zmqshell.py", line 536, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 2855, in run_cell
    raw_cell, store_history, silent, shell_futures)
  File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 2881, in _run_cell
    return runner(coro)
  File "/usr/local/lib/python3.7/dist-packages/IPython/core/async_helpers.py", line 68, in _pseudo_sync_runner
    coro.send(None)
  File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 3058, in run_cell_async
    interactivity=interactivity, compiler=compiler, result=result)
  File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 3249, in run_ast_nodes
    if (await self.run_code(code, result, async_=asy)):
  File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-30-4999ceb7efa4>", line 1, in <module>
    history_101_food_classes_feature_extract = model.fit(train_data, epochs=10,steps_per_epoch=len(train_data), validation_data=test_data, validation_steps=int(0.15 * len(test_data)),callbacks=[tensorboard_callback, checkpoint_callback, lr_scheduler_callback, learning_rate_reduce_callback])
  File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 64, in error_handler
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1409, in fit
    tmp_logs = self.train_function(iterator)
  File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1051, in train_function
    return step_function(self, iterator)
  File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1040, in step_function
    outputs = model.distribute_strategy.run(run_step, args=(data,))
  File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1030, in run_step
    outputs = model.train_step(data)
  File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 889, in train_step
    y_pred = self(x, training=True)
  File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 64, in error_handler
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 490, in __call__
    return super().__call__(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 64, in error_handler
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py", line 1014, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 92, in error_handler
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/keras/engine/functional.py", line 459, in call
    inputs, training=training, mask=mask)
  File "/usr/local/lib/python3.7/dist-packages/keras/engine/functional.py", line 596, in _run_internal_graph
    outputs = node.layer(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 64, in error_handler
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 490, in __call__
    return super().__call__(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 64, in error_handler
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py", line 1014, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 92, in error_handler
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/keras/engine/functional.py", line 459, in call
    inputs, training=training, mask=mask)
  File "/usr/local/lib/python3.7/dist-packages/keras/engine/functional.py", line 596, in _run_internal_graph
    outputs = node.layer(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 64, in error_handler
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py", line 1014, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 92, in error_handler
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/keras/layers/convolutional/base_conv.py", line 250, in call
    outputs = self.convolution_op(inputs, self.kernel)
  File "/usr/local/lib/python3.7/dist-packages/keras/layers/convolutional/base_conv.py", line 232, in convolution_op
    name=self.__class__.__name__)
Node: 'model/efficientnetv2-b0/stem_conv/Conv2D'
DNN library is not found.
  [[{{node model/efficientnetv2-b0/stem_conv/Conv2D}}]] [Op:__inference_train_function_18083]

Edit:

This looks like a TensorFlow/cuDNN error. Link.

Edit-2:

This solved my issue:

!apt install --allow-change-held-packages libcudnn8=8.1.0.77-1+cuda11.2

Just run this command in a cell of the notebook and the error is fixed.
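For anyone landing here later: a rough way to sanity-check whether a pinned libcudnn8 package lines up with what your TensorFlow build expects. This is only a sketch and not something from the course notebook. The helper names (`parse_cudnn_package_version`, `matches_build`, `tensorflow_build_info`) are made up for illustration, and it assumes `tf.sysconfig.get_build_info()` reports `cuda_version` and `cudnn_version` as short strings such as "11.2" and "8", which may not hold for every TensorFlow build.

```python
import re


def parse_cudnn_package_version(version):
    """Split an apt version such as '8.1.0.77-1+cuda11.2' into (cudnn, cuda)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)*)-\d+\+cuda(\d+(?:\.\d+)*)", version)
    if match is None:
        raise ValueError(f"Unrecognised package version: {version!r}")
    return match.group(1), match.group(2)


def matches_build(package_version, build_info):
    """Compare the cuDNN major version and the CUDA version with a build-info dict.

    build_info is assumed to look like {'cuda_version': '11.2', 'cudnn_version': '8'}.
    """
    cudnn, cuda = parse_cudnn_package_version(package_version)
    expected_cudnn = str(build_info.get("cudnn_version", ""))
    expected_cuda = str(build_info.get("cuda_version", ""))
    same_cudnn_major = cudnn.split(".")[0] == expected_cudnn.split(".")[0]
    return same_cudnn_major and cuda == expected_cuda


def tensorflow_build_info():
    """Read the build info from the installed TensorFlow (imported lazily)."""
    import tensorflow as tf  # only needed when this helper is actually called

    return dict(tf.sysconfig.get_build_info())
```

In a notebook cell you could then run `matches_build("8.1.0.77-1+cuda11.2", tensorflow_build_info())`. A result of False only hints that the pinned package and the TensorFlow build may disagree; it does not prove that this is the cause of the "DNN library is not found" error.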

mrdbourke commented 2 years ago

Thank you for updating your issue!

Glad to hear you got it fixed.

Seems a few of the new TensorFlow updates are breaking people's models.

I'm guessing these should be worked out in the coming days/weeks.