jeffheaton / t81_558_deep_learning

T81-558: Keras - Applications of Deep Neural Networks @Washington University in St. Louis
https://sites.wustl.edu/jeffheaton/t81-558/

Training stops. Conflict between Python 3.7, PyTorch, and TensorBoard #145

Closed carlosnavarro-cn closed 2 years ago

carlosnavarro-cn commented 2 years ago

Hello, I've used this notebook several times before, but now the training stops every time and shows the errors below. (I've made sure that StyleGAN2-ADA PyTorch is installed and that the PyTorch version is downgraded.)

Here's what I get right after I run the cell for training. Thank you:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/tensorboard/compat/__init__.py", line 42, in tf
    from tensorboard.compat import notf  # noqa: F401
ImportError: cannot import name 'notf' from 'tensorboard.compat' (/usr/local/lib/python3.7/dist-packages/tensorboard/compat/__init__.py)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/content/stylegan2-ada-pytorch/train.py", line 538, in <module>
    main() # pylint: disable=no-value-for-parameter
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/content/stylegan2-ada-pytorch/train.py", line 531, in main
    subprocess_fn(rank=0, args=args, temp_dir=temp_dir)
  File "/content/stylegan2-ada-pytorch/train.py", line 383, in subprocess_fn
    training_loop.training_loop(rank=rank, **args)
  File "/content/stylegan2-ada-pytorch/training/training_loop.py", line 240, in training_loop
    stats_tfevents = tensorboard.SummaryWriter(run_dir)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/tensorboard/writer.py", line 220, in __init__
    self._get_file_writer()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/tensorboard/writer.py", line 251, in _get_file_writer
    self.flush_secs, self.filename_suffix)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/tensorboard/writer.py", line 61, in __init__
    log_dir, max_queue, flush_secs, filename_suffix)
  File "/usr/local/lib/python3.7/dist-packages/tensorboard/summary/writer/event_file_writer.py", line 72, in __init__
    tf.io.gfile.makedirs(logdir)
  File "/usr/local/lib/python3.7/dist-packages/tensorboard/lazy.py", line 65, in __getattr__
    return getattr(load_once(self), attr_name)
  File "/usr/local/lib/python3.7/dist-packages/tensorboard/lazy.py", line 97, in wrapper
    cache[arg] = f(arg)
  File "/usr/local/lib/python3.7/dist-packages/tensorboard/lazy.py", line 50, in load_once
    module = load_fn()
  File "/usr/local/lib/python3.7/dist-packages/tensorboard/compat/__init__.py", line 45, in tf
    import tensorflow
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/__init__.py", line 51, in <module>
    from ._api.v2 import compat
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/_api/v2/compat/__init__.py", line 37, in <module>
    from . import v1
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/_api/v2/compat/v1/__init__.py", line 30, in <module>
    from . import compat
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/_api/v2/compat/v1/compat/__init__.py", line 37, in <module>
    from . import v1
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/_api/v2/compat/v1/compat/v1/__init__.py", line 47, in <module>
    from tensorflow._api.v2.compat.v1 import lite
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/_api/v2/compat/v1/lite/__init__.py", line 9, in <module>
    from . import experimental
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/_api/v2/compat/v1/lite/experimental/__init__.py", line 8, in <module>
    from . import authoring
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/_api/v2/compat/v1/lite/experimental/authoring/__init__.py", line 8, in <module>
    from tensorflow.lite.python.authoring.authoring import compatible
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/lite/python/authoring/authoring.py", line 43, in <module>
    from tensorflow.lite.python import convert
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/lite/python/convert.py", line 29, in <module>
    from tensorflow.lite.python import util
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/lite/python/util.py", line 51, in <module>
    from jax import xla_computation as _xla_computation
  File "/usr/local/lib/python3.7/dist-packages/jax/__init__.py", line 59, in <module>
    from .core import eval_context as ensure_compile_time_eval
  File "/usr/local/lib/python3.7/dist-packages/jax/core.py", line 47, in <module>
    import jax._src.pretty_printer as pp
  File "/usr/local/lib/python3.7/dist-packages/jax/_src/pretty_printer.py", line 56, in <module>
    CAN_USE_COLOR = _can_use_color()
  File "/usr/local/lib/python3.7/dist-packages/jax/_src/pretty_printer.py", line 54, in _can_use_color
    return sys.stdout.isatty()
AttributeError: 'Logger' object has no attribute 'isatty'

jeffheaton commented 2 years ago

I will take a look. Which notebook are you trying to run? I have several GAN-related notebooks.

carlosnavarro-cn commented 2 years ago

Thank you! The notebook is:

Part 7.2: Train StyleGAN3 with your Own Images

jeffheaton commented 2 years ago

Thanks, this might be related to #147. I will be looking at both soon.

jeffheaton commented 2 years ago

Confirmed, I can reproduce this. Looks like a Logger breaking change. Should be easy enough to resolve.
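
The last frame of the traceback calls `sys.stdout.isatty()`, and the object standing in for `sys.stdout` at that point is a `Logger` with no such method. The sketch below reproduces that failure mode in isolation and shows one possible way around it. `TeeLogger`, `PatchedTeeLogger`, `can_use_color`, and `check` are hypothetical names made up for illustration; this is not the code from the notebook or from StyleGAN2-ADA, and it is not necessarily how the fix was done.

```python
import io


class TeeLogger:
    """Hypothetical stand-in for a logger object that replaces sys.stdout.

    It forwards writes to the wrapped stream and keeps a copy, but it
    does not implement isatty(), which file-like objects normally have.
    """

    def __init__(self, stream):
        self.stream = stream
        self.captured = io.StringIO()

    def write(self, text):
        self.captured.write(text)
        return self.stream.write(text)

    def flush(self):
        self.stream.flush()


class PatchedTeeLogger(TeeLogger):
    """Same logger with isatty() added (an assumed workaround)."""

    def isatty(self):
        # Delegate to the wrapped stream when it supports isatty();
        # otherwise report that this is not a terminal.
        return getattr(self.stream, "isatty", lambda: False)()


def can_use_color(stream):
    # Mirrors the kind of check seen in the traceback: stream.isatty().
    return stream.isatty()


def check(logger_cls):
    """Return the isatty() result, or the error message if it fails."""
    logger = logger_cls(io.StringIO())
    try:
        return can_use_color(logger)
    except AttributeError as exc:
        return str(exc)


if __name__ == "__main__":
    print(check(TeeLogger))         # the AttributeError message
    print(check(PatchedTeeLogger))  # result of the delegated isatty()
```

Any library that probes `sys.stdout` at import time can trip over a replacement object like this, which would explain why notebooks that used to run can start failing after an unrelated package update.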

jeffheaton commented 2 years ago

I checked in a fix, and I am able to train now.

https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_07_2_train_gan.ipynb

carlosnavarro-cn commented 2 years ago

Thank you so much, Jeff, I've already tried it and it works!