tensorflow / tensorboard

TensorFlow's Visualization Toolkit
Apache License 2.0

No train metrics show up in scalar plots #2911

Closed zzb3886 closed 4 years ago

zzb3886 commented 4 years ago

Environment information (required)

Please run diagnose_tensorboard.py (link below) in the same environment from which you normally run TensorFlow/TensorBoard, and paste the output here:

https://raw.githubusercontent.com/tensorflow/tensorboard/master/tensorboard/tools/diagnose_tensorboard.py

Diagnostics

Diagnostics output

```
--- check: autoidentify
INFO: diagnose_tensorboard.py version 4725c70c7ed724e2d1b9ba5618d7c30b957ee8a4

--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=6, micro=8, releaselevel='final', serial=0)
INFO: os.name: posix
INFO: os.uname(): posix.uname_result(sysname='Linux', nodename='master1', release='4.15.0-66-generic', version='#75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019', machine='x86_64')
INFO: sys.getwindowsversion(): N/A

--- check: package_management
INFO: has conda-meta: False
INFO: $VIRTUAL_ENV: None

--- check: installed_packages
INFO: installed: tensorboard==2.0.1
INFO: installed: tensorflow==2.0.0
INFO: installed: tensorflow-estimator==2.0.1

--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '2.0.1'

--- check: tensorflow_python_version
/usr/lib/python3/dist-packages/requests/__init__.py:80: RequestsDependencyWarning: urllib3 (1.25.3) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)
INFO: tensorflow.__version__: '2.0.0'
INFO: tensorflow.__git_version__: 'v2.0.0-rc2-26-g64c3d38'

--- check: tensorboard_binary_path
INFO: which tensorboard: b'/home/bz/.local/bin/tensorboard\n'

--- check: readable_fqdn
INFO: socket.getfqdn(): 'master1.bz'

--- check: stat_tensorboardinfo
INFO: directory: /tmp/.tensorboard-info
INFO: os.stat(...): os.stat_result(st_mode=16895, st_ino=14562209, st_dev=2430, st_nlink=2, st_uid=1000, st_gid=1000, st_size=4096, st_atime=1572563590, st_mtime=1573247067, st_ctime=1573247067)
INFO: mode: 0o40777

--- check: source_trees_without_genfiles
INFO: tensorboard_roots (1): ['/home/bz/.local/lib/python3.6/site-packages']; bad_roots (0): []

--- check: full_pip_freeze
INFO: pip freeze --all:
absl-py==0.8.1
asn1crypto==0.24.0
astor==0.8.0
attrs==17.4.0
Automat==0.6.0
bleach==2.1.2
cachetools==3.1.1
certifi==2018.1.18
chardet==3.0.4
click==6.7
colorama==0.3.7
command-not-found==0.3
configobj==5.0.6
constantly==15.1.0
cryptography==2.1.4
decorator==4.1.2
distro-info===0.18ubuntu0.18.04.1
entrypoints==0.2.3.post1
eventkit==0.8.5
gast==0.2.2
google-auth==1.7.0
google-auth-oauthlib==0.4.1
google-pasta==0.1.8
grpcio==1.25.0
h5py==2.10.0
html5lib==0.999999999
httplib2==0.9.2
hyperlink==17.3.1
ib-insync==0.9.53
idna==2.6
incremental==16.10.1
ipykernel==4.8.2
ipython==5.5.0
ipython-genutils==0.2.0
ipywidgets==6.0.0
Jinja2==2.10
joblib==0.14.0
jsonschema==2.6.0
jupyter-client==5.2.2
jupyter-console==5.2.0
jupyter-core==4.4.0
Keras==2.3.1
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
keyring==10.6.0
keyrings.alt==3.0
language-selector==0.1
Markdown==3.1.1
MarkupSafe==1.0
mistune==0.8.3
nbconvert==5.3.1
nbformat==4.4.0
nest-asyncio==1.0.0
netifaces==0.10.4
notebook==5.2.2
numpy==1.17.3
oauthlib==3.1.0
opt-einsum==3.1.0
PAM==0.4.2
pandas==0.24.2
pandocfilters==1.4.2
pexpect==4.2.1
pickleshare==0.7.4
pip==19.3.1
prompt-toolkit==1.0.15
protobuf==3.10.0
pyasn1==0.4.2
pyasn1-modules==0.2.1
pycrypto==2.6.1
Pygments==2.2.0
pygobject==3.26.1
pyOpenSSL==17.5.0
pyserial==3.4
python-apt==1.6.4
python-dateutil==2.8.0
python-debian==0.1.32
pytz==2019.1
pyxdg==0.25
PyYAML==3.12
pyzmq==16.0.2
requests==2.18.4
requests-oauthlib==1.3.0
requests-unixsocket==0.1.5
rsa==4.0
scikit-learn==0.21.3
scipy==1.3.1
SecretStorage==2.3.1
selenium==3.141.0
service-identity==16.0.0
setuptools==41.6.0
simplegeneric==0.8.1
six==1.13.0
sklearn==0.0
ssh-import-id==5.7
systemd-python==234
tdameritrade==0.0.7
tensorboard==2.0.1
tensorflow==2.0.0
tensorflow-estimator==2.0.1
termcolor==1.1.0
terminado==0.7
testpath==0.3.1
tornado==4.5.3
tqdm==4.32.2
traitlets==4.3.2
Twisted==17.9.0
ufw==0.36
unattended-upgrades==0.1
urllib3==1.25.3
vboxapi==1.0
wcwidth==0.1.7
webencodings==0.5
Werkzeug==0.16.0
wheel==0.33.6
wrapt==1.11.2
zope.interface==4.3.2
```

Issue description

I just upgraded tensorflow to 2.0. In training, I noticed tensorboard now has two runs for each experiment, including train and validation. However, only validation has scalar value curves. Train metric plots are always empty.

I can reproduce this issue with the script in the TensorBoard get started guide: https://www.tensorflow.org/tensorboard/get_started. The script prints reasonable train and validation metrics, as it should, but the train plots stay empty.

[Screenshot from 2019-11-08 13-21-36]

zzb3886 commented 4 years ago

I tried this a few more times. It looks like TensorBoard 2.0 has trouble updating the train metrics by itself. If I kill TensorBoard and restart it, it then shows both train and validation metrics. If training is still ongoing, the validation metrics keep updating whereas the train metrics are stuck.

rmothukuru commented 4 years ago

@zzb3886, I ran the script from https://www.tensorflow.org/tensorboard/get_started and could see the graphs for both training and validation. Here is the Gist.

Can you please provide more details about your issue?

Regarding "If the training is still ongoing, the validation metrics will be updated whereas the train metrics are stuck":

TensorBoard graphs are updated from the event files written during training. So I recommend viewing and analyzing the graphs after training has completed, rather than during training. Please let me know what you think.
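
One way to check whether the train run's event files are actually being written while training is still in progress is to look at the log directory directly. The following is a minimal sketch, not part of the thread: it assumes event file names contain `tfevents` and that each run (for example `train` and `validation`) has its own subdirectory under the log directory. The function name `list_event_files` and the `logs` path are illustrative.

```python
import os
import time


def list_event_files(logdir):
    """Return (run, filename, size_bytes, mtime) for each event file under logdir."""
    results = []
    for root, _dirs, files in os.walk(logdir):
        for name in files:
            # Assumption: event files can be recognized by "tfevents" in the name.
            if "tfevents" not in name:
                continue
            stat = os.stat(os.path.join(root, name))
            run = os.path.relpath(root, logdir)  # e.g. "train" or "validation"
            results.append((run, name, stat.st_size, stat.st_mtime))
    return sorted(results)


if __name__ == "__main__":
    # "logs" is a placeholder; point this at the directory passed to --logdir.
    for run, name, size, mtime in list_event_files("logs"):
        stamp = time.strftime("%H:%M:%S", time.localtime(mtime))
        print(f"{run:20s} {size:10d} bytes  modified {stamp}  {name}")
```

Running this a few times during training shows whether the train event file keeps growing on disk. If it does while the plot stays frozen, the problem is more likely on the reading side than the writing side.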

dgrahn commented 4 years ago

I'm encountering the same problem. The train scalar isn't updated until TensorBoard is restarted.

zzb3886 commented 4 years ago

In the script, if TensorBoard is started before training starts, the problem occurs.

dgrahn commented 4 years ago

If I restart tensorboard during the training, the metrics get updated once, but the problem persists.

wchargin commented 4 years ago

Probably a duplicate of #2084; can you please try the workaround listed in that issue and see if it resolves the problem? https://github.com/tensorflow/tensorboard/issues/2084#issuecomment-483395808

psybuzz commented 4 years ago

Thanks for the report. I can confirm this was working with tf-nightly-2.0-preview==2.0.0.dev20190306 and broken in tf-nightly-2.0-preview==2.0.0.dev20190307. Bisected to https://github.com/tensorflow/tensorflow/commit/c66b603990b9404dc1eb57de9d595aa0ffc8197f

So it seems Keras callbacks have been affected by this bug since March, sadly. I'm going to triage this to someone who knows more context.

Googlers, see cl/237090182

zzb3886 commented 4 years ago

Adding profile_batch=0 to the keras callback resolves it.
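
A minimal sketch of that workaround, assuming TensorFlow 2.x and the `tf.keras.callbacks.TensorBoard` callback used in the get started guide. Setting `profile_batch=0` disables profiling. The helper names and the log directory below are illustrative, not from the thread.

```python
def tensorboard_kwargs(log_dir):
    """Arguments for tf.keras.callbacks.TensorBoard with profiling disabled."""
    return {
        "log_dir": log_dir,
        "profile_batch": 0,  # 0 disables profiling (the workaround reported above)
    }


def make_tensorboard_callback(log_dir):
    """Build the Keras TensorBoard callback with the workaround applied."""
    # Imported lazily so tensorboard_kwargs can be used without TensorFlow installed.
    import tensorflow as tf

    return tf.keras.callbacks.TensorBoard(**tensorboard_kwargs(log_dir))


# Usage (hypothetical model and data):
# model.fit(x_train, y_train,
#           validation_data=(x_val, y_val),
#           callbacks=[make_tensorboard_callback("logs/fit/run1")])
```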

psybuzz commented 4 years ago

Duplicate of #2084

AI-P-K commented 4 years ago

I have the following code and I can't get TensorBoard to show me anything other than epoch_accuracy and epoch_loss. Can anyone help me? I have followed the steps above and it is still not working.

This is the command I run in the terminal: tensorboard --logdir='logs/'

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten, Conv2D, MaxPooling2D
import pickle
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import time
from datetime import datetime
from packaging import version
import os

X = pickle.load(open("X.pickle", "rb"))
y = pickle.load(open("y.pickle", "rb"))

X = np.array(X / 255.0)
y = np.array(y)

dense_layers = [0, 1, 2]
layer_sizes = [32, 64, 128]
conv_layers = [1, 2, 3]

for dense_layer in dense_layers:
    for layer_size in layer_sizes:
        for conv_layer in conv_layers:
            NAME = "{}-conv-{}-nodes-{}-dense-{}".format(conv_layer, layer_size, dense_layer, int(time.time()))
            tensorboard = tf.keras.callbacks.TensorBoard(
                log_dir='/users/silviumarc/pycharmprojects/classifier/logs/{}'.format(NAME),
                update_freq='epoch',
                profile_batch=0,
                histogram_freq=1)
            print(NAME)

            model = Sequential()
            model.add(Conv2D(layer_size, (4, 4), input_shape=X.shape[1:]))
            model.add(Activation("relu"))
            model.add(MaxPooling2D(pool_size=(2, 2)))

            for l in range(conv_layer - 1):
                model.add(Conv2D(layer_size, (4, 4)))
                model.add(Activation("relu"))
                model.add(MaxPooling2D(pool_size=(2, 2)))

            model.add(Flatten())
            for l in range(dense_layer):
                model.add(Dense(layer_size))
                model.add(Activation("relu"))

            model.add(Dense(64))
            model.add(Activation('relu'))

            model.add(Dense(1))
            model.add(Activation("sigmoid"))

            model.compile(loss="binary_crossentropy", optimizer='adam', metrics=['accuracy'])

            model.fit(X, y, batch_size=7, epochs=2, validation_split=0.5, callbacks=[tensorboard])