tensorflow / tensorboard

TensorFlow's Visualization Toolkit

Batch accuracy and batch loss are not being plotted in browser or vscode plugin #6785

Open JohnAtl opened 7 months ago

JohnAtl commented 7 months ago

This link in the bug report text did not work for me:

https://raw.githubusercontent.com/tensorflow/tensorboard/master/tensorboard/tools/diagnose_tensorboard.py

/home/user/work/diagnose_tensorboard.py:32: DeprecationWarning: 'pipes' is deprecated and slated for removal in Python 3.13
  import pipes

Diagnostics

Diagnostics output

``````
--- check: autoidentify
INFO: diagnose_tensorboard.py version df7af2c6fc0e4c4a5b47aeae078bc7ad95777ffa

--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=12, micro=2, releaselevel='final', serial=0)
INFO: os.name: posix
INFO: os.uname(): posix.uname_result(sysname='Linux', nodename='beast', release='6.6.21-1-lts', version='#1 SMP PREEMPT_DYNAMIC Wed, 06 Mar 2024 16:59:55 +0000', machine='x86_64')
INFO: sys.getwindowsversion(): N/A

--- check: package_management
INFO: has conda-meta: False
INFO: $VIRTUAL_ENV: None

--- check: installed_packages
INFO: installed: tensorboard==2.16.2
INFO: installed: tensorflow==2.16.1
WARNING: no installation among: ['tensorflow-estimator', 'tensorflow-estimator-2.0-preview', 'tf-estimator-nightly']
INFO: installed: tensorboard-data-server==0.7.2

--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '2.16.2'

--- check: tensorflow_python_version
2024-03-14 13:58:36.829200: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-03-14 13:58:36.851097: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-14 13:58:37.239354: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
INFO: tensorflow.__version__: '2.16.1'
INFO: tensorflow.__git_version__: 'v2.16.1-0-g5bc9d26649c'

--- check: tensorboard_data_server_version
INFO: data server binary: '/home/john/work/Sleep/.venv/lib/python3.12/site-packages/tensorboard_data_server/bin/server'
INFO: data server binary version: b'rustboard 0.7.2'

--- check: tensorboard_binary_path
INFO: which tensorboard: b'/home/john/work/Sleep/.venv/bin/tensorboard\n'

--- check: addrinfos
socket.has_ipv6 = True
socket.AF_UNSPEC =
socket.SOCK_STREAM =
socket.AI_ADDRCONFIG =
socket.AI_PASSIVE =
Loopback flags:
Loopback infos: [(, , 6, '', ('::1', 0, 0, 0)), (, , 6, '', ('127.0.0.1', 0))]
Wildcard flags:
Wildcard infos: [(, , 6, '', ('0.0.0.0', 0)), (, , 6, '', ('::', 0, 0, 0))]

--- check: readable_fqdn
INFO: socket.getfqdn(): 'beast'

--- check: stat_tensorboardinfo
INFO: directory: /tmp/.tensorboard-info
INFO: os.stat(...): os.stat_result(st_mode=16895, st_ino=1853, st_dev=44, st_nlink=2, st_uid=1000, st_gid=1000, st_size=40, st_atime=1710438727, st_mtime=1710438986, st_ctime=1710438986)
INFO: mode: 0o40777

--- check: source_trees_without_genfiles
INFO: tensorboard_roots (1): ['/home/john/work/Sleep/.venv/lib/python3.12/site-packages']; bad_roots (0): []

--- check: full_pip_freeze
INFO: pip freeze --all:
absl-py==2.1.0
asttokens==2.4.1
astunparse==1.6.3
bidict==0.23.1
biosppy==2.1.2
certifi==2024.2.2
charset-normalizer==3.3.2
colorama==0.4.6
colorlog==6.8.2
comm==0.2.2
contourpy==1.2.0
cycler==0.12.1
debugpy==1.8.1
decorator==5.1.1
dm-tree==0.1.8
easydev==0.13.1
edfio==0.4.0
executing==2.0.1
flatbuffers==24.3.7
fonttools==4.49.0
future==1.0.0
gast==0.5.4
google-pasta==0.2.0
grpcio==1.62.1
h5py==3.10.0
idna==3.6
ipykernel==6.29.3
ipython==8.22.2
jedi==0.19.1
Jinja2==3.1.3
joblib==1.3.2
jupyter_client==8.6.1
jupyter_core==5.7.2
keras==3.0.5
kiwisolver==1.4.5
lazy_loader==0.3
libclang==16.0.6
lightgbm==4.3.0
line-profiler==4.1.2
lxml==5.1.0
Markdown==3.6
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib==3.8.3
matplotlib-inline==0.1.6
mdurl==0.1.2
ml-dtypes==0.3.2
mne==1.6.1
namex==0.0.7
nest-asyncio==1.6.0
nolds==0.5.2
numpy==1.26.4
nvidia-cublas-cu12==12.3.4.1
nvidia-cuda-cupti-cu12==12.3.101
nvidia-cuda-nvcc-cu12==12.3.107
nvidia-cuda-nvrtc-cu12==12.3.107
nvidia-cuda-runtime-cu12==12.3.101
nvidia-cudnn-cu12==8.9.7.29
nvidia-cufft-cu12==11.0.12.1
nvidia-curand-cu12==10.3.4.107
nvidia-cusolver-cu12==11.5.4.101
nvidia-cusparse-cu12==12.2.0.103
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.3.101
opencv-python==4.9.0.80
opt-einsum==3.3.0
packaging==24.0
pandas==2.2.1
parso==0.8.3
pexpect==4.9.0
pillow==10.2.0
pip==24.0
platformdirs==4.2.0
pooch==1.8.1
prompt-toolkit==3.0.43
protobuf==4.25.3
psutil==5.9.8
ptyprocess==0.7.0
pure-eval==0.2.2
Pygments==2.17.2
pyhrv==0.4.1
pyparsing==3.1.2
python-dateutil==2.9.0.post0
pytz==2024.1
PyWavelets==1.5.0
pyzmq==25.1.2
requests==2.31.0
rich==13.7.1
scikit-learn==1.4.1.post1
scipy==1.12.0
seaborn==0.13.2
setuptools==69.2.0
shortuuid==1.0.13
six==1.16.0
spectrum==0.8.1
stack-data==0.6.3
tensorboard==2.16.2
tensorboard-data-server==0.7.2
tensorflow==2.16.1
termcolor==2.4.0
threadpoolctl==3.3.0
tornado==6.4
tqdm==4.66.2
traitlets==5.14.2
typing_extensions==4.10.0
tzdata==2024.1
urllib3==2.2.1
wcwidth==0.2.13
Werkzeug==3.0.1
wheel==0.43.0
wrapt==1.16.0
``````

The issue is the same in the VS Code plugin and in Firefox: image

Issue description

The batch_accuracy and batch_loss are not being plotted. There is a single dot at the center of each plot, but this screenshot was taken after some 3100 batches, so a line should have been plotted for both.

Callbacks in my model.fit:

        callbacks=[
            tf.keras.callbacks.TensorBoard(log_dir=LOG_PATH, update_freq="batch"),
            chkpt_callback,
        ],

groszewn commented 7 months ago

Would you mind running `tensorboard --inspect --logdir <your log directory>` and providing the results?

JohnAtl commented 7 months ago

Sure!

inspect output

```
======================================================================
Processing event files... (this can take a few minutes)
======================================================================

Found event files in:
logs/train
logs/validation

These tags are in logs/train:
audio -
histograms -
images -
scalars -
tensor
   batch_accuracy
   batch_loss
   epoch_accuracy
   epoch_learning_rate
   epoch_loss
   keras
======================================================================

Event statistics for logs/train:
audio -
graph
   first_step           0
   last_step            0
   max_step             0
   min_step             0
   num_steps            1
   outoforder_steps     []
histograms -
images -
scalars -
sessionlog:checkpoint -
sessionlog:start -
sessionlog:stop -
tensor
   first_step           0
   last_step            0
   max_step             9
   min_step             0
   num_steps            10
   outoforder_steps     [(1, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (6, 0), (7, 0), (8, 0), (9, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (6, 0), (7, 0), (8, 0), (9, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (6, 0), (7, 0), (8, 0), (9, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (6, 0), (7, 0), (8, 0), (9, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (6, 0), (7, 0), (8, 0)]
======================================================================

These tags are in logs/validation:
audio -
histograms -
images -
scalars -
tensor
   epoch_accuracy
   epoch_loss
   evaluation_accuracy_vs_iterations
   evaluation_loss_vs_iterations
======================================================================

Event statistics for logs/validation:
audio -
graph -
histograms -
images -
scalars -
sessionlog:checkpoint -
sessionlog:start -
sessionlog:stop -
tensor
   first_step           4744
   last_step            8
   max_step             47480
   min_step             0
   num_steps            49
   outoforder_steps     [(4744, 0), (9488, 1), (4691, 0), (9382, 1), (14073, 2), (18764, 3), (23455, 4), (28146, 5), (32837, 6), (37528, 7), (42219, 8), (46910, 9), (4691, 0), (9382, 1), (14073, 2), (18764, 3), (23455, 4), (28146, 5), (32837, 6), (37528, 7), (42219, 8), (46910, 9), (4744, 0), (9488, 1), (14232, 2), (18976, 3), (23720, 4), (28464, 5), (33208, 6), (37952, 7), (42696, 8), (47440, 9), (4748, 0), (9496, 1), (14244, 2), (18992, 3), (23740, 4), (28488, 5), (33236, 6), (37984, 7), (42732, 8), (47480, 9), (4719, 0), (9438, 1), (14157, 2), (18876, 3), (23595, 4), (28314, 5), (33033, 6), (37752, 7), (42471, 8)]
======================================================================
```

Also, the display in the Scalars tab is the same, and the single dot is updated with the latest value each time the 30-second refresh triggers.

image
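For readers trying to interpret the statistics in the inspect output, here is a minimal sketch of how such fields could be derived from a list of steps in write order. It is not taken from the TensorBoard source. It assumes that `num_steps` counts distinct step values and that each `outoforder_steps` pair is (previous step, current step), recorded whenever a step is lower than the one before it. The function name `summarize_steps` and the example sequence are hypothetical.

```python
def summarize_steps(steps):
    """Summarize event steps given in the order they were written.

    Assumption (not confirmed by the thread): a step is "out of order"
    when it is lower than the step immediately before it.
    """
    if not steps:
        return {}
    out_of_order = [
        (prev, cur) for prev, cur in zip(steps, steps[1:]) if prev > cur
    ]
    return {
        "first_step": steps[0],
        "last_step": steps[-1],
        "max_step": max(steps),
        "min_step": min(steps),
        "num_steps": len(set(steps)),  # distinct step values, not event count
        "outoforder_steps": out_of_order,
    }


# Hypothetical sequence: steps climb by one but keep dropping back to 0.
example_steps = [0, 1, 0, 2, 0, 3, 0]
summary = summarize_steps(example_steps)
for field, value in summary.items():
    print(f"{field:<18}{value}")
```

Under this reading, the reported `max_step 9` and `num_steps 10` for `logs/train` would mean that no tensor event there, batch-level tags included, was written with a step above 9, even after roughly 3100 batches. That would be consistent with a single dot rather than a line, though the thread itself does not confirm the cause.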