tensorflow / tensorboard

TensorFlow's Visualization Toolkit
Apache License 2.0

6+ Hours Server Startup Time #6077

Open LucaBonfiglioli opened 1 year ago

LucaBonfiglioli commented 1 year ago

Hello, I am currently experiencing a very annoying issue with TensorBoard. In order to open any events file/folder (usually done with tensorboard --logdir XXX), I have to wait an insanely long time: a folder with 12 TensorBoard files takes more than six hours at 100% CPU load to open. My machine is pretty beefy, so that's not the issue here. There is definitely something weird going on with either my logging files or my TensorBoard installation.

First of all, I uninstalled and reinstalled tensorboard in a completely new environment by running pip install -U tensorboard. I tried many different TB versions and the issue is still there. I am currently using version 2.11.

Secondly, I tried to replicate the issue on a much smaller file with just some random scalars and some images. The file is only a few MB, definitely something that should be handled very smoothly. The loading time for this file is approximately 90 s (still way too long). It also appears to scale non-linearly with the events file size: doubling the file size more than doubles the loading time.

I then wrote a simple script that loads an events file from Python and measured the loading time:

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

f = "/a/very/personal/path/to/a/tensorboard/logging/folder"
event_acc = EventAccumulator(f)
event_acc.Reload()

Then I went deeper by profiling the program execution with cProfile. Apparently, 97% of the time is spent in pywrap_tensorboard.py:crc_update, which is invoked 78610 times with an average per-call time of 1.1 ms.

I attach the cProfile outputs so you can take a look. The file has a ZIP extension but it is not really a zip file; this is just to get it past GitHub's file checks. You can open it with any Python profile viewer, such as snakeviz.

pip install snakeviz
snakeviz PATH/TO/FILE

god_why.zip

ericdnielsen commented 1 year ago

Please fill out the details requested in our template issue:

The output of diagnose_tensorboard.py will be particularly helpful.

Here are some things you can try, depending on the OS you're running on:

A) If you're not on Windows or an M1 Mac, try Rustboard (see the instructions at https://github.com/tensorflow/tensorboard/blob/5214b0822af46de61091eca608c59ab2fd0fbdc2/tensorboard/data/server/DEVELOPMENT.md); it can be significantly faster.
B) It looks like you're using the pure-Python read path without TensorFlow installed (TensorFlow provides faster replacements). If you're able to install TensorFlow and import through that dependency, you could get faster loads.
C) If neither of those works, you can file a feature request for a faster CRC implementation, though either option A or B should work.
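To illustrate why a pure-Python checksum can dominate load time, here is a minimal sketch of a table-driven CRC32C together with the masking step described in the TFRecord format documentation. It is an illustration under stated assumptions, not TensorBoard's actual code: the function names crc32c and masked_crc32c are chosen for this example only.

```python
CRC32C_POLY = 0x82F63B78  # Castagnoli polynomial, bit-reflected form
MASK_DELTA = 0xA282EAD8   # constant used by the TFRecord CRC masking


def _build_table():
    # Precompute the CRC of every possible byte value.
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ CRC32C_POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _build_table()


def crc32c(data: bytes) -> int:
    """CRC32C of data, computed one byte per loop iteration."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc32c(data: bytes) -> int:
    """Rotate the CRC right by 15 bits and add a constant, modulo 2**32."""
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + MASK_DELTA) & 0xFFFFFFFF


if __name__ == "__main__":
    print(hex(crc32c(b"123456789")))  # standard CRC32C check input
```

The loop in crc32c executes once per byte inside the interpreter, so the cost grows with the size of each record payload (images, for example). A compiled implementation does the same work without per-byte interpreter overhead.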


Consider Stack Overflow for getting support using TensorBoard—they have a larger community with better searchability:

https://stackoverflow.com/questions/tagged/tensorboard

Do not use this template for setup, installation, or configuration issues. Instead, use the “installation problem” issue template:

https://github.com/tensorflow/tensorboard/issues/new?template=installation_problem.md

To report a problem with TensorBoard itself, please fill out the remainder of this template.

Environment information (required)

Please run diagnose_tensorboard.py (link below) in the same environment from which you normally run TensorFlow/TensorBoard, and paste the output here:

https://raw.githubusercontent.com/tensorflow/tensorboard/master/tensorboard/tools/diagnose_tensorboard.py

For browser-related issues, please additionally specify:

Issue description

Please describe the bug as clearly as possible. How can we reproduce the problem without additional resources (including external data files and proprietary Python modules)?

LucaBonfiglioli commented 1 year ago

Hello, I just ran the diagnose_tensorboard.py script, here is the output:

--- check: autoidentify
INFO: diagnose_tensorboard.py version 516a2f9433ba4f9c3a4fdb0f89735870eda054a1

--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=8, micro=10, releaselevel='final', serial=0)
INFO: os.name: posix
INFO: os.uname(): posix.uname_result(sysname='Linux', nodename='luca-eyecan', release='5.15.0-53-generic', version='#59~20.04.1-Ubuntu SMP Thu Oct 20 15:10:22 UTC 2022', machine='x86_64')
INFO: sys.getwindowsversion(): N/A

--- check: package_management
INFO: has conda-meta: False
INFO: $VIRTUAL_ENV: '/home/luca/venvs/tb'

--- check: installed_packages
INFO: installed: tensorboard==2.11.0
WARNING: no installation among: ['tensorflow', 'tensorflow-gpu', 'tf-nightly', 'tf-nightly-2.0-preview', 'tf-nightly-gpu', 'tf-nightly-gpu-2.0-preview']
WARNING: no installation among: ['tensorflow-estimator', 'tensorflow-estimator-2.0-preview', 'tf-estimator-nightly']
INFO: installed: tensorboard-data-server==0.6.1

--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '2.11.0'

--- check: tensorflow_python_version
Traceback (most recent call last):
  File "diagnose_tensorboard.py", line 528, in main
    suggestions.extend(check())
  File "diagnose_tensorboard.py", line 81, in wrapper
    result = fn()
  File "diagnose_tensorboard.py", line 284, in tensorflow_python_version
    import tensorflow as tf
ModuleNotFoundError: No module named 'tensorflow'

--- check: tensorboard_data_server_version
INFO: data server binary: '/home/luca/venvs/tb/lib/python3.8/site-packages/tensorboard_data_server/bin/server'
INFO: data server binary version: b'rustboard 0.6.1'

--- check: tensorboard_binary_path
INFO: which tensorboard: b'/home/luca/venvs/tb/bin/tensorboard\n'

--- check: addrinfos
socket.has_ipv6 = True
socket.AF_UNSPEC = <AddressFamily.AF_UNSPEC: 0>
socket.SOCK_STREAM = <SocketKind.SOCK_STREAM: 1>
socket.AI_ADDRCONFIG = <AddressInfo.AI_ADDRCONFIG: 32>
socket.AI_PASSIVE = <AddressInfo.AI_PASSIVE: 1>
Loopback flags: <AddressInfo.AI_ADDRCONFIG: 32>
Loopback infos: [(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::1', 0, 0, 0)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.0.1', 0))]
Wildcard flags: <AddressInfo.AI_PASSIVE: 1>
Wildcard infos: [(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('0.0.0.0', 0)), (<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::', 0, 0, 0))]

--- check: readable_fqdn
INFO: socket.getfqdn(): 'luca-eyecan.ai'

--- check: stat_tensorboardinfo
INFO: directory: /tmp/.tensorboard-info
INFO: .tensorboard-info directory does not exist

--- check: source_trees_without_genfiles
INFO: tensorboard_roots (1): ['/home/luca/venvs/tb/lib/python3.8/site-packages']; bad_roots (0): []

--- check: full_pip_freeze
INFO: pip freeze --all:
absl-py==1.3.0
cachetools==5.2.0
certifi==2022.9.24
charset-normalizer==2.1.1
google-auth==2.14.1
google-auth-oauthlib==0.4.6
grpcio==1.50.0
idna==3.4
importlib-metadata==5.1.0
Markdown==3.4.1
MarkupSafe==2.1.1
numpy==1.23.5
oauthlib==3.2.2
pip==22.3.1
pkg_resources==0.0.0
protobuf==3.20.3
pyasn1==0.4.8
pyasn1-modules==0.2.8
requests==2.28.1
requests-oauthlib==1.3.1
rsa==4.9
setuptools==65.6.3
six==1.16.0
tensorboard==2.11.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
urllib3==1.26.13
Werkzeug==2.2.2
wheel==0.38.4
zipp==3.11.0

Also, I just tried the same tensorboard logging files on another device (which also has no tensorflow installation) and the loading times were roughly 1s.

LucaBonfiglioli commented 1 year ago

I tried with a TensorFlow installation. The waiting time has dropped considerably, but it still takes 40 s, compared to ~1 s on another PC with exactly the same specs and environment as mine.