tensorflow / tensorboard

TensorFlow's Visualization Toolkit
Apache License 2.0

Trouble loading tensorboard.plugins.hparams #2488

Closed c2rosa closed 5 years ago

c2rosa commented 5 years ago

Consider Stack Overflow for getting support using TensorBoard—they have a larger community with better searchability:

https://stackoverflow.com/questions/tagged/tensorboard

Do not use this template for setup, installation, or configuration issues. Instead, use the “installation problem” issue template:

https://github.com/tensorflow/tensorboard/issues/new?template=installation_problem.md

To report a problem with TensorBoard itself, please fill out the remainder of this template.

Environment information (required)

Please run diagnose_tensorboard.py (link below) in the same environment from which you normally run TensorFlow/TensorBoard, and paste the output here:

Diagnostics

Diagnostics output

``````
--- check: autoidentify
INFO: diagnose_tensorboard.py source unavailable

--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=5, micro=3, releaselevel='final', serial=0)
INFO: os.name: posix
INFO: os.uname(): posix.uname_result(sysname='Linux', nodename='tensorflow-20190530-101909', release='4.9.0-9-amd64', version='#1 SMP Debian 4.9.168-1+deb9u2 (2019-05-13)', machine='x86_64')
INFO: sys.getwindowsversion(): N/A

--- check: package_management
INFO: has conda-meta: False
INFO: $VIRTUAL_ENV: None

--- check: installed_packages
INFO: installed: tensorboard==1.13.1
INFO: installed: tensorflow==1.13.1
INFO: installed: tensorflow-estimator==1.13.0

--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '1.13.1'

--- check: tensorflow_python_version
INFO: tensorflow.__version__: '1.13.1'
INFO: tensorflow.__git_version__: "b'v1.13.1-1-gd4fb067'"

--- check: tensorboard_binary_path
INFO: which tensorboard: b'/usr/local/bin/tensorboard\n'

--- check: readable_fqdn
INFO: socket.getfqdn(): 'tensorflow-20190530-101909.c.xpo-ltl-advanced-analytics.internal'

--- check: stat_tensorboardinfo
INFO: directory: /tmp/.tensorboard-info
INFO: os.stat(...): os.stat_result(st_mode=16877, st_ino=2104160, st_dev=2049, st_nlink=2, st_uid=1000, st_gid=1001, st_size=4096, st_atime=1562701129, st_mtime=1562701301, st_ctime=1562701301)
INFO: mode: 0o40755

--- check: source_trees_without_genfiles
INFO: tensorboard_roots (1): ['/usr/local/lib/python3.5/dist-packages']; bad_roots (0): []

--- check: full_pip_freeze
INFO: pip freeze --all:
    absl-py==0.7.1
    ansiwrap==0.8.4
    arrow==0.13.2
    astor==0.8.0
    attrs==19.1.0
    backcall==0.1.0
    bcolz==1.2.1
    binaryornot==0.4.4
    bleach==3.1.0
    boto==2.44.0
    bz2file==0.98
    cachetools==3.1.0
    certifi==2019.3.9
    cffi==1.12.3
    chardet==3.0.4
    Click==7.0
    cloud-tpu-profiler==1.13.0
    cloudpickle==1.1.1
    colorama==0.4.1
    configparser==3.7.4
    cookiecutter==1.6.0
    cryptography==1.7.1
    cycler==0.10.0
    daal==2019.0
    datalab==1.1.4
    decorator==4.4.0
    defusedxml==0.6.0
    dill==0.2.9
    distro==1.0.1
    docker==4.0.1
    dopamine-rl==2.0.3
    entrypoints==0.3
    enum34==1.1.6
    fairing==0.5.3
    Flask==1.0.3
    future==0.17.1
    gast==0.2.2
    gcsfs==0.2.2
    gevent==1.4.0
    gin-config==0.1.4
    gitdb2==2.0.5
    GitPython==2.1.11
    google-api-core==1.11.0
    google-api-python-client==1.7.9
    google-auth==1.6.3
    google-auth-httplib2==0.0.3
    google-auth-oauthlib==0.3.0
    google-cloud==0.34.0
    google-cloud-bigquery==1.12.1
    google-cloud-core==1.0.0
    google-cloud-dataproc==0.3.1
    google-cloud-datastore==1.8.0
    google-cloud-language==1.2.0
    google-cloud-logging==1.11.0
    google-cloud-spanner==1.9.0
    google-cloud-storage==1.16.0
    google-cloud-translate==1.5.0
    google-compute-engine==2.8.16
    google-resumable-media==0.3.2
    googleapis-common-protos==1.6.0
    greenlet==0.4.15
    grpc-google-iam-v1==0.11.4
    grpcio==1.20.1
    gunicorn==19.9.0
    gym==0.12.1
    h5py==2.9.0
    horovod==0.16.2
    html5lib==1.0.1
    httplib2==0.12.3
    icc-rt==2019.0
    idna==2.8
    imageio==2.5.0
    intel-openmp==2019.0
    ipykernel==5.1.1
    ipython==7.5.0
    ipython-genutils==0.2.0
    ipython-sql==0.3.9
    ipywidgets==7.4.2
    itsdangerous==1.1.0
    jedi==0.13.3
    Jinja2==2.10.1
    jinja2-time==0.2.0
    joblib==0.13.2
    jsonschema==3.0.1
    jupyter==1.0.0
    jupyter-aihub-deploy-extension==0.1
    jupyter-client==5.2.4
    jupyter-console==6.0.0
    jupyter-contrib-core==0.3.3
    jupyter-contrib-nbextensions==0.5.1
    jupyter-core==4.4.0
    jupyter-highlight-selected-word==0.2.0
    jupyter-http-over-ws==0.0.6
    jupyter-latex-envs==1.4.6
    jupyter-nbextensions-configurator==0.4.1
    jupyter-tensorboard==0.1.10
    jupyterlab==0.35.6
    jupyterlab-git==0.5.0
    jupyterlab-server==0.2.0
    Keras==2.2.4
    Keras-Applications==1.0.7
    Keras-Preprocessing==1.0.9
    keyring==10.1
    keyrings.alt==1.3
    kfac==0.1.4
    kiwisolver==1.1.0
    kubernetes==9.0.0
    lxml==4.3.3
    Markdown==3.1.1
    MarkupSafe==1.1.1
    matplotlib==3.0.3
    mesh-tensorflow==0.0.5
    mistune==0.8.4
    mkl==2019.0
    mkl-fft==1.0.6
    mkl-random==1.0.1.1
    mock==3.0.5
    mpmath==1.1.0
    nbconvert==5.5.0
    nbdime==1.0.6
    nbformat==4.4.0
    networkx==2.3
    nltk==3.4.1
    notebook==5.7.8
    numpy==1.16.3
    oauth2client==4.1.3
    oauthlib==3.0.1
    opencv-python==4.1.0.25
    pandas==0.24.2
    pandas-profiling==1.4.1
    pandocfilters==1.4.2
    papermill==1.0.0
    parso==0.4.0
    pathlib2==2.3.3
    pexpect==4.7.0
    pickleshare==0.7.5
    Pillow==6.0.0
    pip==9.0.1
    plotly==3.9.0
    poyo==0.4.2
    prettytable==0.7.2
    prometheus-client==0.6.0
    promise==2.2.1
    prompt-toolkit==2.0.9
    protobuf==3.7.1
    psutil==5.6.2
    ptyprocess==0.6.0
    pyarrow==0.13.0
    pyasn1==0.4.5
    pyasn1-modules==0.2.5
    pycparser==2.19
    pycrypto==2.6.1
    pycurl==7.43.0
    pydaal==2019.0.0.20180713
    pydot==1.4.1
    pyglet==1.3.2
    Pygments==2.4.0
    pygobject==3.22.0
    pyparsing==2.4.0
    pypng==0.0.19
    pyrsistent==0.15.2
    python-apt==1.4.0b3
    python-dateutil==2.8.0
    pytz==2019.1
    PyWavelets==1.0.3
    pyxdg==0.25
    PyYAML==5.1
    pyzmq==18.0.1
    qtconsole==4.4.4
    requests==2.22.0
    requests-oauthlib==1.2.0
    retrying==1.3.3
    rsa==4.0
    scikit-image==0.15.0
    scikit-learn==0.21.2
    scipy==1.3.0
    seaborn==0.9.0
    SecretStorage==2.3.1
    Send2Trash==1.5.0
    setuptools==41.0.1
    simplegeneric==0.8.1
    six==1.12.0
    smmap2==2.0.5
    SQLAlchemy==1.3.3
    sqlparse==0.3.0
    sympy==1.4
    tbb==2019.0
    tbb4py==2019.0
    tenacity==5.0.4
    tensor2tensor==1.13.4
    tensorboard==1.13.1
    tensorflow==1.13.1
    tensorflow-datasets==1.0.2
    tensorflow-estimator==1.13.0
    tensorflow-hub==0.4.0
    tensorflow-metadata==0.13.0
    tensorflow-probability==0.6.0
    tensorflow-serving-api==1.13.0rc1
    termcolor==1.1.0
    terminado==0.8.2
    testpath==0.4.2
    textwrap3==0.9.2
    tfds-nightly==1.0.1.dev201903050105
    tornado==5.1.1
    tqdm==4.32.1
    traitlets==4.3.2
    unattended-upgrades==0.1
    uritemplate==3.0.0
    urllib3==1.24.2
    virtualenv==16.6.0
    wcwidth==0.1.7
    webencodings==0.5.1
    websocket-client==0.56.0
    Werkzeug==0.15.4
    wheel==0.29.0
    whichcraft==0.5.2
    widgetsnbextension==3.4.2
    wrapt==1.11.1
``````

Suggestion: Fix permissions on "/tmp/.tensorboard-info"

The ".tensorboard-info" directory was created by an old version of TensorBoard, and its permissions are not set correctly; see issue #2010. Change that directory to be world-accessible (may require superuser privilege):

chmod 777 /tmp/.tensorboard-info
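
For readers who want to see why the diagnostics flagged this directory, here is a minimal Python sketch that interprets the `st_mode=16877` value reported above. The helper name `describe_mode` is made up for illustration; only the standard-library `os` and `stat` modules are used.

```python
import os
import stat


def describe_mode(st_mode):
    """Return (permission bits as an octal string, world-writable?) for a raw st_mode."""
    perms = stat.S_IMODE(st_mode)                # drop the file-type bits, keep permissions
    world_writable = bool(perms & stat.S_IWOTH)  # "other" write bit
    return oct(perms), world_writable


# st_mode value copied from the diagnostics output above (reported there as 0o40755).
print(describe_mode(16877))

# Optional check against the live directory, if it exists on this machine.
info_dir = "/tmp/.tensorboard-info"
if os.path.isdir(info_dir):
    print(describe_mode(os.stat(info_dir).st_mode))
```

A mode of 755 lets only the owning user write to the directory, which is presumably why the script suggests `chmod 777` when several users share the machine.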

Next steps

Please try each suggestion enumerated above to determine whether it solves your problem. If none of these suggestions works, please copy ALL of the above output, including the lines containing only backticks, into your GitHub issue or comment. Be sure to redact any sensitive information.

https://raw.githubusercontent.com/tensorflow/tensorboard/master/tensorboard/tools/diagnose_tensorboard.py

For browser-related issues, please additionally specify:

Issue description

I am running:

import tensorflow as tf
from tensorboard.plugins.hparams import api as hp

within JupyterLab and getting this issue:

ImportError Traceback (most recent call last)

in
      1 import tensorflow as tf
----> 2 from tensorboard.plugins.hparams import api as hp

ImportError: No module named 'tensorboard.plugins.hparams'

Trying to figure out why.
wchargin commented 5 years ago

@c2rosa: Thanks for the report! It looks like you’re running TensorFlow and TensorBoard version 1.13.1, but the hparams functionality was first introduced in version 1.14.0. Please upgrade to that version (it should be a backward-compatible upgrade) to take advantage of the new features:

pip install -U tensorflow==1.14.0
c2rosa commented 5 years ago

OK, now running:

!pip install -U tensorflow==1.14.0
# Load the TensorBoard notebook extension
%load_ext tensorboard
# Clear any logs from previous runs
!rm -rf ./log
import tensorflow as tf
from tensorboard.plugins.hparams import api as hp

and get this error after this point:

ImportError Traceback (most recent call last)

in
      1 import tensorflow as tf
----> 2 from tensorboard.plugins.hparams import api as hp

ImportError: No module named 'tensorboard.plugins.hparams'

The error looks similar to what I was seeing last time. I'm running in JupyterLab on the GCP AI Platform.
c2rosa commented 5 years ago

Does this run on Colab, but not in JupyterLab? Just curious. I was trying to run in JupyterLab.

wchargin commented 5 years ago

Can you run

from tensorboard import version
print(version.VERSION)

and see what that prints?

I suspect that the old versions of TensorFlow and TensorBoard were already running in the JupyterLab kernel, so installing them and then “importing” again actually used the old cached modules. Restarting the JupyterLab runtime (Kernel menu → Restart Kernel…) should suffice to fix that.

> Does this run on Colab, but not in JupyterLab? Just curious.

Importing the hparams module will certainly work on all platforms (it’s just a normal Python module), which is why I suspect that the new version isn’t being picked up properly.
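
As a rough way to check which installation the kernel actually sees, the following sketch prints the interpreter path and where each module would be loaded from. The helper `find_module_origin` is a hypothetical name, not part of TensorBoard; it relies only on the standard-library `importlib.util.find_spec`.

```python
import importlib.util
import sys


def find_module_origin(name):
    """Return the file a module would be loaded from, or None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # ImportError: a parent package is missing; ValueError: module has no usable spec.
        return None
    return None if spec is None else spec.origin


print("interpreter:", sys.executable)
for name in ("tensorboard", "tensorboard.plugins.hparams"):
    print(name, "->", find_module_origin(name))
```

If the path printed for `tensorboard` is not under the Python installation that `pip` just upgraded, the kernel and the `pip` command are probably using different interpreters.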

The notebook integrations, %load_ext tensorboard and %tensorboard, need HTTP access to the machine running the Jupyter kernel. That is, we need to be able to open an iframe pointing to the kernel machine. With standard Jupyter, this generally comes “for free”, because you’re already connected to the same machine on a different port. With a hosted JupyterLab service like https://mybinder.org, this is no longer the case. If you run a JupyterLab instance on your own servers/VMs, maybe it becomes easier? I’m not sure.

Colab has a special proxy layer that provides access to the kernel, which we’ve added explicit support for. If JupyterLab has similar functionality, we’d certainly be open to supporting that, too.
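
A small sketch for checking, from inside the kernel, whether anything is listening on TensorBoard's default port 6006. The function name `port_is_open` is an assumption for illustration. A `True` result only shows that the port is reachable from the kernel machine itself; it does not show that your browser can reach it, which is the requirement described above.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


print(port_is_open("localhost", 6006))
```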

c2rosa commented 5 years ago

When I run:

from tensorboard import version
print(version.VERSION)

I get 1.13.1

c2rosa commented 5 years ago

We are using JupyterLab because our company is signing a deal with Google to get big data, ML analytics, and so on. I hope that using ML/TensorLab with Google via JupyterLab gives us at least as much functionality as using ML/TensorLab via the free Colab service Google offers. Please add to the paid JupyterLab offering whatever features are available in the public free Colab offering.

c2rosa commented 5 years ago

Even after I restart the kernel, it still doesn't work. The version is still 1.13.1. Here is the notebook instance page that shows my notebook. I've circled it.
JupyterNoteBook_google

c2rosa commented 5 years ago

If I were to instantiate a new TensorFlow Jupyter notebook, these are my options. What options should I select so that I can run:

!rm -rf ./log
import tensorflow as tf
from tensorboard.plugins.hparams import api as hp

and not get the error I was seeing?

JupyterNoteBook_google2

c2rosa commented 5 years ago

OK. I created a new Jupyter notebook instance: JupyterNoteBook_google3

It is TensorFlow 2.0. It now runs the code, mostly all right. However, when I run the command:

%tensorboard --logdir logs/hparam_tuning

I don't see the tensorboard come up. I just see this printed to the screen: <IPython.lib.display.IFrame at 0x7f7d0f6d3f28>

What do I need to do to actually have the TensorBoard GUI pop up?

thanks!

c2rosa commented 5 years ago

By the way, when I run

from tensorboard import version
print(version.VERSION)

from this new Jupyter notebook, I get: 1.14.0a20190603

wchargin commented 5 years ago

Okay, I’ve tried this on a GCP JupyterLab instance rather than a public https://mybinder.org instance. Running pip3 install rather than pip install (and then restarting the kernel) suffices to install the correct version.
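
One common way to sidestep the `pip` versus `pip3` ambiguity is to invoke pip through the interpreter that is running the kernel. The sketch below only builds and prints the command by default; the function names `pip_install_command` and `pip_install` and the `dry_run` flag are made up for illustration, and a kernel restart is still needed after a real install.

```python
import subprocess
import sys


def pip_install_command(requirement):
    """Build a pip command bound to the interpreter running this kernel."""
    return [sys.executable, "-m", "pip", "install", "-U", requirement]


def pip_install(requirement, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    cmd = pip_install_command(requirement)
    print(" ".join(cmd))
    if dry_run:
        return 0
    return subprocess.call(cmd)  # returns pip's exit code


pip_install("tensorflow==1.14.0")  # dry run: prints the command without installing
```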

The TensorFlow 2.0 notebook runtimes will have a nightly TensorBoard from early June, as you’ve noted, so those will have the hparams APIs available.

It looks like JupyterLab itself has some TensorBoard functionality, where the TensorBoards open in a new tab within the JupyterLab UI. This is using some kind of tunnel (requests to /tensorboard/1/data/… are forwarded to a TensorBoard running… somewhere, but apparently not on the same VM that you have access to from a notebook or terminal). If these TensorBoards suffice for your purposes, you can create them with the “TensorBoard” button on the launcher, or the “Create a new tensorboard” command to specify a custom log directory:

Screenshot of “create a new TensorBoard” under the Commands palette

I’ll reach out to the Cloud AI Platform team on your behalf to see if this tunnel can be extended to enable standard %tensorboard functionality.

(Also: I poked around a bit, and it looks like you’ll need to use pip3 rather than pip for installed packages to be picked up by the runtime, which explains the confusion earlier.)

> I don't see the tensorboard come up. I just see this printed to the screen: <IPython.lib.display.IFrame at 0x7f7d0f6d3f28>

This is certainly a bug in something (it works in Colab, IPython, and Jupyter, but not JupyterLab), but resolving it won’t actually be useful until the proxy/tunnel issue is resolved, so I’ll defer investigating this. There is a quick fix, if it comes to that.

wchargin commented 5 years ago

I filed a bug for you internally. (Googlers, see http://b/138959137.)

c2rosa commented 5 years ago

As per your suggestion above, I modified my first line from:

!pip install -q tf-nightly-2.0-preview
# Load the TensorBoard notebook extension
%load_ext tensorboard

to

!pip3 install -q tf-nightly-2.0-preview
# Load the TensorBoard notebook extension
%load_ext tensorboard

Then ran everything as it was before (after restarting the Kernel).

The following steps all run fine.

Model code from tutorial notebook

```
# Clear any logs from previous runs
!rm -rf ./logs/

import tensorflow as tf
from tensorboard.plugins.hparams import api as hp

fashion_mnist = tf.keras.datasets.fashion_mnist

(x_train, y_train),(x_test, y_test) = fashion_mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

HP_NUM_UNITS = hp.HParam('num_units', hp.Discrete([16, 32]))
HP_DROPOUT = hp.HParam('dropout', hp.RealInterval(0.1, 0.2))
HP_OPTIMIZER = hp.HParam('optimizer', hp.Discrete(['adam', 'sgd']))

METRIC_ACCURACY = 'accuracy'

with tf.summary.create_file_writer('logs/hparam_tuning').as_default():
    hp.hparams_config(
        hparams=[HP_NUM_UNITS, HP_DROPOUT, HP_OPTIMIZER],
        metrics=[hp.Metric(METRIC_ACCURACY, display_name='Accuracy')],
    )

def train_test_model(hparams):
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(hparams[HP_NUM_UNITS], activation=tf.nn.relu),
        tf.keras.layers.Dropout(hparams[HP_DROPOUT]),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ])
    model.compile(
        optimizer=hparams[HP_OPTIMIZER],
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'],
    )

    model.fit(x_train, y_train, epochs=1) # Run with 1 epoch to speed things up for demo purposes
    _, accuracy = model.evaluate(x_test, y_test)
    return accuracy

def run(run_dir, hparams):
    with tf.summary.create_file_writer(run_dir).as_default():
        hp.hparams(hparams)  # record the values used in this trial
        accuracy = train_test_model(hparams)
        tf.summary.scalar(METRIC_ACCURACY, accuracy, step=1)

session_num = 0

for num_units in HP_NUM_UNITS.domain.values:
    for dropout_rate in (HP_DROPOUT.domain.min_value, HP_DROPOUT.domain.max_value):
        for optimizer in HP_OPTIMIZER.domain.values:
            hparams = {
                HP_NUM_UNITS: num_units,
                HP_DROPOUT: dropout_rate,
                HP_OPTIMIZER: optimizer,
            }
            run_name = "run-%d" % session_num
            print('--- Starting trial: %s' % run_name)
            print({h.name: hparams[h] for h in hparams})
            run('logs/hparam_tuning/' + run_name, hparams)
            session_num += 1
```

However, when I try and execute:

%tensorboard --logdir logs/hparam_tuning

I get the following message:

Launching TensorBoard...
and then line after line of exceptions and tracebacks:

W0806 12:54:22.998102 139666203186944 manager.py:321] invalid info file: '/tmp/.tensorboard-info/pid-23058.info'
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorboard/manager.py", line 316, in get_all
    info = _info_from_string(contents)
  File "/usr/local/lib/python3.5/dist-packages/tensorboard/manager.py", line 155, in _info_from_string
    raise ValueError("incompatible version: %r" % (json_value,))
ValueError: incompatible version: {'cache_key': 'eyJhcmd1bWVudHMiOlsiLS1sb2dkaXIiLCJsb2dzL2hwYXJhbV90dW5pbmciXSwiY29uZmlndXJlX2t3YXJncyI6e30sIndvcmtpbmdfZGlyZWN0b3J5IjoiL2hvbWUvanVweXRlci90ZW5zb3JGbG93X2NociJ9', 'port': 6006, 'version': '1.15.0a20190806', 'db': '', 'pid': 23058, 'path_prefix': '', 'logdir': 'logs/hparam_tuning', 'start_time': 1565096062}
W0806 12:54:23.505495 139666203186944 manager.py:321] invalid info file: '/tmp/.tensorboard-info/pid-23058.info'
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorboard/manager.py", line 316, in get_all
    info = _info_from_string(contents)
  File "/usr/local/lib/python3.5/dist-packages/tensorboard/manager.py", line 155, in _info_from_string
    raise ValueError("incompatible version: %r" % (json_value,))
ValueError: incompatible version: {'cache_key': 'eyJhcmd1bWVudHMiOlsiLS1sb2dkaXIiLCJsb2dzL2hwYXJhbV90dW5pbmciXSwiY29uZmlndXJlX2t3YXJncyI6e30sIndvcmtpbmdfZGlyZWN0b3J5IjoiL2hvbWUvanVweXRlci90ZW5zb3JGbG93X2NociJ9', 'port': 6006, 'version': '1.15.0a20190806', 'db': '', 'pid': 23058, 'path_prefix': '', 'logdir': 'logs/hparam_tuning', 'start_time': 1565096062}
W0806 12:54:24.008633 139666203186944 manager.py:321] invalid info file: '/tmp/.tensorboard-info/pid-23058.info'
Traceback (most recent call last):
.
.
.
.
.

etc

Finally, the message at the top of this dump (which was "Launching TensorBoard...") is rewritten as:

ERROR: Timed out waiting for TensorBoard to start. It may still be running as pid 23058.
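
To make the repeated `ValueError` easier to read, here is a small sketch that decodes the `cache_key` field from the record dumped in the traceback. The field turns out to be base64-encoded JSON. The values below are copied from the log; the helper name `decode_cache_key` is made up for illustration. Note that the record's `version` is 1.15.0a20190806, so one plausible reading (an inference from the message, not something confirmed in this thread) is that the info file was written by a different TensorBoard version than the `manager.py` that is reading it.

```python
import base64
import json

# Fields copied from the "incompatible version" record in the traceback above.
info = {
    "cache_key": (
        "eyJhcmd1bWVudHMiOlsiLS1sb2dkaXIiLCJsb2dzL2hwYXJhbV90dW5pbmciXSwi"
        "Y29uZmlndXJlX2t3YXJncyI6e30sIndvcmtpbmdfZGlyZWN0b3J5IjoiL2hvbWUv"
        "anVweXRlci90ZW5zb3JGbG93X2NociJ9"
    ),
    "version": "1.15.0a20190806",
    "port": 6006,
    "pid": 23058,
}


def decode_cache_key(cache_key):
    """Decode the base64-encoded JSON stored in an info record's cache_key."""
    return json.loads(base64.b64decode(cache_key).decode("utf-8"))


decoded = decode_cache_key(info["cache_key"])
print("written by TensorBoard version:", info["version"])
print("arguments:", decoded["arguments"])
print("working directory:", decoded["working_directory"])
```

Comparing `info["version"]` with the output of `print(version.VERSION)` in the same kernel would show whether the two actually differ.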

In the meantime, I will try experimenting with JupyterLab's TensorBoard functionality, as you pointed out above.

Thanks!

c2rosa commented 5 years ago

As per your suggestion about creating a separate TensorBoard, I tried this and was able to bring up a TensorBoard in a separate tab. I just don't understand how to link it with the Python code that is actually training and testing a TensorFlow neural net.

Thanks for any assistance with this.
