tensorflow / tensorboard

TensorFlow's Visualization Toolkit
Apache License 2.0

hparams table not getting displayed when many hparams are being used #2743

Open asorie opened 5 years ago

asorie commented 5 years ago

Consider Stack Overflow for getting support using TensorBoard—they have a larger community with better searchability:

https://stackoverflow.com/questions/tagged/tensorboard

Do not use this template for setup, installation, or configuration issues. Instead, use the “installation problem” issue template:

https://github.com/tensorflow/tensorboard/issues/new?template=installation_problem.md

To report a problem with TensorBoard itself, please fill out the remainder of this template.

Environment information (required)

Diagnostics output `````` --- check: autoidentify INFO: diagnose_tensorboard.py version 4725c70c7ed724e2d1b9ba5618d7c30b957ee8a4 --- check: general INFO: sys.version_info: sys.version_info(major=3, minor=7, micro=3, releaselevel='final', serial=0) INFO: os.name: nt INFO: os.uname(): N/A INFO: sys.getwindowsversion(): sys.getwindowsversion(major=10, minor=0, build=14393, platform=2, service_pack='') --- check: package_management INFO: has conda-meta: False INFO: $VIRTUAL_ENV: 'C:\\tensorflow_anduin' --- check: installed_packages INFO: installed: tensorboard==2.0.0 INFO: installed: tensorflow-gpu==2.0.0 INFO: installed: tensorflow-estimator==2.0.0 --- check: tensorboard_python_version INFO: tensorboard.version.VERSION: '2.0.0' --- check: tensorflow_python_version 2019-10-08 14:40:42.620638: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll INFO: tensorflow.__version__: '2.0.0' INFO: tensorflow.__git_version__: 'v2.0.0-rc2-26-g64c3d382ca' --- check: tensorboard_binary_path INFO: which tensorboard: b'C:\\tensorflow_anduin\\Scripts\\tensorboard.exe\r\n' --- check: readable_fqdn INFO: socket.getfqdn(): '...' 
--- check: stat_tensorboardinfo INFO: directory: C:\Users\halle\AppData\Local\Temp\.tensorboard-info INFO: os.stat(...): os.stat_result(st_mode=16895, st_ino=3096224744103339, st_dev=2217911477, st_nlink=1, st_uid=0, st_gid=0, st_size=0, st_atime=1570538160, st_mtime=1570538160, st_ctime=1562760637) INFO: mode: 0o40777 --- check: source_trees_without_genfiles INFO: tensorboard_roots (1): ['C:\\tensorflow_anduin\\lib\\site-packages']; bad_roots (0): [] --- check: full_pip_freeze INFO: pip freeze --all: absl-py==0.7.1 adal==1.2.2 asn1crypto==0.24.0 astor==0.8.0 astroid==2.2.5 avro-python3==1.9.1 azure-common==1.1.23 azure-graphrbac==0.53.0 azure-keyvault==1.1.0 azure-mgmt-authorization==0.51.1 azure-mgmt-containerregistry==2.7.0 azure-mgmt-keyvault==1.1.0 azure-mgmt-msi==0.2.0 azure-mgmt-nspkg==3.0.2 azure-mgmt-resource==2.2.0 azure-mgmt-storage==3.1.1 azure-nspkg==3.0.2 azure-storage-blob==1.5.0 azure-storage-common==1.4.2 blinker==1.4 boto3==1.9.238 botocore==1.12.238 cachetools==3.1.1 certifi==2019.9.11 cffi==1.12.3 chardet==3.0.4 Click==7.0 click-completion==0.5.1 clipboard==0.0.4 colorama==0.3.9 cryptography==2.7 cycler==0.10.0 docker==3.7.3 docker-pycreds==0.4.0 docutils==0.15.2 Flask==1.1.1 flatten-json==0.1.7 gast==0.2.2 gitdb2==2.0.6 GitPython==2.1.14 google-api-core==1.14.2 google-auth==1.6.3 google-cloud-core==1.0.3 google-cloud-kms==1.2.1 google-cloud-storage==1.20.0 google-pasta==0.1.7 google-resumable-media==0.4.1 googleapis-common-protos==1.6.0 grpc-google-iam-v1==0.12.3 grpcio==1.22.0 h5py==2.9.0 httplib2==0.14.0 humanize==0.5.1 idna==2.8 imageio==2.5.0 isodate==0.6.0 isort==4.3.21 itsdangerous==1.1.0 Jinja2==2.10.1 jmespath==0.9.4 Keras==2.2.4 Keras-Applications==1.0.8 Keras-Preprocessing==1.1.0 kiwisolver==1.1.0 lazy-object-proxy==1.4.1 lockfile==0.12.2 Markdown==3.1.1 MarkupSafe==1.1.1 matplotlib==3.1.1 mccabe==0.6.1 missinglink==19.9.26557 missinglink-kernel==19.9.26893 missinglink-sdk==19.9.26893 ml-core==19.9.3999 ml-crypto==0.7.811 
ml-legit==19.9.8734 msgpack==0.6.2 msrest==0.6.10 msrestazure==0.6.2 mypy==0.711 mypy-extensions==0.4.1 natsort==6.0.0 netifaces==0.10.9 numpy==1.17.2 oauthlib==3.1.0 opt-einsum==2.3.2 pandas==0.25.1 patsy==0.5.1 pep8==1.7.1 Pillow==6.1.0 pip==19.2.3 ply==3.11 protobuf==3.8.0 psutil==5.6.3 puremagic==1.5 pyasn1==0.4.7 pyasn1-modules==0.2.6 pycparser==2.19 pycryptodome==3.6.6 Pygments==2.4.2 PyJWT==1.7.1 pylint==2.3.1 pyparsing==2.4.0 pyperclip==1.7.0 pypiwin32==223 python-dateutil==2.8.0 pytz==2019.2 pywin32==225 PyYAML==5.1.1 requests==2.22.0 requests-oauthlib==1.2.0 retrying==1.3.3 rope==0.14.0 rsa==4.0 s3transfer==0.2.1 scipy==1.3.0 sentry-sdk==0.11.2 setuptools==41.0.1 shellingham==1.3.1 six==1.12.0 smmap2==2.0.5 sseclient==0.0.24 statsmodels==0.10.1 tensorboard==2.0.0 tensorflow-estimator==2.0.0 tensorflow-gpu==2.0.0 termcolor==1.1.0 terminaltables==3.1.0 tqdm==4.32.2 typed-ast==1.4.0 urllib3==1.24.3 wcwidth==0.1.7 websocket-client==0.56.0 Werkzeug==0.16.0 wheel==0.33.4 wrapt==1.11.2 ``````

Issue description

If I use many hparams (e.g. 14) in TensorBoard, the table doesn't display any results, although the table header is displayed correctly.

But when I delete some of the rows in the HPARAMS list, the row in the hparams table and the accuracy are displayed correctly.

HPARAMS = [
    HP_BATCH_SIZE,
    HP_OPTIMIZER,
    HP_W_PARAM_0,
    HP_W_PARAM_1,
    HP_W_PARAM_2,
    HP_W_PARAM_3,
    HP_CONV1_FILTER,
    HP_CONV1_KERNEL,
    HP_CONV2_FILTER,
    HP_CONV2_KERNEL,
    HP_CONV3_FILTER,
    HP_CONV3_KERNEL,
    HP_Conv_UP_1_UNITS,
    HP_Conv_UP_2_UNITS,
]

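with file_writer.as_default():
    # Declare the experiment: which hparams and metrics the table should show.
    hp.hparams_config(
        hparams=HPARAMS,
        metrics=METRICS,
    )
    # Record the hparam values used in this particular run.
    hp.hparams(hparams)

gowthamkpr commented 5 years ago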

@Asorie Can you please try running this script in your notebook and let me know if you are facing the same issue?

asorie commented 5 years ago

I tried the script, but in section 4 the line %tensorboard --logdir logs/hparam_tuning produces an error:

ERROR: Failed to launch TensorBoard (exited with 1).
Contents of stderr:
Traceback (most recent call last):
  File "/usr/local/bin/tensorboard", line 10, in <module>
    sys.exit(run_main())
  File "/usr/local/lib/python3.6/dist-packages/tensorboard/main.py", line 64, in run_main
    app.run(tensorboard.main, flags_parser=tensorboard.configure)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/usr/local/lib/python3.6/dist-packages/tensorboard/program.py", line 220, in main
    server = self._make_server()
  File "/usr/local/lib/python3.6/dist-packages/tensorboard/program.py", line 299, in _make_server
    self.assets_zip_provider)
  File "/usr/local/lib/python3.6/dist-packages/tensorboard/backend/application.py", line 160, in standard_tensorboard_wsgi
    flags, plugin_loaders, data_provider, assets_zip_provider, multiplexer)
  File "/usr/local/lib/python3.6/dist-packages/tensorboard/backend/application.py", line 228, in TensorBoardWSGIApp
    return TensorBoardWSGI(tbplugins, flags.path_prefix)
  File "/usr/local/lib/python3.6/dist-packages/tensorboard/backend/application.py", line 279, in __init__
    raise ValueError('Duplicate plugins for name %s' % plugin.plugin_name)
ValueError: Duplicate plugins for name projector

gowthamkpr commented 5 years ago

It's probably because there are multiple versions of TensorBoard installed on your system. Please find my GitHub gist here.

I am able to see all the hyperparameters in TensorBoard using TensorFlow 2.0. There might be an issue with your TensorBoard installation. Please try to run the same script on your system and see whether the hparams are displayed or not. Thanks!
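The "Duplicate plugins for name projector" error above is attributed to more than one TensorBoard installation. A minimal sketch for checking that, using only the standard library (Python 3.8+). The set of distribution names is an assumption for illustration and may not be exhaustive, and `find_conflicts` and `installed_distributions` are hypothetical helper names, not part of TensorBoard:

```python
from importlib import metadata

# Distribution names assumed to install a top-level `tensorboard` package.
# This set is illustrative only and may be incomplete.
TENSORBOARD_DISTRIBUTIONS = {"tensorboard", "tb-nightly", "tensorflow-tensorboard"}


def normalize(name):
    """Lowercase a distribution name and unify its separators."""
    return name.lower().replace("_", "-").replace(".", "-")


def find_conflicts(installed):
    """Return the (name, version) pairs that look like TensorBoard installs."""
    return sorted(
        (
            (name, version)
            for name, version in installed
            if normalize(name) in TENSORBOARD_DISTRIBUTIONS
        ),
        key=lambda pair: (pair[0], str(pair[1])),
    )


def installed_distributions():
    """List (name, version) for every distribution visible to this interpreter."""
    pairs = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with broken metadata
            pairs.append((name, dist.version))
    return pairs


if __name__ == "__main__":
    found = find_conflicts(installed_distributions())
    for name, version in found:
        print(f"{name}=={version}")
    if len(found) > 1:
        print("More than one TensorBoard distribution is installed.")
```

If more than one entry is printed, uninstalling all of them and reinstalling a single one is a reasonable first step before retrying the notebook.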

asorie commented 5 years ago

The script works. I think the problem is that I added new hparams and wrote the logs to a logdir that TensorBoard had already been used with.

gowthamkpr commented 5 years ago

Yes. So, I think the problem here is resolved?

asorie commented 5 years ago

Not really. I think TensorBoard should check which hparams are used and add a new one to the table whenever it finds one.

asorie commented 5 years ago

If this line: HP_OPTIMIZER = hp.HParam('optimizer', hp.Discrete(['adam', 'sgd'])) is changed to: HP_OPTIMIZER = hp.HParam('optimizer', hp.Discrete(['adam', 'sgd', 'RMSprop'])) and a model is then trained into the same logdir, TensorBoard doesn't add this new model to the hparams table.

So it isn't possible to dynamically change the possible hparams in the same logdir?
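The thread does not settle whether a changed hparams definition can be picked up in an existing logdir. A workaround sketch, assuming it cannot: derive the log subdirectory from the experiment definition, so that a changed definition never shares a directory with an older one. The function names and the `hparams-` prefix are hypothetical, and the domains are passed as plain Python lists rather than `hp.HParam` objects:

```python
import hashlib
import json
import os


def definition_fingerprint(hparam_domains, metric_names):
    """Short, order-independent hash of the hparam domains and metric names."""
    canonical = {
        # Values are compared by their string form, so 2.0 and "2.0" collide.
        "hparams": {
            name: sorted(str(value) for value in values)
            for name, values in hparam_domains.items()
        },
        "metrics": sorted(metric_names),
    }
    encoded = json.dumps(canonical, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:8]


def logdir_for(base_dir, hparam_domains, metric_names):
    """Pick a log subdirectory that changes whenever the definition changes."""
    fingerprint = definition_fingerprint(hparam_domains, metric_names)
    return os.path.join(base_dir, "hparams-" + fingerprint)


old_definition = {"optimizer": ["adam", "sgd"]}
new_definition = {"optimizer": ["adam", "sgd", "RMSprop"]}

# The two definitions end up in different subdirectories of the same base.
print(logdir_for("logs", old_definition, ["accuracy"]))
print(logdir_for("logs", new_definition, ["accuracy"]))
```

The cost of this approach is that runs made under different definitions end up in separate directories, which may or may not be shown together depending on which directory TensorBoard is pointed at.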

Yannick947 commented 4 years ago

I think I'm facing the same issue. Any updates here?

bersbersbers commented 4 years ago

This issue, in particular, https://github.com/tensorflow/tensorboard/issues/2743#issuecomment-542057891, very much reminds me of #3597. There, the problem is that mixed-type (string + float, meaning some models use a string value, others a numerical value) parameters are all cast to string, but the filter in list_session_groups.py doesn't take that casting into account - it looks for 2.0 and doesn't find "2.0". As a result, only models with string parameter values are found - the other ones just don't show up. I have never used hp.HParam myself, so I cannot say if the two HP_OPTIMIZERs are seen as different types, but it sure feels like a similar issue.
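The mismatch described above can be illustrated with a toy model. This is not the code in list_session_groups.py; the function names and data are hypothetical and only mimic the described behavior, where a mixed-type column is cast to strings but the filter still compares against the uncast values:

```python
def stored_values(sessions):
    """Cast every value to str if the column mixes strings and numbers."""
    values = list(sessions.values())
    has_str = any(isinstance(v, str) for v in values)
    has_num = any(isinstance(v, (int, float)) for v in values)
    if has_str and has_num:
        return {name: str(v) for name, v in sessions.items()}
    return dict(sessions)


def naive_filter(stored, allowed):
    """Compare stored values against the uncast filter values."""
    return sorted(name for name, v in stored.items() if v in allowed)


def cast_aware_filter(stored, allowed):
    """Apply the same string cast to the filter values before comparing."""
    allowed_as_str = {str(v) for v in allowed}
    return sorted(name for name, v in stored.items() if str(v) in allowed_as_str)


sessions = {"run_a": "auto", "run_b": 2.0, "run_c": 0.5}
allowed = ["auto", 2.0, 0.5]
stored = stored_values(sessions)

print(naive_filter(stored, allowed))       # ['run_a']
print(cast_aware_filter(stored, allowed))  # ['run_a', 'run_b', 'run_c']
```

In the naive version only the run with a genuine string value survives, which matches the symptom of sessions silently missing from the table.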

NumberChiffre commented 4 years ago

I'm having the same issue. I am using torch + PPO in RLlib, and only half of my hyperparameters show up in TensorBoard.

vrublack commented 4 years ago

I've had the same issue with TensorboardX. The reason was that the metric name contained whitespace.
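If whitespace in a metric name is the cause, one option is to normalize names before logging them. A minimal sketch; `sanitize_metric_name` is a hypothetical helper, and replacing whitespace with underscores is an assumption about what a safe name looks like, not a documented requirement:

```python
import re


def sanitize_metric_name(name):
    """Trim the name and replace each run of whitespace with one underscore."""
    return re.sub(r"\s+", "_", name.strip())


print(sanitize_metric_name("val accuracy"))  # val_accuracy
```

The same sanitized name has to be used both when declaring the metric and when logging its values, otherwise the two will not line up.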

ghost commented 2 years ago

I ran into the same issue. It appears when the number of hparams gets large.

pinkponk commented 1 year ago

Still an issue for me. Really annoying. Anyone have a solution?

I guess I will try writing the hparams structure to a separate file and replacing it every time I change something. Not sure this works, though.

arcra commented 12 months ago

The original issue description here suggests the issue appears when "many hparams" are used. Then later it seems to be that users are trying to "add new HP and write the logs to an already used tensorboard".

So I'm not sure I'm understanding what the issue is. Are you logging more hparams data to the same log dir, and you want TB to read it? Does starting tensorboard again like tensorboard --logdir path/to/logs show everything you want to see? Do you have a small example to reproduce the issue?

bersbersbers commented 12 months ago

@arcra it probably covers only one aspect of this issue, but https://github.com/tensorflow/tensorboard/issues/3597#issuecomment-1490793918 has very specific repro steps that I created "only" 7 months ago ("only" compared to the 4 years that this issue has been open).