tensorflow / tensorboard

TensorFlow's Visualization Toolkit
Apache License 2.0

Tensorflow not fetching new data #3154

Open forReason opened 4 years ago

forReason commented 4 years ago

Environment information (required)

Please run diagnose_tensorboard.py (link below) in the same environment from which you normally run TensorFlow/TensorBoard, and paste the output here:

Diagnostics

Diagnostics output

``````
--- check: autoidentify
INFO: diagnose_tensorboard.py version d515ab103e2b1cfcea2b096187741a0eeb8822ef

--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=6, micro=8, releaselevel='final', serial=0)
INFO: os.name: nt
INFO: os.uname(): N/A
INFO: sys.getwindowsversion(): sys.getwindowsversion(major=10, minor=0, build=14393, platform=2, service_pack='')

--- check: package_management
INFO: has conda-meta: False
INFO: $VIRTUAL_ENV: None

--- check: installed_packages
INFO: installed: tensorboard==2.1.0
INFO: installed: tensorflow==2.1.0
INFO: installed: tensorflow-estimator==2.1.0

--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '2.1.0'

--- check: tensorflow_python_version
2020-01-21 18:44:11.139289: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2020-01-21 18:44:11.143821: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
INFO: tensorflow.__version__: '2.1.0'
INFO: tensorflow.__git_version__: 'v2.1.0-rc2-17-ge5bf8de410'

--- check: tensorboard_binary_path
INFO: which tensorboard: b'C:\\Program Files\\Python36\\Scripts\\tensorboard.exe\r\n'

--- check: addrinfos
socket.has_ipv6 = True
socket.AF_UNSPEC =
socket.SOCK_STREAM =
socket.AI_ADDRCONFIG =
socket.AI_PASSIVE =
Loopback flags:
Loopback infos: [(, , 0, '', ('::1', 0, 0, 0)), (, , 0, '', ('127.0.0.1', 0))]
Wildcard flags:
Wildcard infos: [(, , 0, '', ('::', 0, 0, 0)), (, , 0, '', ('0.0.0.0', 0))]

--- check: readable_fqdn
INFO: socket.getfqdn(): 'DRG-APP01.home'

--- check: stat_tensorboardinfo
INFO: directory: C:\Users\drgadmin\AppData\Local\Temp\1\.tensorboard-info
INFO: .tensorboard-info directory does not exist

--- check: source_trees_without_genfiles
INFO: tensorboard_roots (1): ['C:\\Program Files\\Python36\\lib\\site-packages']; bad_roots (0): []

--- check: full_pip_freeze
INFO: pip freeze --all:
absl-py==0.9.0
astor==0.8.1
cachetools==4.0.0
certifi==2019.11.28
chardet==3.0.4
gast==0.2.2
google-auth==1.10.0
google-auth-oauthlib==0.4.1
google-pasta==0.1.8
grpcio==1.26.0
h5py==2.10.0
idna==2.8
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
Markdown==3.1.1
numpy==1.18.1
oauthlib==3.1.0
opt-einsum==3.1.0
pip==19.3.1
protobuf==3.11.2
pyasn1==0.4.8
pyasn1-modules==0.2.8
requests==2.22.0
requests-oauthlib==1.3.0
rsa==4.0
scipy==1.4.1
setuptools==45.0.0
six==1.13.0
tensorboard==2.1.0
tensorflow==2.1.0
tensorflow-estimator==2.1.0
termcolor==1.1.0
urllib3==1.25.7
Werkzeug==0.16.0
wheel==0.33.6
wrapt==1.11.2
``````

Next steps

No action items identified. Please copy ALL of the above output, including the lines containing only backticks, into your GitHub issue or comment. Be sure to redact any sensitive information.

Issue description

Expected behavior: TensorBoard updates every x seconds, the graphs change, and new models are pulled into the list.

Actual behavior: When refreshing, new models are pulled into TensorBoard correctly, as shown in the screenshot. Their graphs load up to {refreshtime} but never update after that, and refreshing the page does not update them either. Stopping the TensorBoard server and starting it again loads all data logged so far, but the graphs again stop updating afterwards.

Issue Code:

    # Imports this excerpt relies on (assumed; not shown in the original snippet).
    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras.callbacks import TensorBoard
    from tensorflow.keras.layers import BatchNormalization, Dense

    # layer, layerdensity, learningrate, records_per_epoch, appendix, inputnodes,
    # epochs, trainSteps, testSteps, maxTestEpochs, verbose, trainSet, testSet and
    # savePath are defined elsewhere in the training script.
    modelname = f"{layer}-layer_{layerdensity}-nodes_lRelu-adam_{learningrate}-lr_{records_per_epoch}-epochsize_{appendix}_bn"

    # Build the network: `layer` hidden blocks of Dense + BatchNormalization.
    model = keras.Sequential()
    model.add(Dense(layerdensity, activation=tf.nn.leaky_relu, input_dim=inputnodes))
    model.add(BatchNormalization())
    for i in range(layer - 1):
        model.add(Dense(layerdensity, activation=tf.nn.leaky_relu))
        model.add(BatchNormalization())
    model.add(Dense(2, name="Output"))

    # Compile
    optimizer = tf.keras.optimizers.Adam(learning_rate=learningrate)
    model.compile(
        optimizer=optimizer,
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'])
    model.summary()

    # Each model logs to its own subdirectory on the mapped network drive.
    tensorboard = TensorBoard(
        log_dir="Z:\\Projects\\Notebooks\\BitSurfer\\log\\" + modelname,
        histogram_freq=100, write_graph=False)
    #cp_callback = tf.keras.callbacks.ModelCheckpoint("\\\\drg-fs01\\BigData\\Projects\\Notebooks\\PokerBot\\checkpoints\\" + modelname, verbose=0)

    # Train the model
    model.fit(trainSet,
        epochs=epochs,
        steps_per_epoch=trainSteps,
        shuffle=True,
        validation_data=testSet,
        validation_steps=testSteps,
        validation_freq=int(epochs / maxTestEpochs),
        verbose=verbose,
        callbacks=[tensorboard])  #,cp_callback])
    model.save(savePath + 'saved_models/' + modelname + '.h5')
    print(f"saving model: {modelname}")

forReason commented 4 years ago

As a note: My environment consists of 3 machines: a local machine, a server, and a NAS.

Both the local machine and the server are connected to the NAS. The issue might be related to the NAS being a remote location (in the same home network)?

I will try running tensorboard locally and see if that makes any difference.
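
One way to narrow down whether the NAS is involved is to check, from the machine that runs TensorBoard, whether the event files are actually seen to grow while training is writing to them. The following is only a minimal sketch of such a check, not anything TensorBoard itself provides. The function names, the polling interval, and the `*tfevents*` glob (based on the usual `events.out.tfevents...` file naming) are assumptions made for illustration.

```python
import time
from pathlib import Path


def snapshot_event_files(logdir):
    """Return {path: size_in_bytes} for every event file below logdir."""
    sizes = {}
    # Assumption: event files contain "tfevents" in their file name.
    for path in Path(logdir).rglob("*tfevents*"):
        if path.is_file():
            sizes[str(path)] = path.stat().st_size
    return sizes


def diff_snapshots(before, after):
    """Classify files as new, grown, or unchanged between two snapshots."""
    report = {"new": [], "grown": [], "unchanged": []}
    for path, size in sorted(after.items()):
        if path not in before:
            report["new"].append(path)
        elif size > before[path]:
            report["grown"].append(path)
        else:
            report["unchanged"].append(path)
    return report


def watch(logdir, interval_s=30.0, rounds=10):
    """Print how the visible event files changed every interval_s seconds."""
    previous = snapshot_event_files(logdir)
    for _ in range(rounds):
        time.sleep(interval_s)
        current = snapshot_event_files(logdir)
        print(diff_snapshots(previous, current))
        previous = current
```

Calling `watch(...)` on the log directory while a training run is active would show whether files of running models land in `grown`. If they stay in `unchanged` on the TensorBoard machine but grow on the machine that writes them, that would point toward the network share or its caching rather than TensorBoard, though it would not prove it.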

rithiksachdev commented 4 years ago

Hi @forReason. Is your issue resolved? Did running it locally make any difference?

forReason commented 4 years ago

Nope, I gave up on it for now. Right now I wait until training is finished, or until I want to refresh, and then kill TensorBoard on my server and restart the process. I'm not sure, though, whether there has been a new release that fixes it.
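
For anyone stuck with the same workaround, the kill-and-restart cycle can be automated. The sketch below is only an illustration of that idea and does not address the underlying problem. It assumes the `tensorboard` executable is on the `PATH` and uses only its `--logdir` and `--port` flags. The function names, the default interval, and the injectable `launch`/`sleep` parameters are hypothetical choices made so the loop can be tried without starting a real server.

```python
import subprocess
import time


def tensorboard_command(logdir, port=6006):
    """Build the argument list for launching TensorBoard on logdir."""
    return ["tensorboard", "--logdir", str(logdir), "--port", str(port)]


def restart_periodically(logdir, interval_s=600.0, cycles=3, port=6006,
                         launch=subprocess.Popen, sleep=time.sleep):
    """Run TensorBoard, replacing the process every interval_s seconds.

    `launch` and `sleep` are injectable so the loop can be exercised
    with stand-ins instead of a real TensorBoard process.
    """
    for _ in range(cycles):
        proc = launch(tensorboard_command(logdir, port))
        try:
            sleep(interval_s)
        finally:
            # Stop the current server before the next one binds the same port.
            proc.terminate()
            proc.wait()
```

A call such as `restart_periodically("path/to/log", interval_s=900, cycles=100)` would restart the server every 15 minutes. The page in the browser still has to be reloaded after each restart.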

drewm1980 commented 4 years ago

I am experiencing this too, though in a very different environment. I also have multiple machines dumping files onto a NAS. I am using Ubuntu 18.04, Linux 5.3.0 on the machine running TensorBoard. The log files are generated by PyTorch. I'm using FS_CACHE on the machine running TensorBoard. I need to restart TensorBoard every time I want to see updates.

TensorBoard is version 1.14.0. TensorFlow isn't installed in my Python environment, and TensorBoard warns me that it is running without TensorFlow. Python is 3.7.6. PyTorch is 1.4.0 on the machine running TensorBoard and 1.3.1 on the two others that are also writing logs.

Here is my diagnostics output:

Diagnostics

Diagnostics output

``````
--- check: autoidentify
INFO: diagnose_tensorboard.py version 9994af9ec23c0b0824f9c39ec4d6c53290a226d4

--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=7, micro=6, releaselevel='final', serial=0)
INFO: os.name: posix
INFO: os.uname(): posix.uname_result(sysname='Linux', nodename='rv-awagner', release='5.3.0-51-generic', version='#44~18.04.2-Ubuntu SMP Thu Apr 23 14:27:18 UTC 2020', machine='x86_64')
INFO: sys.getwindowsversion(): N/A

--- check: package_management
INFO: has conda-meta: True
INFO: $VIRTUAL_ENV: None

--- check: installed_packages
INFO: installed: tensorboard==1.14.0
WARNING: no installation among: ['tensorflow', 'tensorflow-gpu', 'tf-nightly', 'tf-nightly-2.0-preview', 'tf-nightly-gpu', 'tf-nightly-gpu-2.0-preview']
INFO: installed: tensorflow-estimator==1.13.0

--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '1.14.0'

--- check: tensorflow_python_version
Traceback (most recent call last):
  File "./diagnose_tensorboard.py", line 471, in main
    suggestions.extend(check())
  File "./diagnose_tensorboard.py", line 79, in wrapper
    result = fn()
  File "./diagnose_tensorboard.py", line 254, in tensorflow_python_version
    import tensorflow as tf
ModuleNotFoundError: No module named 'tensorflow'

--- check: tensorboard_binary_path
INFO: which tensorboard: b'/home/awagner/miniconda3/bin/tensorboard\n'

--- check: addrinfos
socket.has_ipv6 = True
socket.AF_UNSPEC =
socket.SOCK_STREAM =
socket.AI_ADDRCONFIG =
socket.AI_PASSIVE =
Loopback flags:
Loopback infos: [(, , 6, '', ('::1', 0, 0, 0)), (, , 6, '', ('127.0.0.1', 0))]
Wildcard flags:
Wildcard infos: [(, , 6, '', ('0.0.0.0', 0)), (, , 6, '', ('::', 0, 0, 0))]

--- check: readable_fqdn
INFO: socket.getfqdn(): 'rv-awagner'

--- check: stat_tensorboardinfo
INFO: directory: /tmp/.tensorboard-info
INFO: os.stat(...): os.stat_result(st_mode=16895, st_ino=27279729, st_dev=66306, st_nlink=2, st_uid=1001, st_gid=1001, st_size=4096, st_atime=1588749425, st_mtime=1588766667, st_ctime=1588766667)
INFO: mode: 0o40777

--- check: source_trees_without_genfiles
INFO: tensorboard_roots (1): ['/home/awagner/miniconda3/lib/python3.7/site-packages']; bad_roots (0): []

--- check: full_pip_freeze
INFO: pip freeze --all:
absl-py==0.5.0
asn1crypto==1.3.0
astor==0.7.1
backcall==0.1.0
brotlipy==0.7.0
certifi==2020.4.5.1
cffi==1.14.0
chardet==3.0.4
cloudpickle==1.2.2
colorama==0.4.1
conda==4.8.3
conda-package-handling==1.6.0
cryptography==2.9.2
cryptography-vectors==2.9.2
cycler==0.10.0
cytoolz==0.10.1
dask==2.9.2
decorator==4.3.0
defusedxml==0.6.0
e2cnn==0.1
freetype-py==2.1.0.post1
gast==0.2.0
grpcio==1.23.0
h5py==2.8.0
idna==2.9
imageio==2.8.0
ipdb==0.11
ipython @ file:///home/conda/feedstock_root/build_artifacts/ipython_1588362967322/work
ipython-genutils==0.2.0
jedi==0.13.3
Keras-Applications==1.0.8
Keras-Preprocessing==1.0.9
kiwisolver==1.2.0
libarchive-c==2.9
llvmlite==0.31.0
lxml==4.3.3
Mako==1.0.7
Markdown==2.6.11
MarkupSafe==1.1.1
matplotlib==3.1.1
mkl-fft==1.1.0
mkl-random==1.1.0
mkl-service==2.3.0
mock==3.0.5
mypy==0.770
mypy-extensions==0.4.3
networkx==2.4
numba==0.48.0
numpy==1.18.1
olefile==0.46
onnx==1.6.0
opcua @ file:///home/conda/feedstock_root/build_artifacts/opcua_1588501975412/work
pandas==1.0.3
parso==0.3.1
pexpect==4.8.0
pickleshare==0.7.5
Pillow==6.1.0
pip==20.1
plyfile==0.7.1
prompt-toolkit==3.0.5
protobuf==3.9.2
psutil==5.7.0
ptyprocess==0.6.0
pyasn1==0.4.4
PyAudio==0.2.11
pycairo==1.18.0
pycosat==0.6.3
pycparser==2.19
pydot==1.4.1
Pygments==2.2.0
pygpu==0.7.6
pyOpenSSL==19.1.0
pyparsing==2.3.0
PySocks==1.7.1
python-dateutil==2.7.5
pytz==2018.7
PyWavelets==1.1.1
PyYAML==5.1.2
pyzmq==18.0.1
requests==2.23.0
rsa==3.4.2
ruamel-yaml==0.15.71
scikit-image==0.16.2
scipy==1.4.1
setuptools==46.1.3.post20200325
simplegeneric==0.8.1
six==1.14.0
snakeviz==2.0.1
tensorboard==1.14.0
tensorflow-estimator==1.13.0
termcolor==1.1.0
Theano==1.0.3
toolz==0.10.0
torch==1.4.0
tornado==6.0.4
tqdm==4.32.2
traitlets==4.3.3
typed-ast==1.4.1
typing-extensions @ file:///home/conda/feedstock_root/build_artifacts/typing_extensions_1588470653596/work
urllib3==1.25.9
vispy==0.6.4
wcwidth==0.1.7
Werkzeug==0.14.1
wheel==0.34.2
yapf==0.24.0
``````