tensorflow / tensorboard

TensorFlow's Visualization Toolkit
Apache License 2.0

TensorBoard shows a blank screen instead of the report; console shows "Failed to load resource: net::ERR_CONTENT_LENGTH_MISMATCH" #6789

Open lessw2020 opened 7 months ago

lessw2020 commented 7 months ago

Hi, we're trying to standardize on TensorBoard, but we sporadically hit an issue where reports don't load and we just get a blank white screen instead. This happens on multiple machines, and on the ones where we ran it, diagnose_tensorboard reports no issues (full details below). We've upgraded to the latest TensorBoard (2.16.2), and the TensorBoard data is all there (verified with the --inspect option). We've tried a whole bunch of things (uninstall/reinstall, restart, Chrome vs. Safari, --bind_all, etc.) with no luck.

The most promising lead so far: inspecting the page in Chrome shows "Failed to load resource: net::ERR_CONTENT_LENGTH_MISMATCH".

Is there any input or advice on how to debug or resolve this further? (A Chrome screenshot of the error is below.)
Thanks in advance!

[Screenshot 2024-03-15 at 8:51:19 AM: Chrome console showing the error]
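Chrome reports ERR_CONTENT_LENGTH_MISMATCH when the connection closes before the number of body bytes declared in the response's Content-Length header has arrived, so a script or data response ends up truncated and the page never renders. Below is a minimal sketch for checking this outside the browser, assuming TensorBoard is reachable over plain HTTP (for example on its default port, http://localhost:6006/). The helper name check_content_length is hypothetical; it uses only the Python standard library (http.client) and should be pointed at whichever URL fails in the DevTools Network tab.

```python
import http.client
from urllib.parse import urlsplit


def check_content_length(url: str, timeout: float = 30.0, accept_encoding: str = "gzip") -> dict:
    """GET `url` once and compare the declared Content-Length with the body bytes received.

    Hypothetical diagnostic helper: plain HTTP only (no TLS), standard library only.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
    try:
        # Ask for the same encoding a browser typically would, so the server
        # sends a comparable (possibly compressed) body.
        conn.request("GET", path, headers={"Accept-Encoding": accept_encoding})
        resp = conn.getresponse()
        header = resp.getheader("Content-Length")
        declared = int(header) if header is not None else None
        try:
            received = len(resp.read())
        except http.client.IncompleteRead as exc:
            # The connection closed before `declared` bytes arrived: the same
            # condition Chrome reports as ERR_CONTENT_LENGTH_MISMATCH.
            received = len(exc.partial)
        return {
            "status": resp.status,
            "declared": declared,
            "received": received,
            "mismatch": declared is not None and declared != received,
        }
    finally:
        conn.close()
```

Calling check_content_length on the failing URL a few times returns a dict with the status, declared length, received length and a mismatch flag. If it never reports a mismatch when run on the TensorBoard host itself but the browser still does, the truncation is more likely happening between the server and the browser than in TensorBoard.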

Results of running python diagnose_tensorboard.py:

Diagnostics

Diagnostics output

``````
--- check: autoidentify
INFO: diagnose_tensorboard.py version df7af2c6fc0e4c4a5b47aeae078bc7ad95777ffa
--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=9, micro=12, releaselevel='final', serial=0)
INFO: os.name: posix
INFO: os.uname(): posix.uname_result(sysname='Linux', nodename='submit-1', release='5.15.0-1048-aws', version='#53~20.04.1-Ubuntu SMP Wed Oct 4 16:44:20 UTC 2023', machine='x86_64')
INFO: sys.getwindowsversion(): N/A
--- check: package_management
INFO: has conda-meta: True
INFO: $VIRTUAL_ENV: None
--- check: installed_packages
INFO: installed: tensorboard==2.16.2
WARNING: no installation among: ['tensorflow', 'tensorflow-gpu', 'tf-nightly', 'tf-nightly-2.0-preview', 'tf-nightly-gpu', 'tf-nightly-gpu-2.0-preview']
WARNING: no installation among: ['tensorflow-estimator', 'tensorflow-estimator-2.0-preview', 'tf-estimator-nightly']
INFO: installed: tensorboard-data-server==0.7.2
--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '2.16.2'
--- check: tensorflow_python_version
Traceback (most recent call last):
  File "/opt/hpcaas/.mounts/fs-5c62ddab/home/less/torchtrain/outputs/tb/diagnose_tensorboard.py", line 511, in main
    suggestions.extend(check())
  File "/opt/hpcaas/.mounts/fs-5c62ddab/home/less/torchtrain/outputs/tb/diagnose_tensorboard.py", line 81, in wrapper
    result = fn()
  File "/opt/hpcaas/.mounts/fs-5c62ddab/home/less/torchtrain/outputs/tb/diagnose_tensorboard.py", line 267, in tensorflow_python_version
    import tensorflow as tf
ModuleNotFoundError: No module named 'tensorflow'
--- check: tensorboard_data_server_version
INFO: data server binary: '/data/home/less/miniconda3/lib/python3.9/site-packages/tensorboard_data_server/bin/server'
INFO: data server binary version: b'rustboard 0.7.2'
--- check: tensorboard_binary_path
INFO: which tensorboard: b'/data/home/less/miniconda3/bin/tensorboard\n'
--- check: addrinfos
socket.has_ipv6 = True
socket.AF_UNSPEC =
socket.SOCK_STREAM =
socket.AI_ADDRCONFIG =
socket.AI_PASSIVE =
Loopback flags:
Loopback infos: [(, , 6, '', ('::1', 0, 0, 0)), (, , 6, '', ('127.0.0.1', 0))]
Wildcard flags:
Wildcard infos: [(, , 6, '', ('0.0.0.0', 0)), (, , 6, '', ('::', 0, 0, 0))]
--- check: readable_fqdn
INFO: socket.getfqdn(): 'submit-1.pytorch.hpcaas'
--- check: stat_tensorboardinfo
INFO: directory: /tmp/.tensorboard-info
INFO: os.stat(...): os.stat_result(st_mode=16895, st_ino=9986583, st_dev=66305, st_nlink=2, st_uid=2232, st_gid=2232, st_size=4096, st_atime=1709866713, st_mtime=1710516966, st_ctime=1710516966)
INFO: mode: 0o40777
--- check: source_trees_without_genfiles
INFO: tensorboard_roots (1): ['/data/home/less/miniconda3/lib/python3.9/site-packages']; bad_roots (0): []
--- check: full_pip_freeze
INFO: pip freeze --all:
absl-py==2.1.0
accelerate==0.22.0
addict==2.4.0
-e git+https://github.com/lessw2020/OLMo.git@8877cf4c1522b51a1257b2462fc05942270f4f7c#egg=ai2_olmo
aiohttp==3.9.1
aiosignal==1.3.1
alabaster==0.7.13
alembic==1.13.0
antlr4-python3-runtime==4.9.3
anyascii==0.3.2
anyio==3.7.1
appdirs==1.4.4
args==0.1.0
asttokens @ file:///home/conda/feedstock_root/build_artifacts/asttokens_1670263926556/work
async-timeout==4.0.3
attrs==23.1.0
auditnlg==0.0.1
auto-gptq @ file:///opt/hpcaas/.mounts/fs-5c62ddab/home/less/AutoGPTQ_Triton
Automat==22.10.0
Babel==2.13.0
backcall @ file:///home/conda/feedstock_root/build_artifacts/backcall_1592338393461/work
backports.functools-lru-cache @ file:///home/conda/feedstock_root/build_artifacts/backports.functools_lru_cache_1687772187254/work
beaker-gantry==0.21.0
beaker-py==1.25.0
beartype==0.15.0
beautifulsoup4==4.12.2
bitsandbytes==0.39.1
black==23.12.1
blinker==1.7.0
blis==0.7.11
blobfile==2.0.2
boltons==23.1.1
boto3==1.34.48
botocore==1.34.48
Brotli==1.0.9
brotlipy==0.7.0
build==1.0.3
buildtools==1.0.6
cached_path==1.6.0
cachetools==5.3.1
catalogue==2.0.10
causal-conv1d==1.1.1
certifi==2023.11.17
cffi @ file:///opt/conda/conda-bld/cffi_1642701102775/work
cfgv==3.4.0
charset-normalizer==3.3.2
click==8.1.7
click-help-colors==0.9.4
clint==0.5.1
cloudpathlib==0.15.1
cloudpickle==3.0.0
cmake==3.27.6
colorama @ file:///tmp/build/80754af9/colorama_1607707115595/work
coloredlogs==15.0.1
CoLT5-attention==0.10.15
conda==4.14.0
conda-content-trust @ file:///tmp/build/80754af9/conda-content-trust_1617045594566/work
conda-package-handling @ file:///tmp/build/80754af9/conda-package-handling_1649105784853/work
confection==0.1.3
constantly==23.10.4
contourpy==1.2.0
contractions==0.1.73
coverage==7.3.1
cryptography @ file:///tmp/build/80754af9/cryptography_1639414572950/work
cycler==0.12.1
cymem==2.0.8
dadaptation==3.1
DALL-E==0.1
databricks-cli==0.18.0
dataclasses-json==0.6.1
datasets==2.18.0
decorator @ file:///home/conda/feedstock_root/build_artifacts/decorator_1641555617451/work
detoxify==0.5.0
dill==0.3.7
distlib==0.3.7
docformatter==1.5.1
docker==6.1.3
docker-pycreds==0.4.0
docopt==0.6.2
docutils==0.15.2
edlib==1.3.9
einops==0.7.0
emoji==2.8.0
entrypoints @ file:///home/conda/feedstock_root/build_artifacts/entrypoints_1643888246732/work
exceptiongroup==1.1.3
executing @ file:///home/conda/feedstock_root/build_artifacts/executing_1667317341051/work
face==20.1.1
fairscale==0.4.13
fastapi==0.103.2
fastjsonschema==2.19.0
filelock==3.13.1
fire==0.5.0
flake8==4.0.1
flake8-bugbear==22.4.25
flake8-polyfill==1.0.2
Flask==3.0.0
fonttools==4.46.0
frozenlist==1.4.0
fsspec==2023.10.0
ftfy==6.1.3
furl==2.1.3
fuzzywuzzy==0.18.0
gdown==4.7.1
gekko==1.0.6
gitdb==4.0.11
gitdb2==4.0.2
GitPython==3.1.40
glom==23.5.0
google-api-core==2.12.0
google-api-python-client==2.102.0
google-auth==2.23.2
google-auth-httplib2==0.1.1
google-cloud-core==2.3.3
google-cloud-storage==2.11.0
google-crc32c==1.5.0
google-resumable-media==2.6.0
googleapis-common-protos==1.60.0
greenlet==3.0.2
grpcio==1.60.1
gunicorn==21.2.0
h11==0.14.0
httplib2==0.22.0
httptools==0.6.0
huggingface-hub==0.19.4
humanfriendly==10.0
hydra-core==1.3.2
hyperlink==21.0.0
identify==2.5.29
idna==3.6
imagesize==1.4.1
importlib-metadata==7.0.0
importlib-resources==6.0.1
incremental==22.10.0
inflate64==0.3.1
iniconfig==2.0.0
inquirerpy==0.3.4
iopath==0.1.10
ipykernel @ file:///home/conda/feedstock_root/build_artifacts/ipykernel_1620912942381/work/dist/ipykernel-5.5.5-py3-none-any.whl
ipython @ file:///home/conda/feedstock_root/build_artifacts/ipython_1685727741709/work
ipython-genutils==0.2.0
isort==5.12.0
itsdangerous==2.1.2
jaraco.classes==3.3.1
jedi @ file:///home/conda/feedstock_root/build_artifacts/jedi_1690896916983/work
jeepney==0.8.0
Jinja2==3.1.2
jmespath==1.0.1
joblib==1.3.2
json5==0.9.14
jsonlines==4.0.0
jsonpatch==1.33
jsonpointer==2.4
jsonschema==4.17.3
jupyter-client @ file:///home/conda/feedstock_root/build_artifacts/jupyter_client_1633454794268/work
jupyter_core @ file:///home/conda/feedstock_root/build_artifacts/jupyter_core_1686775603087/work
keyring==24.3.0
kiwisolver==1.4.5
langchain==0.0.309
langcodes==3.3.0
langsmith==0.0.42
libcst==1.0.1
lightning-utilities==0.10.1
lit==17.0.2
-e git+https://github.com/lessw2020/llama-recipes.git@38df368a70c292db6e42e0f275139678497fd896#egg=llama_recipes
local-attention==1.8.6
loralib==0.1.2
-e git+https://github.com/thu-ml/low-bit-optimizers.git@e3e2854728e498c2a606e3fdb88daa27ae94f9a6#egg=lpmm
lxml==4.9.3
Mako==1.3.0
mamba==0.11.3
Markdown==3.5.1
markdown-it-py==3.0.0
MarkupSafe==2.1.3
marshmallow==3.20.1
matplotlib==3.8.2
matplotlib-inline @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-inline_1660814786464/work
mccabe==0.6.1
mdurl==0.1.2
mlflow==2.9.1
mmcv==1.3.8
mmsegmentation==0.14.1
mock==5.1.0
more-itertools==10.2.0
moreorless==0.4.0
mpmath==1.3.0
msgpack==1.0.7
msgspec==0.18.6
multidict==6.0.4
multiprocess==0.70.15
multivolumefile==0.2.3
murmurhash==1.0.10
mypy==1.0.1
mypy-extensions==1.0.0
myst-parser==0.12.10
necessary==0.4.3
nest-asyncio @ file:///home/conda/feedstock_root/build_artifacts/nest-asyncio_1664684991461/work
networkx==3.2.1
nh3==0.2.15
ninja==1.11.1.1
nltk==3.8.1
nodeenv==1.8.0
numpy==1.26.2
nvidia-cublas-cu11==11.10.3.66
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu11==8.5.0.96
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu11==10.9.0.58
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu11==10.2.10.91
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu11==11.7.4.91
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu11==2.14.3
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.1.105
nvidia-nvtx-cu11==11.7.91
nvidia-nvtx-cu12==12.1.105
oauthlib==3.2.2
omegaconf==2.3.0
openai==0.28.1
opencv-python==4.8.1.78
optimum==1.12.0
orderedmultidict==1.0.1
packaging==23.2
pandas==2.1.3
parlai==1.7.0
parso @ file:///home/conda/feedstock_root/build_artifacts/parso_1638334955874/work
pathspec==0.11.2
pathtools==0.1.2
pathy==0.10.2
peft @ git+https://github.com/huggingface/peft.git@85013987aa82aa1af3da1236b6902556ce3e483e
pep8-naming==0.12.1
petname==2.6
pexpect @ file:///home/conda/feedstock_root/build_artifacts/pexpect_1667297516076/work
pfzy==0.3.4
pickleshare @ file:///home/conda/feedstock_root/build_artifacts/pickleshare_1602536217715/work
Pillow==10.1.0
pip==24.0
pkginfo==1.9.6
platformdirs==4.1.0
pluggy==1.3.0
portalocker==2.8.2
pre-commit==3.4.0
preshed==3.0.9
prettytable==3.9.0
prometheus-client==0.19.0
prompt-toolkit @ file:///home/conda/feedstock_root/build_artifacts/prompt-toolkit_1688565951714/work
protobuf==4.25.1
psutil==5.9.5
ptyprocess @ file:///home/conda/feedstock_root/build_artifacts/ptyprocess_1609419310487/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl
pure-eval @ file:///home/conda/feedstock_root/build_artifacts/pure_eval_1642875951954/work
py-gfm==2.0.0
py-rouge==1.1
py7zr==0.20.6
pyahocorasick==2.0.0
pyarrow==14.0.1
pyarrow-hotfix==0.6
pyasn1==0.5.0
pyasn1-modules==0.3.0
pybcj==1.0.1
pycodestyle==2.8.0
pycosat==0.6.3
pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
pycryptodomex==3.18.0
pydantic==1.10.13
pydot==1.4.2
pyflakes==2.4.0
Pygments @ file:///home/conda/feedstock_root/build_artifacts/pygments_1691408637400/work
PyJWT==2.8.0
pyOpenSSL @ file:///opt/conda/conda-bld/pyopenssl_1643788558760/work
pyparsing==3.1.1
pyppmd==1.0.0
pyproject_hooks==1.0.0
pyrsistent==0.19.3
PySocks @ file:///tmp/build/80754af9/pysocks_1605305812635/work
pytest==8.0.1
pytest-cov==4.1.0
pytest-datadir==1.5.0
pytest-mock==3.8.2
pytest-regressions==2.5.0
pytest-sphinx==0.6.0
python-dateutil @ file:///home/conda/feedstock_root/build_artifacts/python-dateutil_1626286286081/work
python-dotenv==1.0.0
python-hostlist==1.23.0
python-json-logger==2.0.7
pytorch-triton==3.0.0+a9bc1a3647
pytz==2023.3.post1
PyYAML==6.0.1
pyzmq==25.1.1
pyzstd==0.15.9
querystring-parser==1.2.4
rank-bm25==0.2.2
ray==2.7.0
readme-renderer==42.0
redo==2.0.4
regex==2023.10.3
requests==2.31.0
requests-mock==1.11.0
requests-toolbelt==1.0.0
requirements-parser==0.5.0
responses==0.18.0
rfc3339-validator==0.1.4
rfc3986==2.0.0
rfc3986-validator==0.1.1
rich==13.7.0
rouge==1.0.1
rpds-py==0.13.1
rsa==4.9
ruamel-yaml-conda @ file:///tmp/build/80754af9/ruamel_yaml_1616016711199/work
ruff==0.1.7
s3transfer==0.10.0
safetensors==0.4.1
scikit-learn==1.3.2
scipy==1.11.4
SecretStorage==3.3.3
Send2Trash==1.8.2
sentencepiece==0.1.99
sentry-sdk==1.32.0
setproctitle==1.3.3
setuptools==69.0.2
sh==2.0.6
simplejson==3.19.2
six @ file:///tmp/build/80754af9/six_1644875935023/work
smart-open==6.4.0
smashed==0.21.5
smmap==5.0.1
sniffio==1.3.0
snowballstemmer==2.2.0
soupsieve==2.5
spacy==3.7.1
spacy-legacy==3.0.12
spacy-loggers==1.0.5
Sphinx==2.2.2
sphinx-autodoc-typehints==1.10.3
sphinx-rtd-theme==1.3.0
sphinxcontrib-applehelp==1.0.4
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==2.0.1
sphinxcontrib-jquery==4.1
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.5
SQLAlchemy==2.0.23
sqlparse==0.4.4
srsly==2.4.8
st-moe-pytorch @ file:///data/home/less/st-moe
stack-data @ file:///home/conda/feedstock_root/build_artifacts/stack_data_1669632077133/work
starlette==0.27.0
stdlibs==2022.10.9
subword-nmt==0.3.8
sympy==1.12
tabulate==0.9.0
tenacity==8.2.3
tensorboard==2.16.2
tensorboard-data-server==0.7.2
tensorboardX==2.6.2.2
termcolor==2.3.0
terminado==0.18.0
textsearch==0.0.24
texttable==1.6.7
tf-keras==2.15.0
thinc==8.2.1
threadpoolctl==3.2.0
tiktoken==0.5.2
timm==0.4.12
tinycss2==1.2.1
tokenize-rt==5.2.0
tokenizers==0.15.0
toml==0.10.2
tomli==2.0.1
tomlkit==0.12.1
toolz @ file:///home/conda/feedstock_root/build_artifacts/toolz_1657485559105/work
torch==2.3.0.dev20240307+cu121
torchaudio==2.2.0.dev20240307+cu121
torchdata==0.6.1
torchinfo==1.8.0
torchmetrics==1.3.1
-e git+https://github.com/lessw2020/tau_graph.git@06b9eeddb971d39078b5774c2be8b67181562ca6#egg=torchpippy
torchtext==0.15.2
torchvision==0.18.0.dev20240307+cu121
tornado==6.3.3
tqdm==4.66.1
trailrunner==1.4.0
traitlets @ file:///home/conda/feedstock_root/build_artifacts/traitlets_1675110562325/work
transformers==4.37.2
triton==2.1.0
triton-nightly==2.1.0.post20231218080902
trouting==0.3.3
twine==5.0.0
Twisted==23.10.0
typer==0.9.0
types-python-dateutil==2.8.19.14
types-setuptools==69.1.0.20240223
typing-inspect==0.9.0
typing_extensions==4.8.0
tzdata==2023.3
ufmt==1.3.0
Unidecode==1.3.7
untokenize==0.1.1
uri-template==1.3.0
uritemplate==4.1.1
urllib3==1.26.18
usort==1.0.2
uvicorn==0.23.2
uvloop==0.17.0
virtualenv==20.24.5
vit-pytorch==1.4.1
vllm==0.2.0
wandb==0.16.3
wasabi==1.1.2
watchfiles==0.20.0
wcwidth==0.2.12
weasel==0.3.2
webcolors==1.13
webencodings==0.5.1
websocket-client==1.7.0
websockets==11.0.3
Werkzeug==3.0.1
wheel==0.37.1
x-transformers==1.18.2
xformers==0.0.22
xxhash==3.4.1
yacs==0.1.8
yapf==0.40.2
yarl==1.9.4
zipp==3.17.0
zope.interface==6.1
``````

Next steps

No action items identified. Please copy ALL of the above output, including the lines containing only backticks, into your GitHub issue or comment. Be sure to redact any sensitive information.

arcra commented 7 months ago

Hi, do you get an entirely blank page in Chrome, like what you see at about:blank? Does refreshing the page not fix it? Does the console show any error messages or warnings?

When this happens, can you clear your browser cache and try again? I've been bitten by caching issues with other apps (perhaps not with this particular error), and I've read that a stale cache can cause this error. When it happened to me, I was pleasantly surprised that simply clearing the cache fixed it.

I've also occasionally seen TensorBoard load slowly when there's a lot of data to read. It might just need more time, but I'd try clearing the cache first.
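To get a rough sense of how much data TensorBoard has to read, here is a minimal sketch that walks a log directory and totals the sizes of the event files in it. It assumes event files follow the usual naming convention containing "tfevents" (for example events.out.tfevents.*); the function summarize_event_files and the EventFileSummary type are hypothetical names used only for illustration.

```python
import os
from typing import NamedTuple, Optional, Tuple


class EventFileSummary(NamedTuple):
    count: int
    total_bytes: int
    largest: Optional[Tuple[str, int]]  # (path, size) of the biggest file, if any


def summarize_event_files(logdir: str) -> EventFileSummary:
    """Walk `logdir` recursively and total the sizes of TensorBoard event files.

    Assumes event files have "tfevents" in their file name.
    """
    count = 0
    total = 0
    largest = None
    for root, _dirs, files in os.walk(logdir):
        for name in files:
            if "tfevents" not in name:
                continue
            path = os.path.join(root, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # The file may have been removed or rotated while we were scanning.
                continue
            count += 1
            total += size
            if largest is None or size > largest[1]:
                largest = (path, size)
    return EventFileSummary(count, total, largest)
```

Running it on the directory passed to --logdir and printing count, total_bytes / 1e6 (in MB) and largest shows whether a few very large runs dominate the load.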

Apart from that, I found this Stack Overflow post claiming that if you're accessing the app from a different device, the problem might be the Angular app being too large for a slow connection. (Probably not the case here, but I thought I'd mention it in case it's relevant.)
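The "too large for a slow connection" theory can be checked with a rough measurement: download one resource in full from the machine running the browser and compute the throughput. This is a minimal sketch assuming plain HTTP and a URL copied from the DevTools Network tab; time_download is a hypothetical helper name. It requests gzip to roughly mirror a browser and counts body bytes as sent, before any decompression.

```python
import http.client
import time
from urllib.parse import urlsplit


def time_download(url: str, timeout: float = 60.0, chunk_size: int = 64 * 1024,
                  accept_encoding: str = "gzip"):
    """Download `url` in full and return (status, bytes_received, seconds, bytes_per_second).

    Hypothetical helper: plain HTTP only; counts body bytes as transferred
    (compressed if the server honours gzip), not the decompressed size.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
    received = 0
    start = time.perf_counter()
    try:
        conn.request("GET", path, headers={"Accept-Encoding": accept_encoding})
        resp = conn.getresponse()
        status = resp.status
        # Read in chunks and discard them, so large bundles never sit in memory at once.
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            received += len(chunk)
    finally:
        conn.close()
    elapsed = time.perf_counter() - start
    rate = received / elapsed if elapsed > 0 else float("inf")
    return status, received, elapsed, rate
```

Dividing a resource's size by the measured rate gives a ballpark load time; for example, a 5 MB bundle at 500 KB/s needs about 10 s. If the measured time is short but the browser still fails, a slow connection alone is an unlikely explanation.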

Please try clearing the cache and let us know if that solves the issue.

onevfall commented 6 months ago

I have the same problem. How did you solve it?