petuum / adaptdl

Resource-adaptive cluster scheduler for deep learning training.
https://adaptdl.readthedocs.io/
Apache License 2.0

Can't access TensorBoard when mnist_tensorboard.py is running #137

Open xlcbingo1999 opened 1 year ago

xlcbingo1999 commented 1 year ago

I followed the guide at https://adaptdl.readthedocs.io/en/latest/commandline/tensorboard.html, but when I ran adaptdl tensorboard proxy my-tensorboard -p 8080, the browser returned a 408 Request Time-out. The training job itself finishes successfully.

Appendix: output of kubectl logs <tensorboard-pod-name>:

2023-01-09 12:13:26.332051: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

NOTE: Using experimental fast data loading logic. To disable, pass
    "--load_fast=false" and report issues on GitHub. More details:
    https://github.com/tensorflow/tensorboard/issues/4784

TensorBoard 2.11.0 at http://0.0.0.0:6006/ (Press CTRL+C to quit)

Output of adaptdl tensorboard proxy my-tensorboard -p 8080:

Proxying to TensorBoard instance my-tensorboard at http://127.0.0.1:8080

Output of adaptdl logs <tensorboard-adaptdljob-name>: nothing is printed and the command hangs.
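One way to narrow down where the 408 comes from is to probe the proxy address printed above (http://127.0.0.1:8080) outside the browser. The sketch below is a minimal diagnostic that uses only the Python standard library. It is not part of adaptdl, and the function name probe is hypothetical. An "HTTP 408" result means some HTTP server (most likely the proxy) answered with that status. "timeout" means a connection was made but nothing answered in time. "unreachable" means the connection itself failed.

```python
import socket
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> str:
    """Hypothetical helper: describe how the server at `url` responds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Any 2xx response lands here.
            return f"HTTP {resp.status}"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 408.
        return f"HTTP {err.code}"
    except urllib.error.URLError as err:
        # No HTTP response: connection refused, DNS failure, or a timeout
        # while connecting or sending the request.
        if isinstance(err.reason, socket.timeout):
            return "timeout"
        return f"unreachable: {err.reason}"
    except socket.timeout:
        # Connected and sent the request, but no response arrived in time.
        return "timeout"
    except OSError as err:
        # Other socket-level failures, e.g. a connection reset mid-response.
        return f"error: {err}"
```

For example, probe("http://127.0.0.1:8080/") while the proxy is running shows what the proxy returns. The same function could also be pointed at a kubectl port-forward to port 6006 of the TensorBoard pod (the port shown in its log), assuming the pod is reachable that way. If TensorBoard responds directly but the proxy does not, the problem is more likely in the proxy path than in TensorBoard itself.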