miclegr opened 2 years ago
I'm experiencing a similar issue.
In my case, TensorBoard runs in a k8s pod to profile TF Serving.
TensorBoard is started with the following command:
tensorboard --host 0.0.0.0 --load_fast=false --logdir=[my_gcs_bucket]
After I click "Capture" in the TensorBoard UI and send requests to the TF Serving server, the Profile page doesn't show the profile results; it's as if the capture never ran. I verified that the GCS bucket contains the xplane.pb trace files.
However, if I run TensorBoard locally on my laptop and point it at the same GCS bucket, the local instance does show the profile:
tensorboard --logdir=[my_gcs_bucket] --load_fast=false
The TensorBoard version is 2.8.0, but the same issue occurs with version 2.4.1.
The issue occurs both with --load_fast=false and without that flag (default set to true).
I installed the latest version of tensorboard_plugin_profile: tensorboard_plugin_profile-2.5.0-py3-none-any.whl
Any fix or debugging tips would be greatly appreciated. Thank you.
Any input on this? We are setting up TensorBoard in a large-scale k8s deployment (>1000 pods), so being able to store event logs in GCS is crucial for us.
I can reproduce the issue locally in Docker with the latest serving and tensorflow images and TensorBoard 2.7.0, and I'm happy to share my Dockerfiles if it helps. A local Docker container runs TensorBoard with a log directory in GCS. I tried running TensorBoard both with and without the --load_fast option, but nothing appears on the Profile page (or any other page) after a profile capture.
Below is the list of files in GCS produced by a profiling run. Note the profile-empty file in the list:
gs://[...]/tensorboard/events.out.tfevents.1643293809.9c7022a74960.profile-empty
gs://[...]/tensorboard/plugins/profile/2022_01_27_14_30_08/tfserving_8500.xplane.pb
The file sizes are:
events.out.tfevents.1643293809.9c7022a74960.profile-empty: 40 B
tfserving_8500.xplane.pb: 8.8 MB
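To double-check a listing like the one above, here is a minimal sketch that groups the trace files by profiling run. It assumes the layout visible in the listing, `<logdir>/plugins/profile/<run>/<host>.xplane.pb`; the function name `profile_runs` and the bucket path `gs://bucket/tensorboard` are hypothetical, and the input is simply the object URLs printed by something like `gsutil ls -r` (one path per line).

```python
from collections import defaultdict
from typing import Dict, Iterable, List

XPLANE_SUFFIX = ".xplane.pb"


def profile_runs(paths: Iterable[str], logdir: str) -> Dict[str, List[str]]:
    """Group *.xplane.pb files by run directory.

    Assumes the layout <logdir>/plugins/profile/<run>/<host>.xplane.pb, as seen
    in the GCS listing in this thread. Returns {run_name: [host, ...]}.
    """
    prefix = logdir.rstrip("/") + "/plugins/profile/"
    runs: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        path = path.strip()
        # Skip event files, directory headers, and anything outside the profile dir.
        if not path.startswith(prefix) or not path.endswith(XPLANE_SUFFIX):
            continue
        parts = path[len(prefix):].split("/")
        if len(parts) != 2:
            continue  # expect exactly <run>/<host>.xplane.pb
        run, filename = parts
        runs[run].append(filename[: -len(XPLANE_SUFFIX)])
    return dict(runs)


if __name__ == "__main__":
    # Hypothetical bucket path mirroring the listing above.
    listing = [
        "gs://bucket/tensorboard/events.out.tfevents.1643293809.9c7022a74960.profile-empty",
        "gs://bucket/tensorboard/plugins/profile/2022_01_27_14_30_08/tfserving_8500.xplane.pb",
    ]
    print(profile_runs(listing, "gs://bucket/tensorboard"))
```

If this returns the expected run and host but the Profile page stays empty, the files themselves are probably in place and the problem is more likely on the TensorBoard/plugin side, which matches what the rest of this thread reports.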
Here is the output of tensorboard inspect. Strangely, the tags are listed but no statistics are shown for any of them:
Found event files in:
gs://etsy-recsys-ml-dev-data-nxsn/user/dkondo/tensorboard
These tags are in gs://etsy-recsys-ml-dev-data-nxsn/user/dkondo/tensorboard:
audio -
histograms -
images -
scalars -
tensor -
======================================================================
Event statistics for gs://etsy-recsys-ml-dev-data-nxsn/user/dkondo/tensorboard:
audio -
graph -
histograms -
images -
scalars -
sessionlog:checkpoint -
sessionlog:start -
sessionlog:stop -
tensor -
The Docker container has access to the GCS bucket: I verified this by exec'ing into the container and using gsutil to list and read files in the bucket. tensorboard inspect also works against the bucket.
If I start a separate TensorBoard instance from the command line on my laptop, pointing it at the same GCS bucket like so:
tensorboard --logdir gs://[...]/tensorboard --port 6007 --load_fast false
the profile results appear (after a reload, triggered from the upper-right corner of the UI).
I found that the issue occurs specifically with tensorboard-plugin-profile==2.5.0 (with TensorBoard 2.4.1 and 2.7.0, and possibly other versions), but does not occur with tensorboard-plugin-profile==2.4.0.
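To see whether an environment matches the combination reported above, here is a minimal sketch using the standard-library `importlib.metadata` (Python 3.8+). The helper names are hypothetical, and the "affected" set only reflects what this thread reports (2.5.0 affected, 2.4.0 not); it is not an official compatibility list.

```python
from importlib import metadata
from typing import Optional

# Based only on reports in this thread: 2.5.0 showed the problem, 2.4.0 did not.
REPORTED_AFFECTED_PLUGIN_VERSIONS = {"2.5.0"}


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def matches_reported_issue(plugin_version: Optional[str]) -> bool:
    """True if the profile plugin version is one reported as affected here."""
    return plugin_version in REPORTED_AFFECTED_PLUGIN_VERSIONS


if __name__ == "__main__":
    tb = installed_version("tensorboard")
    plugin = installed_version("tensorboard-plugin-profile")
    print(f"tensorboard={tb} tensorboard-plugin-profile={plugin}")
    if matches_reported_issue(plugin):
        print("This plugin version matches the one reported as affected; "
              "tensorboard-plugin-profile==2.4.0 was reported to work.")
```

If the check matches, pinning the plugin with `pip install tensorboard-plugin-profile==2.4.0` is the workaround this report implies, though it has only been reported for the TensorBoard versions listed above.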
I'm running the Keras profiling notebook on Colab and everything works fine. Then I add a cell to log in to gcloud and change the logging path to a GCS path. Most of the time it still works, but a few times I got the "No profile data was found." page when browsing TensorBoard, even after refreshing.
Then I launch a TensorBoard session on my local machine with the logdir set to the GCS path, and I always get the "No profile data was found." page, even after refreshing.
Finally, I download the log data from the GCS bucket into a directory on my local machine and start TensorBoard with the logdir set to that local path; it always shows the profile data.
Similar to #330, but not quite the same.
tensorboard 2.7.0, tensorboard_plugin_profile 2.5.0