Open Inquisitive-ME opened 1 year ago
Same observation on my setup: a clean install of Ubuntu 22.04.2, tensorboard 2.12.2, and tensorflow and CUDA installed according to the official TensorFlow instructions: https://www.tensorflow.org/install/pip?hl=en#linux
Setting PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python does not help either:
W0428 20:31:10.595216 139654082852416 security_validator.py:60] In 3.0, this warning will become an error:
Illegal Content-Security-Policy for script-src: 'unsafe-inline'
Illegal Content-Security-Policy for script-src-elem: 'unsafe-inline'
No profile data was found.
I have the same issue!
OK, I think I found the root cause of the problem. It is caused by a bug in the Bazel configuration files. All profiler protobuf stubs are generated with the ancient protobuf package (3.8.0), which makes them incompatible with the protobuf stubs from tensorboard/tensorflow, since those are generated with the newer protobuf package (>= 3.19.6). Tensorboard has an explicit dependency that loads protobuf 3.19.6 for stub generation. Such a dependency is missing from the Bazel configuration for the profiler; instead it depends on tensorflow 2.1.0, where protobuf 3.8.0 is loaded in tensorflow/workspace.bzl:
PROTOBUF_URLS = [
"https://storage.googleapis.com/mirror.tensorflow.org/github.com/protocolbuffers/protobuf/archive/310ba5ee72661c081129eb878c1bbcec936b20f0.tar.gz",
"https://github.com/protocolbuffers/protobuf/archive/310ba5ee72661c081129eb878c1bbcec936b20f0.tar.gz",
]
PROTOBUF_SHA256 = "b9e92f9af8819bbbc514e2902aec860415b70209f31dfc8c4fa72515a5df9d59"
PROTOBUF_STRIP_PREFIX = "protobuf-310ba5ee72661c081129eb878c1bbcec936b20f0"
This makes the tensorflow profiler incompatible with all tensorboard/tensorflow releases based on protobuf >= 3.19.0.
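For anyone who wants to sanity-check the version mismatch described above, here is a minimal sketch in Python. It only compares version strings against the 3.19.0 threshold quoted in the protobuf error message in this thread; `parse_version` and `stubs_need_regeneration` are hypothetical helper names, not part of protobuf or the profiler, and the check says nothing about which protoc actually produced the stubs in a given wheel.

```python
from importlib import metadata
from itertools import takewhile

# Threshold taken from the error text quoted in this thread:
# "must be regenerated with protoc >= 3.19.0".
MIN_PROTOC_FOR_STUBS = (3, 19, 0)


def parse_version(text):
    """Turn '3.19.6' into (3, 19, 6); missing parts become 0."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def stubs_need_regeneration(generator_version):
    """True if stubs produced by this protoc version are below the threshold."""
    return parse_version(generator_version) < MIN_PROTOC_FOR_STUBS


if __name__ == "__main__":
    # 3.8.0 is the version pinned via tensorflow 2.1.0, per the comment above.
    print("stubs from 3.8.0 need regeneration:", stubs_need_regeneration("3.8.0"))
    try:
        print("installed protobuf runtime:", metadata.version("protobuf"))
    except metadata.PackageNotFoundError:
        print("protobuf is not installed in this environment")
```

With the versions mentioned above, 3.8.0 falls below the threshold while 3.19.6 does not.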
https://github.com/tensorflow/profiler/pull/636 fixes this issue. You can verify that the change works by downloading tbp-nightly.
Thanks, it looks like the fix solves the problem with protobuf compatibility. However, I still cannot see any profile data in the browser. The log from tensorboard/profiler shows the following:
NOTE: Using experimental fast data loading logic. To disable, pass "--load_fast=false" and report issues on GitHub. More details: https://github.com/tensorflow/tensorboard/issues/4784
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.14.0a20230604 at http://localhost:6006/ (Press CTRL+C to quit)
W0608 18:27:59.310396 140681651652160 security_validator.py:60] In 3.0, this warning will become an error:
Illegal Content-Security-Policy for script-src: 'unsafe-inline'
Illegal Content-Security-Policy for script-src-elem: 'unsafe-inline'
W0608 19:31:01.143541 140681441916480 security_validator.py:60] In 3.0, this warning will become an error:
Illegal Content-Security-Policy for script-src: 'unsafe-inline'
Illegal Content-Security-Policy for script-src-elem: 'unsafe-inline'
W0608 19:33:49.505171 140681358022208 security_validator.py:60] In 3.0, this warning will become an error:
Illegal Content-Security-Policy for script-src: 'unsafe-inline'
Illegal Content-Security-Policy for script-src-elem: 'unsafe-inline'
W0608 19:35:26.398867 140681358022208 security_validator.py:60] In 3.0, this warning will become an error:
Illegal Content-Security-Policy for script-src: 'unsafe-inline'
Illegal Content-Security-Policy for script-src-elem: 'unsafe-inline'
W0608 19:35:32.885195 140681358022208 security_validator.py:60] In 3.0, this warning will become an error:
Illegal Content-Security-Policy for script-src: 'unsafe-inline'
Illegal Content-Security-Policy for script-src-elem: 'unsafe-inline'
W0608 19:48:56.000323 140681525810752 security_validator.py:60] In 3.0, this warning will become an error:
Illegal Content-Security-Policy for script-src: 'unsafe-inline'
Illegal Content-Security-Policy for script-src-elem: 'unsafe-inline'
Hi,
Could you provide information regarding the version of the packages installed on your system?
I don't see a possible error condition in the logs provided. Do you see any errors in the browser console?
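In case it helps others answer the version question quickly, here is a small sketch that prints the installed versions of the packages discussed in this thread. It uses only the standard library's `importlib.metadata`; the package list is an assumption based on the names mentioned here (release and nightly variants), and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata

# Distribution names mentioned in this thread, release and nightly variants.
PACKAGES = [
    "tensorflow",
    "tf-nightly",
    "tensorboard",
    "tb-nightly",
    "tensorboard-plugin-profile",
    "tbp-nightly",
    "protobuf",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

Run it inside the same virtualenv that launches tensorboard, since the plugin is loaded from that environment.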
Sure, I can recreate this problem with the latest:
tf-nightly - 2.14.0.dev20230609
tb-nightly - 2.14.0a20230609
tbp-nightly - 2.14.0a20230609
Log from tensorboard:
2023-06-09 21:37:04.990730: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-09 21:37:05.429546: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
NOTE: Using experimental fast data loading logic. To disable, pass "--load_fast=false" and report issues on GitHub. More details: https://github.com/tensorflow/tensorboard/issues/4784
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.14.0a20230609 at http://localhost:6006/ (Press CTRL+C to quit)
W0609 21:50:52.116963 140076514244160 security_validator.py:60] In 3.0, this warning will become an error:
Illegal Content-Security-Policy for script-src: 'unsafe-inline'
Illegal Content-Security-Policy for script-src-elem: 'unsafe-inline'
Here is the TF execution log from my app:
2023-06-09 21:41:26.653283: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1637] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 19414 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6
2023-06-09 21:41:26.853218: I tensorflow/tsl/profiler/lib/profiler_session.cc:104] Profiler session initializing.
2023-06-09 21:41:26.853243: I tensorflow/tsl/profiler/lib/profiler_session.cc:119] Profiler session started.
2023-06-09 21:41:26.853269: I tensorflow/compiler/xla/backends/profiler/gpu/cupti_tracer.cc:1671] Profiler found 1 GPUs
2023-06-09 21:41:26.859587: I tensorflow/tsl/profiler/lib/profiler_session.cc:131] Profiler session tear down.
2023-06-09 21:41:26.859639: I tensorflow/compiler/xla/backends/profiler/gpu/cupti_tracer.cc:1805] CUPTI activity buffer flushed
Epoch 1/2
2023-06-09 21:41:28.795830: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:434] Loaded cuDNN version 8902
2023-06-09 21:41:29.981827: I tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:606] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
2023-06-09 21:41:30.143119: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fb678a86a50 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-06-09 21:41:30.143143: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA GeForce RTX 3090, Compute Capability 8.6
2023-06-09 21:41:30.155607: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var MLIR_CRASH_REPRODUCER_DIRECTORY to enable.
2023-06-09 21:41:30.276619: I ./tensorflow/compiler/jit/device_compiler.h:186] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
499/924 [===============>..............] - ETA: 1:55 - loss: 397827.8438 - accuracy: 0.2960
2023-06-09 21:43:46.324071: I tensorflow/tsl/profiler/lib/profiler_session.cc:104] Profiler session initializing.
2023-06-09 21:43:46.324092: I tensorflow/tsl/profiler/lib/profiler_session.cc:119] Profiler session started.
519/924 [===============>..............] - ETA: 1:49 - loss: 382497.2812 - accuracy: 0.2991
2023-06-09 21:43:52.028953: I tensorflow/tsl/profiler/lib/profiler_session.cc:70] Profiler session collecting data.
2023-06-09 21:43:52.030085: I tensorflow/compiler/xla/backends/profiler/gpu/cupti_tracer.cc:1805] CUPTI activity buffer flushed
2023-06-09 21:43:52.047298: I tensorflow/compiler/xla/backends/profiler/gpu/cupti_collector.cc:541] GpuTracer has collected 2475 callback api events and 2450 activity events.
2023-06-09 21:43:52.061061: I tensorflow/tsl/profiler/lib/profiler_session.cc:131] Profiler session tear down.
2023-06-09 21:43:52.066437: I tensorflow/tsl/profiler/rpc/client/save_profile.cc:144] Collecting XSpace to repository: /home/jozef/logs/20230609-214008/plugins/profile/2023_06_09_21_43_52/jozef-desktop.xplane.pb
924/924 [==============================] - 256s 273ms/step - loss: 1571177.1250 - accuracy: 0.3012
Epoch 2/2
924/924 [==============================] - 253s 274ms/step - loss: 28949560.0000 - accuracy: 0.2877
This is the list of files created in the log directory during program execution:
./plugins/profile/2023_06_09_21_43_52/jozef-desktop.xplane.pb
./train/events.out.tfevents.1686339687.jozef-desktop.6744.0.v2
In the browser, the profiler tab shows the message "No profile data was found."
What is the logdir specified when starting tensorboard?
tensorboard --logdir ~/logs
Seems like an issue with the logdir path. It should be /home/jozef/logs/20230609-214008. The tensorflow execution is receiving that as the logdir for the profiling request.
You could try running tensorboard --logdir /home/jozef/logs/20230609-214008
Wow, with tensorboard --logdir /home/jozef/logs/20230609-214008 it works like a charm. Thanks for this workaround. :+1: So it looks like there is an issue with how the logdir parameter is handled. TensorBoard correctly lists all the collected profile runs in its browser interface, but selecting a specific one there does not currently work. To make it work, the path to the specific profiler run must be passed to tensorboard as the logdir, right? And tensorboard must be restarted with a new logdir every time new profile data is collected?
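Until the logdir handling is sorted out, a rough sketch like the following can list which directories to pass to --logdir. It assumes the on-disk layout shown in the file listing above (`<logdir>/plugins/profile/<timestamp>/<host>.xplane.pb`); `find_profile_logdirs` is a hypothetical helper name, not a TensorBoard API.

```python
from pathlib import Path


def find_profile_logdirs(root):
    """Return directories under root that hold plugins/profile/<run>/*.xplane.pb."""
    root = Path(root).expanduser()
    logdirs = set()
    for trace in root.rglob("*.xplane.pb"):
        run_dir = trace.parent        # .../plugins/profile/<timestamp>
        profile_dir = run_dir.parent  # .../plugins/profile
        if profile_dir.name == "profile" and profile_dir.parent.name == "plugins":
            # The directory that contains "plugins" is the one to pass as --logdir.
            logdirs.add(profile_dir.parent.parent)
    return sorted(logdirs)


if __name__ == "__main__":
    # "~/logs" is the parent directory used earlier in this thread.
    for logdir in find_profile_logdirs("~/logs"):
        print(f"tensorboard --logdir {logdir}")
```

For the run in this thread, it would be expected to print the command with /home/jozef/logs/20230609-214008, which is the path that worked.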
@cliveverghese I think this is indicative of a broader issue starting in 2.12 (possibly earlier) where the location of the profile data has changed. This is breaking profiler in tensorboard. When I manually copy the files to the right place, tensorboard profiler works as expected.
Using the latest versions available from pip, I get the following error:
E0421 08:52:53.103637 140219640803328 application.py:125] Failed to load plugin ProfilePluginLoader.load; ignoring it.
Traceback (most recent call last):
  File "/home/richard/.virtualenvs/deep_learning/lib/python3.10/site-packages/tensorboard/backend/application.py", line 123, in TensorBoardWSGIApp
    plugin = loader.load(context)
  File "/home/richard/.virtualenvs/deep_learning/lib/python3.10/site-packages/tensorboard_plugin_profile/profile_plugin_loader.py", line 75, in load
    from tensorboard_plugin_profile import profile_plugin
  File "/home/richard/.virtualenvs/deep_learning/lib/python3.10/site-packages/tensorboard_plugin_profile/profile_plugin.py", line 36, in <module>
    from tensorboard_plugin_profile.convert import raw_to_tool_data as convert
  File "/home/richard/.virtualenvs/deep_learning/lib/python3.10/site-packages/tensorboard_plugin_profile/convert/raw_to_tool_data.py", line 29, in <module>
    from tensorboard_plugin_profile.convert import input_pipeline_proto_to_gviz
  File "/home/richard/.virtualenvs/deep_learning/lib/python3.10/site-packages/tensorboard_plugin_profile/convert/input_pipeline_proto_to_gviz.py", line 28, in <module>
    from tensorboard_plugin_profile.protobuf import input_pipeline_pb2
  File "/home/richard/.virtualenvs/deep_learning/lib/python3.10/site-packages/tensorboard_plugin_profile/protobuf/input_pipeline_pb2.py", line 17, in <module>
    from tensorboard_plugin_profile.protobuf import diagnostics_pb2 as plugin_dot_tensorboardpluginprofile_dot_protobuf_dot_diagnosticspb2
  File "/home/richard/.virtualenvs/deep_learning/lib/python3.10/site-packages/tensorboard_plugin_profile/protobuf/diagnostics_pb2.py", line 36, in <module>
    _descriptor.FieldDescriptor(
  File "/home/richard/.virtualenvs/deep_learning/lib/python3.10/site-packages/google/protobuf/descriptor.py", line 561, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
It seems like the profiler plugin is incompatible with the latest tensorflow and tensorboard.