Open ben-omji opened 2 years ago
Without being able to reproduce the case myself, I can only offer a couple of guesses. The error occurs in this line when TF tries to load the checkpoints. Two things to check: whether the `checkpoint` file contains `model.ckpt-178`, as the error message suggests, and whether `model.ckpt-178` exists in /ckpt/..: the last one I see in your log is 'model.ckpt-0.data-00000-of-00001'. If there is no such model name, the name is probably not being set somewhere in your training script.
https://www.tensorflow.org/io/tutorials/azure#read_and_write_files_to_azure_storage_with_tensorflow
@japie1235813
Thank you for your comment.
As you said, I checked my `checkpoint` file, and it contains `model.ckpt-178` as the `model_checkpoint_path`.
Then I checked whether `model.ckpt-178` exists in .../ckpt/. It really is there; I had just left out the list of files after `model.ckpt-0.data-00000-of-00001` in my log.
Actually, with AWS S3, I get everything I expect in TensorBoard. But with Azure Blob Storage, I cannot get anything, although the code is exactly the same except for the storage access information.
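The two checks discussed above (that the `checkpoint` file names a checkpoint prefix, and that files with that prefix actually exist in the directory) can be sketched roughly as follows. This is only an illustration: the helper names `latest_checkpoint_prefix` and `files_for_prefix` are made up here, the file listing in the example is hypothetical apart from `model.ckpt-0.data-00000-of-00001`, and the sketch assumes the `checkpoint` file uses the usual `model_checkpoint_path: "..."` text layout.

```python
import re


def latest_checkpoint_prefix(checkpoint_text):
    """Return the model_checkpoint_path value from a `checkpoint` state file, or None.

    Assumes the plain-text layout `model_checkpoint_path: "<prefix>"`.
    """
    match = re.search(
        r'^model_checkpoint_path:\s*"([^"]*)"', checkpoint_text, re.MULTILINE
    )
    return match.group(1) if match else None


def files_for_prefix(prefix, filenames):
    """Return the files that belong to one checkpoint (prefix.index, prefix.data-*, ...)."""
    return sorted(name for name in filenames if name.startswith(prefix + "."))


if __name__ == "__main__":
    # Hypothetical contents of the `checkpoint` file.
    state = (
        'model_checkpoint_path: "model.ckpt-178"\n'
        'all_model_checkpoint_paths: "model.ckpt-0"\n'
        'all_model_checkpoint_paths: "model.ckpt-178"\n'
    )
    # Hypothetical directory listing; in practice this would come from the storage.
    listing = [
        "checkpoint",
        "model.ckpt-0.data-00000-of-00001",
        "model.ckpt-178.index",
        "model.ckpt-178.data-00000-of-00001",
    ]
    prefix = latest_checkpoint_prefix(state)
    print(prefix, files_for_prefix(prefix, listing))
```

An empty list from `files_for_prefix` would point to the situation described in the first comment, where the `checkpoint` file refers to a checkpoint that is not in the directory.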
Thanks for providing the information. Could you run TensorBoard with `--verbosity=1` and pass over the log? It might reveal more about which part of the process is not working.
Hi, this is the log after I added "az://" to https://github.com/tensorflow/tensorboard/blob/master/tensorboard/util/io_util.py#L20 and built it with `bazel build tensorboard:tensorboard`:
root@a601721015eb:~/tensorboard# ./bazel-bin/tensorboard/tensorboard --logdir az://rndstoragesample/containersample2/efficientdet-finetune/ckpt --bind_all --verbosity=1
I0404 01:20:52.332980 139840485025600 program.py:489] Note: --load_fast behavior only supports local and GCS (gs://) paths; falling back to slower Python-only load path.
I0404 01:20:52.333134 139840485025600 plugin_event_multiplexer.py:106] Event Multiplexer initializing.
I0404 01:20:52.333190 139840485025600 plugin_event_multiplexer.py:126] Event Multiplexer done initializing
I0404 01:20:52.362562 139840485025600 data_ingester.py:128] Launching reload in a daemon thread
I0404 01:20:52.363056 139838087120640 data_ingester.py:102] TensorBoard reload process beginning
I0404 01:20:52.363506 139838087120640 plugin_event_multiplexer.py:203] Starting AddRunsFromDirectory: az://rndstoragesample/containersample2/efficientdet-finetune/ckpt
TensorBoard 2.9.0a0 at http://a601721015eb:6006/ (Press CTRL+C to quit)
I0404 01:20:57.545112 139838087120640 plugin_event_multiplexer.py:209] Done with AddRunsFromDirectory: az://rndstoragesample/containersample2/efficientdet-finetune/ckpt
I0404 01:20:57.545384 139838087120640 data_ingester.py:105] TensorBoard reload process: Reload the whole Multiplexer
I0404 01:20:57.545490 139838087120640 plugin_event_multiplexer.py:214] Beginning EventMultiplexer.Reload()
I0404 01:20:57.545643 139838087120640 plugin_event_multiplexer.py:257] Reloading runs serially (one after another) on the main thread.
I0404 01:20:57.545749 139838087120640 plugin_event_multiplexer.py:267] Finished with EventMultiplexer.Reload()
I0404 01:20:57.545852 139838087120640 data_ingester.py:110] TensorBoard done reloading. Load took 5.183 secs
I0404 01:21:02.551295 139838087120640 data_ingester.py:102] TensorBoard reload process beginning
I0404 01:21:02.551645 139838087120640 plugin_event_multiplexer.py:203] Starting AddRunsFromDirectory: az://rndstoragesample/containersample2/efficientdet-finetune/ckpt
I0404 01:21:07.626314 139838087120640 plugin_event_multiplexer.py:209] Done with AddRunsFromDirectory: az://rndstoragesample/containersample2/efficientdet-finetune/ckpt
I0404 01:21:07.626531 139838087120640 data_ingester.py:105] TensorBoard reload process: Reload the whole Multiplexer
I0404 01:21:07.626597 139838087120640 plugin_event_multiplexer.py:214] Beginning EventMultiplexer.Reload()
I0404 01:21:07.626689 139838087120640 plugin_event_multiplexer.py:257] Reloading runs serially (one after another) on the main thread.
I0404 01:21:07.626765 139838087120640 plugin_event_multiplexer.py:267] Finished with EventMultiplexer.Reload()
I0404 01:21:07.626820 139838087120640 data_ingester.py:110] TensorBoard done reloading. Load took 5.076 secs
I0404 01:21:10.935636 139838078727936 _internal.py:225] ::ffff:192.168.15.87 - - [04/Apr/2022 01:21:10] "GET / HTTP/1.1" 200 -
I0404 01:21:10.965151 139838078727936 _internal.py:225] ::ffff:192.168.15.87 - - [04/Apr/2022 01:21:10] "GET /index.js?_file_hash=07fcc25b HTTP/1.1" 200 -
I0404 01:21:10.968884 139838070335232 _internal.py:225] ::ffff:192.168.15.87 - - [04/Apr/2022 01:21:10] "GET /font-roboto/oMMgfZMQthOryQo9n22dcuvvDin1pK8aKteLpeZ5c0A.woff2 HTTP/1.1" 200 -
I0404 01:21:11.520380 139838078727936 _internal.py:225] ::ffff:192.168.15.87 - - [04/Apr/2022 01:21:11] "GET /icon_bundle.svg HTTP/1.1" 200 -
I0404 01:21:11.554227 139838078727936 _internal.py:225] ::ffff:192.168.15.87 - - [04/Apr/2022 01:21:11] "GET /font-roboto/RxZJdnzeo3R5zSexge8UUZBw1xU1rKptJj_0jans920.woff2 HTTP/1.1" 200 -
I0404 01:21:11.561268 139838070335232 application.py:435] Plugin listing: is_active() for scalars took 0.000 seconds
I0404 01:21:11.561539 139838070335232 application.py:435] Plugin listing: is_active() for custom_scalars took 0.000 seconds
I0404 01:21:11.561725 139838070335232 application.py:435] Plugin listing: is_active() for images took 0.000 seconds
I0404 01:21:11.561880 139838070335232 application.py:435] Plugin listing: is_active() for audio took 0.000 seconds
I0404 01:21:11.562028 139838070335232 application.py:435] Plugin listing: is_active() for debugger-v2 took 0.000 seconds
I0404 01:21:11.562177 139838070335232 application.py:435] Plugin listing: is_active() for graphs took 0.000 seconds
I0404 01:21:11.562328 139838070335232 application.py:435] Plugin listing: is_active() for distributions took 0.000 seconds
I0404 01:21:11.562465 139838070335232 application.py:435] Plugin listing: is_active() for histograms took 0.000 seconds
I0404 01:21:11.562606 139838070335232 application.py:435] Plugin listing: is_active() for text took 0.000 seconds
I0404 01:21:11.562738 139838070335232 application.py:435] Plugin listing: is_active() for pr_curves took 0.000 seconds
I0404 01:21:11.562874 139838070335232 application.py:435] Plugin listing: is_active() for profile_redirect took 0.000 seconds
I0404 01:21:11.563013 139838070335232 application.py:435] Plugin listing: is_active() for hparams took 0.000 seconds
I0404 01:21:11.563209 139838070335232 application.py:435] Plugin listing: is_active() for mesh took 0.000 seconds
I0404 01:21:11.563377 139838070335232 application.py:435] Plugin listing: is_active() for timeseries took 0.000 seconds
I0404 01:21:11.564099 139838070335232 application.py:435] Plugin listing: is_active() for projector took 0.001 seconds
I0404 01:21:11.564291 139838070335232 application.py:435] Plugin listing: is_active() for whatif took 0.000 seconds
I0404 01:21:11.566493 139838070335232 _internal.py:225] ::ffff:192.168.15.87 - - [04/Apr/2022 01:21:11] "GET /data/plugins_listing HTTP/1.1" 200 -
I0404 01:21:11.569489 139838053549824 _internal.py:225] ::ffff:192.168.15.87 - - [04/Apr/2022 01:21:11] "GET /data/environment HTTP/1.1" 200 -
I0404 01:21:11.571566 139838045157120 _internal.py:225] ::ffff:192.168.15.87 - - [04/Apr/2022 01:21:11] "GET /data/runs HTTP/1.1" 200 -
I0404 01:21:11.621406 139838045157120 _internal.py:225] ::ffff:192.168.15.87 - - [04/Apr/2022 01:21:11] "GET /data/runs HTTP/1.1" 200 -
I0404 01:21:11.622396 139838070335232 _internal.py:225] ::ffff:192.168.15.87 - - [04/Apr/2022 01:21:11] "GET /data/environment HTTP/1.1" 200 -
I0404 01:21:11.735285 139838045157120 _internal.py:225] ::ffff:192.168.15.87 - - [04/Apr/2022 01:21:11] "GET /font-roboto/d-6IYplOFocCacKzxwXSOJBw1xU1rKptJj_0jans920.woff2 HTTP/1.1" 200 -
I0404 01:21:11.736663 139838070335232 _internal.py:225] ::ffff:192.168.15.87 - - [04/Apr/2022 01:21:11] "GET /font-roboto/vPcynSL0qHq_6dX7lKVByXYhjbSpvc47ee6xR_80Hnw.woff2 HTTP/1.1" 200 -
I0404 01:21:12.631995 139838087120640 data_ingester.py:102] TensorBoard reload process beginning
I0404 01:21:12.632247 139838087120640 plugin_event_multiplexer.py:203] Starting AddRunsFromDirectory: az://rndstoragesample/containersample2/efficientdet-finetune/ckpt
I0404 01:21:17.706724 139838087120640 plugin_event_multiplexer.py:209] Done with AddRunsFromDirectory: az://rndstoragesample/containersample2/efficientdet-finetune/ckpt
I0404 01:21:17.707039 139838087120640 data_ingester.py:105] TensorBoard reload process: Reload the whole Multiplexer
I0404 01:21:17.707193 139838087120640 plugin_event_multiplexer.py:214] Beginning EventMultiplexer.Reload()
I0404 01:21:17.707406 139838087120640 plugin_event_multiplexer.py:257] Reloading runs serially (one after another) on the main thread.
I0404 01:21:17.707566 139838087120640 plugin_event_multiplexer.py:267] Finished with EventMultiplexer.Reload()
I0404 01:21:17.707705 139838087120640 data_ingester.py:110] TensorBoard done reloading. Load took 5.076 secs
I0404 01:21:22.712980 139838087120640 data_ingester.py:102] TensorBoard reload process beginning
I0404 01:21:22.713310 139838087120640 plugin_event_multiplexer.py:203] Starting AddRunsFromDirectory: az://rndstoragesample/containersample2/efficientdet-finetune/ckpt
I0404 01:21:22.720707 139838087120640 plugin_event_multiplexer.py:209] Done with AddRunsFromDirectory: az://rndstoragesample/containersample2/efficientdet-finetune/ckpt
I0404 01:21:22.720935 139838087120640 data_ingester.py:105] TensorBoard reload process: Reload the whole Multiplexer
I0404 01:21:22.721066 139838087120640 plugin_event_multiplexer.py:214] Beginning EventMultiplexer.Reload()
I0404 01:21:22.721239 139838087120640 plugin_event_multiplexer.py:257] Reloading runs serially (one after another) on the main thread.
I0404 01:21:22.721373 139838087120640 plugin_event_multiplexer.py:267] Finished with EventMultiplexer.Reload()
I0404 01:21:22.721486 139838087120640 data_ingester.py:110] TensorBoard done reloading. Load took 0.009 secs
W0404 01:21:26.998456 139838061942528 projector_plugin.py:489] Failed reading "az://rndstoragesample/containersample2/efficientdet-finetune/ckpt/model.ckpt-1607"
I0404 01:21:27.726734 139838087120640 data_ingester.py:102] TensorBoard reload process beginning
I0404 01:21:27.727046 139838087120640 plugin_event_multiplexer.py:203] Starting AddRunsFromDirectory: az://rndstoragesample/containersample2/efficientdet-finetune/ckpt
I0404 01:21:32.892737 139838087120640 plugin_event_multiplexer.py:209] Done with AddRunsFromDirectory: az://rndstoragesample/containersample2/efficientdet-finetune/ckpt
I0404 01:21:32.892977 139838087120640 data_ingester.py:105] TensorBoard reload process: Reload the whole Multiplexer
I0404 01:21:32.893059 139838087120640 plugin_event_multiplexer.py:214] Beginning EventMultiplexer.Reload()
I0404 01:21:32.893168 139838087120640 plugin_event_multiplexer.py:257] Reloading runs serially (one after another) on the main thread.
I0404 01:21:32.893251 139838087120640 plugin_event_multiplexer.py:267] Finished with EventMultiplexer.Reload()
I0404 01:21:32.893320 139838087120640 data_ingester.py:110] TensorBoard done reloading. Load took 5.167 secs
Unfortunately, it doesn't work for me. I also looked through the source code related to this function for anything unexpected, but I could not find anything suspicious.
We have moved to AWS for now because of the urgency of this project. However, if you have any more ideas about this issue, I'll try them.
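When reading a verbose log like the one above, it can help to pull out only the warning and error lines and the reload durations. The sketch below assumes each line starts with the prefix seen in that log (a level letter, `MMDD`, a time, a thread id, and `file.py:line]`); the function names are made up for illustration and are not part of TensorBoard.

```python
import re

# Prefix as it appears in the log above, e.g.
# "W0404 01:21:26.998456 139838061942528 projector_plugin.py:489] message"
LOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<thread>\d+) (?P<source>[^\]]+)\] (?P<message>.*)$"
)
LOAD_TIME = re.compile(r"Load took ([0-9.]+) secs")


def warnings_and_errors(lines):
    """Return (source, message) pairs for lines logged at W, E or F level."""
    found = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") in "WEF":
            found.append((match.group("source"), match.group("message")))
    return found


def reload_durations(lines):
    """Return every 'Load took X secs' value as a float, in log order."""
    durations = []
    for line in lines:
        match = LOAD_TIME.search(line)
        if match:
            durations.append(float(match.group(1)))
    return durations


if __name__ == "__main__":
    import sys

    # Usage: python scan_log.py < tensorboard.log
    log_lines = sys.stdin.read().splitlines()
    for source, message in warnings_and_errors(log_lines):
        print(source, message)
    print(reload_durations(log_lines))
```

Applied to the log above, this would surface the single `projector_plugin.py:489` warning about `model.ckpt-1607` and the reload times.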
Environment information (required)
Diagnostics
Diagnostics output
``````
--- check: autoidentify
INFO: diagnose_tensorboard.py version e43767ef2b648d0d5d57c00f38ccbd38390e38da
--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=8, micro=10, releaselevel='final', serial=0)
INFO: os.name: posix
INFO: os.uname(): posix.uname_result(sysname='Linux', nodename='a601721015eb', release='5.4.0-74-generic', version='#83-Ubuntu SMP Sat May 8 02:35:39 UTC 2021', machine='x86_64')
INFO: sys.getwindowsversion(): N/A
--- check: package_management
INFO: has conda-meta: False
INFO: $VIRTUAL_ENV: None
--- check: installed_packages
INFO: installed: tensorboard==2.8.0
WARNING: no installation among: ['tensorflow', 'tensorflow-gpu', 'tf-nightly', 'tf-nightly-2.0-preview', 'tf-nightly-gpu', 'tf-nightly-gpu-2.0-preview']
INFO: installed: tf-estimator-nightly==2.8.0.dev2021122109
INFO: installed: tensorboard-data-server==0.6.1
--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '2.8.0'
--- check: tensorflow_python_version
INFO: tensorflow.__version__: '2.8.0'
INFO: tensorflow.__git_version__: 'v2.8.0-rc1-32-g3f878cff5b6'
--- check: tensorboard_data_server_version
INFO: data server binary: '/usr/local/lib/python3.8/dist-packages/tensorboard_data_server/bin/server'
INFO: data server binary version: b'rustboard 0.6.1'
--- check: tensorboard_binary_path
INFO: which tensorboard: b'/usr/local/bin/tensorboard\n'
--- check: addrinfos
socket.has_ipv6 = True
socket.AF_UNSPEC =
``````
Issue description
I can list the model checkpoint directory with TensorFlow gfile and tensorflow-io.
But when I set the Azure Blob Storage path as the TensorBoard logdir, I always get "No dashboards are active for the current data set."
Reproduction steps
I installed tensorflow-io with pip and set the access key in an environment variable.
Then I checked the connection to the storage with a Python script.
The connection looks good, so I tried to point TensorBoard at the blob directory.
I got the log below and ended up on the "No dashboards are active for the current data set." page.
Do you have any idea what causes this problem?
Thanks in advance.
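One way to narrow this down is to check whether the listing that works from Python actually contains event files, since "No dashboards are active" is what TensorBoard shows when it finds no event data under the logdir. The sketch below is an assumption-laden illustration, not the script used in this issue: `find_event_files` and `list_paths_with_gfile` are made-up helper names, it assumes TensorBoard recognizes event files by `tfevents` in the file name, and it assumes that importing `tensorflow_io` is what makes `az://` paths usable with `tf.io.gfile`, as in the Azure tutorial.

```python
import posixpath


def find_event_files(paths):
    """Return the paths whose file name contains 'tfevents'."""
    return [path for path in paths if "tfevents" in posixpath.basename(path)]


def list_paths_with_gfile(logdir):
    """Recursively list files under logdir using tf.io.gfile.

    Imports are done lazily so the helper above works without TensorFlow.
    Assumes tensorflow and tensorflow-io are installed and the storage
    access key is already set in the environment.
    """
    import tensorflow as tf
    import tensorflow_io  # noqa: F401  (assumed to register the az:// filesystem)

    paths = []
    for dirname, _subdirs, filenames in tf.io.gfile.walk(logdir):
        paths.extend(posixpath.join(dirname, name) for name in filenames)
    return paths


if __name__ == "__main__":
    # Hypothetical listing; replace with list_paths_with_gfile("az://...") to
    # run against real storage.
    example = [
        "az://account/container/ckpt/checkpoint",
        "az://account/container/ckpt/model.ckpt-0.data-00000-of-00001",
        "az://account/container/ckpt/events.out.tfevents.1648000000.host",
    ]
    print(find_event_files(example))
```

If the same listing on S3 contains event files and the Azure one does not, the difference is in what was written to storage; if both contain them, the problem is more likely in how TensorBoard reads the `az://` path.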