Closed MeghnaNatraj closed 4 years ago
If I run it without the above command, I get this error:
%load_ext tensorboard
%tensorboard --logdir {LOGS_DIR}
ERROR: Failed to launch TensorBoard (exited with 1).
Contents of stderr:
Traceback (most recent call last):
File "/usr/local/bin/tensorboard", line 8, in <module>
sys.exit(run_main())
File "/usr/local/lib/python3.6/dist-packages/tensorboard/main.py", line 64, in run_main
app.run(tensorboard.main, flags_parser=tensorboard.configure)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "/usr/local/lib/python3.6/dist-packages/tensorboard/program.py", line 220, in main
server = self._make_server()
File "/usr/local/lib/python3.6/dist-packages/tensorboard/program.py", line 301, in _make_server
self.assets_zip_provider)
File "/usr/local/lib/python3.6/dist-packages/tensorboard/backend/application.py", line 150, in standard_tensorboard_wsgi
flags, plugin_loaders, data_provider, assets_zip_provider, multiplexer)
File "/usr/local/lib/python3.6/dist-packages/tensorboard/backend/application.py", line 202, in TensorBoardWSGIApp
return TensorBoardWSGI(tbplugins, flags.path_prefix)
File "/usr/local/lib/python3.6/dist-packages/tensorboard/backend/application.py", line 254, in __init__
raise ValueError('Duplicate plugins for name %s' % plugin.plugin_name)
ValueError: Duplicate plugins for name whatif
This is because of the new package tensorboard-plugin-wit, released in Feb 2020. It's causing issues for many people (see https://github.com/pytorch/pytorch/issues/22676), and future updates could cause similar breakage.
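For illustration, the failing check at the bottom of the traceback can be approximated as follows. This is a simplified sketch, not TensorBoard's actual code: `register_plugins` and the bare-string plugin names are hypothetical stand-ins, and the idea that one "whatif" plugin ships with TensorBoard 1.15 while a second comes from tensorboard-plugin-wit is an assumption inferred from the error message.

```python
def register_plugins(plugin_names):
    """Build a name -> index registry, rejecting duplicate names.

    Hypothetical stand-in for the check shown in the traceback; real
    TensorBoard registers plugin objects, not bare strings.
    """
    registry = {}
    for index, name in enumerate(plugin_names):
        if name in registry:
            raise ValueError('Duplicate plugins for name %s' % name)
        registry[name] = index
    return registry


# Assumption: one "whatif" plugin is bundled with TensorBoard 1.15 and a
# second one is contributed by the separately installed plugin package.
try:
    register_plugins(["scalars", "whatif", "whatif"])
except ValueError as error:
    print(error)
```

Removing either copy of the duplicated name lets registration succeed, which is why uninstalling the extra package works around the error.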
You can run the following command to find all tensorboard packages installed in the Colab environment:
! pip list --format=freeze | grep tensorboard
tensorboard==1.15.0
tensorboard-plugin-wit==1.6.0.post2 # causes the issue
tensorboardcolab==0.0.22
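The same filtering can be done in Python if you want to inspect the result programmatically. The sketch below is only an illustration: `tensorboard_packages` is a hypothetical helper, and the `sample` string is hard-coded text in `pip list --format=freeze` style (the `tensorflow` line is made up to show a non-matching entry), not output captured from a real environment.

```python
def tensorboard_packages(freeze_output):
    """Return (name, version) pairs whose package name contains 'tensorboard'."""
    found = []
    for line in freeze_output.splitlines():
        line = line.strip()
        if "==" not in line:
            continue  # skip blank lines and entries without a pinned version
        name, version = line.split("==", 1)
        if "tensorboard" in name.lower():
            found.append((name, version))
    return found


# Illustrative input in `pip list --format=freeze` style.
sample = """\
tensorboard==1.15.0
tensorboard-plugin-wit==1.6.0.post2
tensorboardcolab==0.0.22
tensorflow==1.15.0
"""

for name, version in tensorboard_packages(sample):
    print(name, version)
```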
Hi @MeghnaNatraj! I can reproduce this error by running
%tensorflow_version 1.x
%load_ext tensorboard
%tensorboard --logdir logs
in a blank notebook with a fresh Colab runtime.
Could you please point us to an example notebook that runs into this problem? I looked at a few TF Lite Colabs (flowers_tf_lite.ipynb, text_classification.ipynb, image_classification.ipynb) but didn’t find any that used TensorBoard. It would be great to verify that the fixes that we put in actually work for your use case.
It looks like the problem is that %tensorflow_version 1.x adds an entry to the Python path for TF 1.x, which suffices for new or superseding versions of packages, but doesn’t suffice to remove packages that must not exist in 1.x, like tensorboard_plugin_wit. I’ll see if we can fix this on the Colab side, and failing that I’ll look into whether we might want to backport a patch to 1.15.
Actually, the simplest fix would be to update the notebooks in question to use TensorFlow 2.x, which just doesn’t have this problem because no path manipulation is required. It doesn’t look to me like there are any tutorials in tensorflow, docs, or tensorboard that use both %tensorflow_version 1.x and %tensorboard. Is upgrading the tutorials an option, now that TensorFlow 2.x has been out for about half a year?
The notebook I'm referring to is this one: https://colab.research.google.com/github/tensorflow/tensorflow/blob/master/tensorflow/lite/micro/examples/micro_speech/train_speech_model.ipynb
Let me know if you face an issue as well. The issue is that we need to use tensorflow==1.15 for a few more weeks/months. As a result, I uninstall the default TensorFlow 2.x and install TensorFlow 1.15.
What do you suggest I do? Use this workaround until we move to TF2.x?
Okay, it looks like that notebook has a lot of custom setup rather than using %tensorflow_version 1.x at all:
!pip uninstall -y tensorflow tensorflow_estimator tensorboard
!pip install -q tf-estimator-nightly==1.14.0.dev2019072901 tf-nightly-gpu==1.15.0.dev20190729
Given that this custom setup is already required, it seems reasonable to also uninstall tensorboard-plugin-wit as a workaround. That or upgrading to TF 2.x are probably your best bets for now. I’ve opened an internal bug with the Colab team (http://b/152986612; CCed you), but it’s not clear whether this will ever be fixed since it only affects non-clean installs of TF 1.x.
(FWIW, I can’t actually reproduce this error there; instead, I see an error “ModuleNotFoundError: No module named 'tensorboard'”, which suggests that the custom setup did not install TensorBoard correctly.)
@wchargin I haven't checked in my updates yet, but replace the cell you've pasted above (i.e., get rid of all those uninstall/install commands) with the following code:
%tensorflow_version 1.x
However, as you've posted initially as well, we still get the same error.
Currently, only this code snippet works in order to use TF1.x:
# Remove all TensorBoard packages.
! pip list --format=freeze | grep tensorboard | xargs pip uninstall -y
# Install TensorFlow again (This command will only install the default TensorBoard package associated with this TensorFlow package).
! pip install -q tensorflow
If there can be a fix, it would help other users too! I had faced this issue before as well. But if not, I can use the workaround.
Ah, I understand now; thanks. Yes, it would definitely be nice if this could be fixed. I think it’s just a question of tradeoffs. It sounds like it would be a fair amount of work to fix this properly on either the Colab side or the TensorBoard side. For TensorBoard, at least, we occasionally push patch releases with bug fixes to the current release series, but I don’t think we’ve ever backported a change to an old version.
I’ll solicit opinions from the rest of the TensorBoard team to see what people think, and then get back to you here.
See tensorboard-plugin-wit mitigation in https://github.com/PAIR-code/what-if-tool/pull/64
tensorboard_plugin_wit-1.6.0.post3 has been uploaded to PyPi and includes a workaround for this issue.
@jameswex: Thank you for the quick fix and release! Confirmed that this works in Colab when the new package is installed:
I’ll ask the Colab team to update the base image so that it works by default.
Thank you so much for the fix! Is there an ETA on when tensorboard_plugin_wit-1.6.0.post3 would be available in the default Colab environment?
I am still facing this issue. My tf version is 2.1. My pip list:
I tried to run !pip install tensorboard_plugin_wit and it produced the output:
Requirement already satisfied: tensorboard_plugin_wit in /usr/local/lib/python3.6/dist-packages (1.6.0.post2)
But TensorBoard is still throwing the error. How can I fix or work around this issue?
@MeghnaNatraj: This should roll out in the next few days. (The change has been submitted internally and just needs to be deployed.)
@arya46: You almost got it :-) !pip install -U tensorboard_plugin_wit, with -U for “upgrade”.
@wchargin Thank you for pointing out my mistake. Yes, it did solve the issue.
@wchargin thank you so much for the update! :)
@wchargin That did it for me as well, after days of suffering ;) Was this a Google Colab update issue? And if this problem affected all TF 1.0 users using TensorBoard (which would be quite a lot, I can imagine), wouldn't it be fixed/reverted already? I heard you say the patch will be released soon?
@jandevries123: It’s an issue due to how the %tensorflow_version magic works, which is as follows:
The Colab images have TensorFlow 2.x installed to the default Python path, and also have TensorFlow 1.x installed under a separate directory that’s not on the default path.
When you run %tensorflow_version 1.x, your PATH, PYTHONPATH, and sys.path are updated to prepend the 1.x directory. When you run %tensorflow_version 2.x, it’s popped off the path. Thus:
import os
print(os.environ["PATH"])
%tensorflow_version 1.x
print(os.environ["PATH"])
%tensorflow_version 2.x
print(os.environ["PATH"])
/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/tools/node/bin:/tools/google-cloud-sdk/bin:/opt/bin
TensorFlow 1.x selected.
/tensorflow-1.15.2/python3.6/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/tools/node/bin:/tools/google-cloud-sdk/bin:/opt/bin
TensorFlow 2.x selected.
/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/tools/node/bin:/tools/google-cloud-sdk/bin:/opt/bin
Consequently, in TensorFlow 1.x mode, actually both versions of TensorFlow are on the path, but the 1.x packages are earlier in the path, so when there is a conflict the 1.x packages take precedence, as desired.
This works fine as long as all you care about is changing the versions of installed packages in the two environments without changing which packages are installed. The problem occurs when there is a package installed in the 2.x environment that must not be available in the 1.x environment: prepending the 1.x directory to the path won’t actually remove such a package.
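The shadowing behavior described above can be reproduced with plain Python, independent of Colab. The following is a minimal sketch under stated assumptions: the directories `env_1x` and `env_2x` and the modules `demo_shared` and `demo_extra` are hypothetical names created in a temporary directory, standing in for the two TensorFlow environments and their packages. It does not use the `%tensorflow_version` magic itself.

```python
import importlib
import os
import sys
import tempfile


def write_module(directory, name, body):
    """Create directory/name.py containing body."""
    with open(os.path.join(directory, name + ".py"), "w") as handle:
        handle.write(body)


with tempfile.TemporaryDirectory() as root:
    env_2x = os.path.join(root, "env_2x")  # stands in for the default path
    env_1x = os.path.join(root, "env_1x")  # stands in for the 1.x directory
    os.makedirs(env_2x)
    os.makedirs(env_1x)

    # "demo_shared" exists in both environments; "demo_extra" only in 2.x.
    write_module(env_2x, "demo_shared", "VERSION = '2.x'\n")
    write_module(env_2x, "demo_extra", "VERSION = '2.x'\n")
    write_module(env_1x, "demo_shared", "VERSION = '1.x'\n")

    # Mimic the magic: the 1.x directory goes in front of the default one.
    sys.path[:0] = [env_1x, env_2x]
    importlib.invalidate_caches()
    try:
        import demo_shared
        import demo_extra
        shared_version = demo_shared.VERSION
        extra_version = demo_extra.VERSION
    finally:
        # Restore sys.path so the demo leaves no lasting side effects.
        sys.path.remove(env_1x)
        sys.path.remove(env_2x)

print("demo_shared:", shared_version)  # resolved from the 1.x directory
print("demo_extra:", extra_version)    # still importable from the 2.x directory
```

The package that exists only in the later directory stays importable, which matches the situation described here: prepending a directory changes which copy wins, but cannot hide a package that has no 1.x counterpart.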
The tensorboard-plugin-wit==1.6.0.post2 package falls into this category. The tensorboard-plugin-wit==1.6.0.post3 package does not fall into this category: it’s compatible with both 1.x and 2.x. So what we did was just update the Colab base image to use 1.6.0.post3 instead of 1.6.0.post2. The package will still be available in both environments, but it won’t cause any problems.
Should be deployed in prod:
pip install -q tf-estimator-nightly==1.14.0.dev2019072901 tf-nightly-gpu==1.15.0.dev20190729
Hi wchargin, I tried to install and reinstall tensorflow==1.15, but it gives me an error like:
ERROR: Could not find a version that satisfies the requirement tf-nightly-gpu==1.15.0.dev20190729 (from versions: 2.4.0.dev20200903, 2.4.0.dev20200904, 2.4.0.dev20200905, 2.4.0.dev20200906, 2.4.0.dev20200907, 2.4.0.dev20200908, 2.4.0.dev20200911, 2.4.0.dev20200912, 2.4.0.dev20200913, 2.4.0.dev20200914, 2.4.0.dev20200915, 2.4.0.dev20200916, 2.4.0.dev20200917, 2.4.0.dev20200918, 2.4.0.dev20200919, 2.4.0.dev20200920, 2.4.0.dev20200921, 2.4.0.dev20200922, 2.4.0.dev20200923, 2.4.0.dev20200924, 2.4.0.dev20200925, 2.4.0.dev20200926, 2.4.0.dev20200927, 2.4.0.dev20200928, 2.4.0.dev20200929, 2.4.0.dev20200930, 2.4.0.dev20201001, 2.4.0.dev20201002, 2.4.0.dev20201003, 2.4.0.dev20201004, 2.4.0.dev20201005, 2.4.0.dev20201007, 2.4.0.dev20201008, 2.4.0.dev20201010, 2.4.0.dev20201011, 2.4.0.dev20201012, 2.4.0.dev20201014, 2.4.0.dev20201015, 2.4.0.dev20201016, 2.4.0.dev20201017, 2.4.0.dev20201018, 2.4.0.dev20201019, 2.4.0.dev20201020, 2.4.0.dev20201021, 2.4.0.dev20201022, 2.4.0.dev20201023, 2.5.0.dev20201024, 2.5.0.dev20201025, 2.5.0.dev20201026)
ERROR: No matching distribution found for tf-nightly-gpu==1.15.0.dev20190729
I am using Python 3.8
You’re trying to use a very old tf-nightly, from more than a year ago, with a version of Python that wasn’t even released at that time. That’s why there’s no matching version.
You should no longer need a workaround. I just checked and this issue is still fixed in prod:
Every few months, the Colab tutorials released by my team seem to break due to updates made to the Colab environment. The cause is multiple TensorBoard packages being installed.
As a result of this, I run the following code snippet before running TensorBoard each time:
It seems that many users also face this issue often: https://github.com/pytorch/pytorch/issues/22676
I'm not sure if this is a Colab or a TensorBoard issue, but I'm posting it here.