spyder-ide / spyder

Official repository for Spyder - The Scientific Python Development Environment
https://www.spyder-ide.org
MIT License

Restart kernel (IPython Console in Spyder 5.1.5) with os._exit(0) does not work when using tensorflow.distribute.MirroredStrategy #16561

Open RomanFoell opened 2 years ago

RomanFoell commented 2 years ago

Problem Description

Hello,

the problem is described in detail in https://github.com/tensorflow/tensorflow/issues/52135.

I tried it on two systems, Windows Server and Windows 10, both with results similar to those described in the link.

I also tried https://docs.spyder-ide.org/current/troubleshooting/common-illnesses.html and https://stackoverflow.com/questions/47267716/spyder-an-error-ocurred-while-starting-the-kernel.

I updated all relevant packages.

To be honest, I did not try reinstalling Anaconda.

None of this solved the problem.

I also tried it in jupyter qtconsole, which produced the following output:

2021-10-08 19:44:12.649086: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'nvcuda.dll'; dlerror: nvcuda.dll not found
2021-10-08 19:44:12.649289: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2021-10-08 19:44:12.656806: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: RT-Z0M6A
2021-10-08 19:44:12.660749: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: RT-Z0M6A
2021-10-08 19:44:12.661284: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

[JupyterQtConsoleApp] KernelRestarter: restarting kernel (1/5), keep random ports

[JupyterQtConsoleApp] WARNING | kernel restarted

[JupyterQtConsoleApp] WARNING | kernel died: 3.0015175342559814

After this the kernel restarted without problems.

The following information relates to the Windows 10 system:

Versions

Dependencies


# Mandatory:
atomicwrites >=1.2.0          :  1.4.0 (OK)
chardet >=2.0.0               :  3.0.4 (OK)
cloudpickle >=0.5.0           :  1.6.0 (OK)
cookiecutter >=1.6.0          :  1.7.2 (OK)
diff_match_patch >=20181111   :  20200713 (OK)
intervaltree >=3.0.2          :  3.1.0 (OK)
IPython >=7.6.0               :  7.27.0 (OK)
jedi >=0.17.2;<0.19.0         :  0.18.0 (OK)
jsonschema >=3.2.0            :  3.2.0 (OK)
keyring >=17.0.0              :  22.3.0 (OK)
nbconvert >=4.0               :  5.5.0 (OK)
numpydoc >=0.6.0              :  1.1.0 (OK)
paramiko >=2.4.0              :  2.7.2 (OK)
parso >=0.7.0;<0.9.0          :  0.8.0 (OK)
pexpect >=4.4.0               :  4.8.0 (OK)
pickleshare >=0.4             :  0.7.5 (OK)
psutil >=5.3                  :  5.8.0 (OK)
pygments >=2.0                :  2.7.1 (OK)
pylint >=2.5.0;<2.10.0        :  2.9.6 (OK)
pyls_spyder >=0.4.0           :  0.4.0 (OK)
pylsp >=1.2.2;<1.3.0          :  1.2.3 (OK)
pylsp_black >=1.0.0           :  None (OK)
qdarkstyle =3.0.2             :  3.0.2 (OK)
qstylizer >=0.1.10            :  0.1.10 (OK)
qtawesome >=1.0.2             :  1.0.2 (OK)
qtconsole >=5.1.0             :  5.1.1 (OK)
qtpy >=1.5.0                  :  1.9.0 (OK)
rtree >=0.9.7                 :  0.9.7 (OK)
setuptools >=49.6.0           :  58.2.0 (OK)
sphinx >=0.6.6                :  3.2.1 (OK)
spyder_kernels >=2.1.1;<2.2.0 :  2.1.1 (OK)
textdistance >=4.2.0          :  4.2.1 (OK)
three_merge >=0.1.1           :  0.1.1 (OK)
watchdog >=0.10.3             :  2.1.3 (OK)
zmq >=17                      :  22.2.1 (OK)

# Optional:
cython >=0.21                 :  None (OK)
matplotlib >=2.0.0            :  3.4.3 (OK)
numpy >=1.7                   :  1.20.3 (OK)
pandas >=1.1.1                :  None (OK)
scipy >=0.17.0                :  1.7.1 (OK)
sympy >=0.7.3                 :  None (OK)

ccordoba12 commented 2 years ago

Hey @RomanFoell, thanks for reporting. Please describe the exact steps you used to create your conda environment with Tensorflow installed.

RomanFoell commented 2 years ago

Hi,

I could not reproduce the exact steps that generated the original environment (I can't remember them), but I could reproduce the error with the following steps:

I create a new env with: conda create --name test

Then I activate the env with: conda activate test

Then I install the newest version of Python with: conda install python==3.9.7

Then I install tensorflow-gpu with: conda install tensorflow-gpu

Then I install Spyder with: conda install spyder

The package list after these steps is:

# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: win-64
_tflow_select=2.1.0=gpu
abseil-cpp=20210324.2=h0e60522_0
absl-py=0.14.1=pyhd8ed1ab_0
aiohttp=3.7.4.post0=py39hb82d6ee_0
alabaster=0.7.12=py_0
argh=0.26.2=pyh9f0ad1d_1002
arrow=1.2.0=pyhd8ed1ab_0
astor=0.8.1=pyh9f0ad1d_0
astroid=2.5.6=py39haa95532_1
astunparse=1.6.3=pyhd8ed1ab_0
async-timeout=3.0.1=py_1000
async_generator=1.10=py_0
atomicwrites=1.4.0=pyh9f0ad1d_0
attrs=21.2.0=pyhd8ed1ab_0
autopep8=1.5.7=pyhd8ed1ab_0
babel=2.9.1=pyh44b312d_0
backcall=0.2.0=pyh9f0ad1d_0
backports=1.0=py_2
backports.functools_lru_cache=1.6.4=pyhd8ed1ab_0
bcrypt=3.2.0=py39hb82d6ee_1
binaryornot=0.4.4=py_1
black=21.9b0=pyhd8ed1ab_0
bleach=4.1.0=pyhd8ed1ab_0
blinker=1.4=py_1
bosch-ca=1.0=1
brotlipy=0.7.0=py39hb82d6ee_1001
ca-certificates=2021.10.8=h5b45459_0
cached-property=1.5.2=hd8ed1ab_1
cached_property=1.5.2=pyha770c72_1
cachetools=4.2.4=pyhd8ed1ab_0
certifi=2021.10.8=py39hcbf5309_0
cffi=1.14.6=py39h0878f49_1
chardet=4.0.0=py39hcbf5309_1
click=8.0.2=py39hcbf5309_0
cloudpickle=2.0.0=pyhd8ed1ab_0
colorama=0.4.4=pyh9f0ad1d_0
cookiecutter=1.7.3=pyh6c4a22f_0
cryptography=3.4.7=py39hd8d06c1_0
cudatoolkit=11.3.1=h280eb24_9
cudnn=8.2.1.32=h754d62a_0
dataclasses=0.8=pyhc8e2a94_3
debugpy=1.4.1=py39h415ef7b_0
decorator=5.1.0=pyhd8ed1ab_0
defusedxml=0.7.1=pyhd8ed1ab_0
diff-match-patch=20200713=pyh9f0ad1d_0
docutils=0.17.1=py39hcbf5309_0
entrypoints=0.3=pyhd8ed1ab_1003
flake8=3.9.2=pyhd8ed1ab_0
flatbuffers=2.0.0=h0e60522_0
gast=0.4.0=pyh9f0ad1d_0
giflib=5.2.1=h8d14728_2
google-auth=1.35.0=pyh6c4a22f_0
google-auth-oauthlib=0.4.6=pyhd8ed1ab_0
google-pasta=0.2.0=pyh8c360ce_0
grpcio=1.38.1=py39hb76b349_0
h5py=3.4.0=nompi_py39hd4deaf1_101
hdf5=1.12.1=nompi_h2a0e4a3_100
icu=68.1=h0e60522_0
idna=2.10=pyh9f0ad1d_0
imagesize=1.2.0=py_0
importlib-metadata=4.8.1=py39hcbf5309_0
importlib_metadata=4.8.1=hd8ed1ab_0
inflection=0.5.1=pyh9f0ad1d_0
intel-openmp=2021.3.0=h57928b3_3372
intervaltree=3.0.2=py_0
ipykernel=6.4.1=py39h832f523_0
ipython=7.28.0=py39h832f523_0
ipython_genutils=0.2.0=py_1
isort=5.9.3=pyhd8ed1ab_0
jedi=0.18.0=py39hcbf5309_2
jinja2=2.11.3=pyh44b312d_0
jinja2-time=0.2.0=py_2
jpeg=9d=h8ffe710_0
jsonschema=4.0.1=pyhd8ed1ab_0
jupyter_client=6.1.12=pyhd8ed1ab_0
jupyter_core=4.8.1=py39hcbf5309_0
jupyterlab_pygments=0.1.2=pyh9f0ad1d_0
keras-preprocessing=1.1.2=pyhd8ed1ab_0
keyring=23.2.1=py39hcbf5309_0
krb5=1.19.2=hbae68bd_2
lazy-object-proxy=1.6.0=py39hb82d6ee_0
libblas=3.9.0=11_win64_mkl
libcblas=3.9.0=11_win64_mkl
libclang=11.1.0=default_h5c34c98_1
libcurl=7.79.1=h789b8ee_1
liblapack=3.9.0=11_win64_mkl
libpng=1.6.37=h1d00b33_2
libprotobuf=3.14.0=h7755175_0
libsodium=1.0.18=h8d14728_1
libspatialindex=1.9.3=h39d44d4_4
libssh2=1.10.0=h680486a_2
m2w64-gcc-libgfortran=5.3.0=6
m2w64-gcc-libs=5.3.0=7
m2w64-gcc-libs-core=5.3.0=7
m2w64-gmp=6.1.0=2
m2w64-libwinpthread-git=5.0.0.4634.697f757=2
markdown=3.3.4=pyhd8ed1ab_0
markupsafe=1.1.1=py39h2bbff1b_0
matplotlib-inline=0.1.3=pyhd8ed1ab_0
mccabe=0.6.1=py_1
mistune=0.8.4=py39hb82d6ee_1004
mkl=2021.3.0=hb70f87d_564
msys2-conda-epoch=20160418=1
multidict=5.2.0=py39hb82d6ee_0
mypy_extensions=0.4.3=py39hcbf5309_3
nbclient=0.5.4=pyhd8ed1ab_0
nbconvert=6.2.0=py39hcbf5309_0
nbformat=5.1.3=pyhd8ed1ab_0
nest-asyncio=1.5.1=pyhd8ed1ab_0
numpy=1.21.2=py39h6635163_0
numpydoc=1.1.0=py_1
oauthlib=3.1.1=pyhd8ed1ab_0
openssl=1.1.1l=h8ffe710_0
opt_einsum=3.3.0=pyhd8ed1ab_1
packaging=21.0=pyhd8ed1ab_0
pandoc=2.14.2=h8ffe710_0
pandocfilters=1.5.0=pyhd8ed1ab_0
paramiko=2.7.2=pyh9f0ad1d_0
parso=0.8.2=pyhd8ed1ab_0
pathspec=0.9.0=pyhd8ed1ab_0
pexpect=4.8.0=pyh9f0ad1d_2
pickleshare=0.7.5=py_1003
pip=21.2.4=pyhd8ed1ab_0
platformdirs=2.3.0=pyhd8ed1ab_0
pluggy=1.0.0=py39hcbf5309_1
poyo=0.5.0=py_0
prompt-toolkit=3.0.20=pyha770c72_0
protobuf=3.14.0=py39h415ef7b_1
psutil=5.8.0=py39hb82d6ee_1
ptyprocess=0.7.0=pyhd3deb0d_0
pyasn1=0.4.8=py_0
pyasn1-modules=0.2.7=py_0
pycodestyle=2.7.0=pyhd8ed1ab_0
pycparser=2.20=pyh9f0ad1d_2
pydocstyle=6.1.1=pyhd8ed1ab_0
pyflakes=2.3.1=pyhd8ed1ab_0
pygments=2.10.0=pyhd8ed1ab_0
pyjwt=2.2.0=pyhd8ed1ab_0
pylint=2.7.2=py39hcbf5309_0
pyls-spyder=0.4.0=pyhd8ed1ab_0
pynacl=1.4.0=py39hb3671d1_2
pyopenssl=21.0.0=pyhd8ed1ab_0
pyparsing=2.4.7=pyh9f0ad1d_0
pyqt=5.12.3=py39hcbf5309_7
pyqt-impl=5.12.3=py39h415ef7b_7
pyqt5-sip=4.19.18=py39h415ef7b_7
pyqtchart=5.12=py39h415ef7b_7
pyqtwebengine=5.12.1=py39h415ef7b_7
pyrsistent=0.17.3=py39hb82d6ee_2
pysocks=1.7.1=py39hcbf5309_3
python=3.9.7=h7840368_3_cpython
python-dateutil=2.8.2=pyhd8ed1ab_0
python-flatbuffers=1.12=pyhd8ed1ab_1
python-lsp-black=1.0.0=pyhd8ed1ab_0
python-lsp-jsonrpc=1.0.0=pyhd8ed1ab_0
python-lsp-server=1.2.3=pyhd8ed1ab_0
python-slugify=5.0.2=pyhd8ed1ab_0
python_abi=3.9=2_cp39
pytz=2021.3=pyhd8ed1ab_0
pyu2f=0.1.5=pyhd8ed1ab_0
pywin32=301=py39hb82d6ee_0
pywin32-ctypes=0.2.0=py39hcbf5309_1003
pyyaml=5.4.1=py39hb82d6ee_1
pyzmq=22.3.0=py39he46f08e_0
qdarkstyle=3.0.2=pyhd8ed1ab_0
qstylizer=0.2.1=pyhd8ed1ab_0
qt=5.12.9=h5909a2a_4
qtawesome=1.0.3=pyhd8ed1ab_0
qtconsole=5.1.1=pyhd8ed1ab_0
qtpy=1.11.2=pyhd8ed1ab_0
regex=2021.10.8=py39hb82d6ee_0
requests=2.25.1=pyhd3deb0d_0
requests-oauthlib=1.3.0=pyh9f0ad1d_0
rope=0.20.1=pyhd8ed1ab_0
rsa=4.7.2=pyh44b312d_0
rtree=0.9.7=py39h09fdee3_2
scipy=1.7.1=py39hc0c34ad_0
setuptools=58.2.0=py39hcbf5309_0
six=1.16.0=pyh6c4a22f_0
snappy=1.1.8=ha925a31_3
snowballstemmer=2.1.0=pyhd8ed1ab_0
sortedcontainers=2.4.0=pyhd8ed1ab_0
sphinx=4.2.0=pyh6c4a22f_0
sphinxcontrib-applehelp=1.0.2=py_0
sphinxcontrib-devhelp=1.0.2=py_0
sphinxcontrib-htmlhelp=2.0.0=pyhd8ed1ab_0
sphinxcontrib-jsmath=1.0.1=py_0
sphinxcontrib-qthelp=1.0.3=py_0
sphinxcontrib-serializinghtml=1.1.5=pyhd8ed1ab_0
spyder=5.1.5=py39hcbf5309_0
spyder-kernels=2.1.3=py39hcbf5309_0
sqlite=3.36.0=h8ffe710_2
tbb=2021.3.0=h2d74725_0
tensorboard=2.5.0=pyhd8ed1ab_1
tensorboard-data-server=0.6.0=py39hcbf5309_0
tensorboard-plugin-wit=1.8.0=pyh44b312d_0
tensorflow=2.5.0=gpu_py39h7dc34a2_0
tensorflow-base=2.5.0=gpu_py39hb3da07e_0
tensorflow-estimator=2.5.0=pyh81a9013_1
tensorflow-gpu=2.5.0=h17022bd_0
termcolor=1.1.0=py_2
testpath=0.5.0=pyhd8ed1ab_0
text-unidecode=1.3=py_0
textdistance=4.2.1=pyhd8ed1ab_0
three-merge=0.1.1=pyh9f0ad1d_0
tinycss2=1.1.0=pyhd8ed1ab_0
tk=8.6.11=h8ffe710_1
toml=0.10.2=pyhd8ed1ab_0
tomli=1.2.1=pyhd8ed1ab_0
tornado=6.1=py39hb82d6ee_1
traitlets=5.1.0=pyhd8ed1ab_0
typed-ast=1.4.3=py39hb82d6ee_0
typing-extensions=3.10.0.2=hd8ed1ab_0
typing_extensions=3.10.0.2=pyha770c72_0
tzdata=2021c=he74cb21_0
ucrt=10.0.20348.0=h57928b3_0
ujson=4.2.0=py39h415ef7b_0
unidecode=1.3.2=pyhd8ed1ab_0
urllib3=1.26.7=pyhd8ed1ab_0
vc=14.2=hb210afc_5
vs2015_runtime=14.29.30037=h902a5da_5
watchdog=2.1.6=py39hcbf5309_0
wcwidth=0.2.5=pyh9f0ad1d_2
webencodings=0.5.1=py_1
werkzeug=2.0.1=pyhd8ed1ab_0
wheel=0.35.1=pyh9f0ad1d_0
win_inet_pton=1.1.0=py39hcbf5309_2
wrapt=1.13.1=py39hb82d6ee_0
yaml=0.2.5=he774522_0
yapf=0.31.0=pyhd8ed1ab_0
yarl=1.7.0=py39hb82d6ee_0
zeromq=4.3.4=h0e60522_1
zipp=3.6.0=pyhd8ed1ab_0
zlib=1.2.11=vc14h1cdd9ab_1

The error I get with the exact same code:

import tensorflow
import os

strategy = tensorflow.distribute.MirroredStrategy(cross_device_ops=tensorflow.distribute.HierarchicalCopyAllReduce())

os._exit(0)

is the following:

An error ocurred while starting the kernel
2021 13:12:55.988419: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2021 13:13:08.379082: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'nvcuda.dll'; dlerror: nvcuda.dll not found
2021 13:13:08.379161: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2021 13:13:08.386241: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: RT-Z0M6A
2021 13:13:08.386913: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: RT-Z0M6A
2021 13:13:08.387665: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

ccordoba12 commented 2 years ago

@RomanFoell, thanks for the extra info.

@dalthviz, could you try to reproduce this problem on your side?

dalthviz commented 2 years ago

@ccordoba12 @RomanFoell I tried creating the test env, but when running conda install spyder I got the Found conflicts message. I'm not totally sure if it is because I used Miniconda to create the env. I will try with the Anaconda distribution and let you guys know.

RomanFoell commented 2 years ago

Hi dalthviz,

I mainly used conda-forge as the channel for installation; perhaps this helps. Tomorrow I will update this post and tell you which channels I used for installation and in which order:

.condarc:

default_channels:
  - anaconda
  - r

channels:
  - conda-forge
  - defaults

allow_other_channels: yes
binstar_upload: no
show_channel_urls: yes
notify_outdated_conda: no
report_errors: yes
ssl_verify: true

dalthviz commented 2 years ago

Just in case: checking with the Anaconda distribution (Anaconda3-2021.05-Windows-x86_64), I'm again seeing the Found conflicts message.

dalthviz commented 2 years ago

@ccordoba12 @RomanFoell I tried with the .condarc file definition provided, but now I'm getting a Found conflicts message when trying to install tensorflow-gpu:


Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found
to be incompatible with the existing python installation in your environment:

Specifications:

  - tensorflow-gpu -> python[version='3.5.*|3.6.*|3.7.*|>=3.7,<3.8.0a0|>=3.6,<3.7.0a0|>=3.5,<3.6.0a0|>=2.7,<2.8.0a0']

Your python: python==3.9.7

If python is on the left-most side of the chain, that's the version you've asked for.
When python appears to the right, that indicates that the thing on the left is somehow
not available for the python version you are constrained to. Note that conda will not
change your python version to a different minor version unless you explicitly specify
that.

Following the constraint shown, I created a new env with Python 3.7 and then I was able to install tensorflow-gpu and spyder.

In the env created I was able to run the example code:

[image]

However, I noticed that a message appears in the cmd window: link image0 hasn't been detected! Also, the splash screen has dark letters:

[image]

Other than that, it seems like everything is working, but only if I create an env with Python 3.7.

RomanFoell commented 2 years ago

Ok, thanks. Perhaps I have to install the newest Anaconda version. At the moment, my setup seems to differ from yours. I will try it and let you know.

dalthviz commented 2 years ago

Thanks @RomanFoell! However, now that it has finished (and probably reached the os._exit(0)), this is what I'm seeing (no error related to tensorflow, but it seems like we are taking the stdout as stderr, @ccordoba12?):

An error ocurred while starting the kernel

2021 11:40:25.027532: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2021 11:40:27.854899: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2021 11:40:28.283517: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce MX130 computeCapability: 5.0
coreClock: 1.189GHz coreCount: 3 deviceMemorySize: 2.00GiB deviceMemoryBandwidth: 37.33GiB/s
2021 11:40:28.283563: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2021 11:40:28.338979: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2021 11:40:28.384975: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2021 11:40:28.392459: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2021 11:40:28.446080: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2021 11:40:28.469360: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2021 11:40:28.587318: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2021 11:40:28.587603: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2021 11:40:28.589481: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2021 11:40:28.593724: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce MX130 computeCapability: 5.0
coreClock: 1.189GHz coreCount: 3 deviceMemorySize: 2.00GiB deviceMemoryBandwidth: 37.33GiB/s
2021 11:40:28.593754: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2021 11:40:28.593767: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2021 11:40:28.593777: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2021 11:40:28.593787: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2021 11:40:28.593796: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2021 11:40:28.593805: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2021 11:40:28.593814: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2021 11:40:28.593842: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2021 11:47:05.006370: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2021 11:47:05.006404: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2021 11:47:05.006412: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
2021 11:47:05.007837: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1376 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce MX130, pci bus id: 0000:01:00.0, compute capability: 5.0)

ccordoba12 commented 2 years ago

no error related to tensorflow, but it seems like we are taking the stdout as stderr, @ccordoba12?

I don't know if those messages are printed in stdout or stderr.

But @dalthviz, can you keep using the console after that message?

dalthviz commented 2 years ago

No, @ccordoba12. My guess is that the tensorflow log is getting written to the console's error log file, and then, when the kernel is restarting/starting, it shows you the error log file instead of starting normally (it shows the banner with red background).

ccordoba12 commented 2 years ago

it shows the banner with red background

Ok, could you post an image of that to better understand what happens?

dalthviz commented 2 years ago

[image] [image] [image]

ccordoba12 commented 2 years ago

Ok, and what happens if you run in the internal console these commands?

mw = spy.window
ic = mw.ipyconsole
cl = ic.get_current_client()
cl.infowidget.hide()

Is the console usable after that?

dalthviz commented 2 years ago

No, after that the console is empty and if I run cl.shellwidget.show() the infowidget gets shown again:

tfspy

ccordoba12 commented 2 years ago

No, after that the console is empty

Well, then that means that's really an error and the kernel can't be started due to it.

One last question for you @dalthviz: what happens if you run the same code in a Jupyter notebook?

dalthviz commented 2 years ago

It shows a message that the kernel died and needs to be restarted, after that you can still interact with the notebook:

[image]

[image]

ccordoba12 commented 2 years ago

Ok, I understand the problem better now. It only shows up in Spyder because we save the stderr stream generated by the kernel to a file and display it to users to let them know about possible errors when the kernel starts.
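
To illustrate that mechanism, here is a minimal sketch that uses neither Spyder nor TensorFlow. A child process (standing in for the kernel) writes an informational log line to stderr and then calls os._exit(0), while the parent (standing in for the frontend) redirects stderr to a file. The helper name run_fake_kernel, the file name kernel.stderr and the fake log line are assumptions made purely for illustration; this is not Spyder's actual code.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for a kernel: logs an informational message to stderr,
# then terminates immediately with a success exit code.
CHILD_CODE = r"""
import os, sys
sys.stderr.write("I fake/dso_loader.cc:53] Successfully opened dynamic library\n")
sys.stderr.flush()
os._exit(0)
"""


def run_fake_kernel():
    """Run the child with stderr saved to a file; return (exit_code, stderr_text)."""
    with tempfile.TemporaryDirectory() as tmp:
        stderr_path = os.path.join(tmp, "kernel.stderr")  # hypothetical file name
        with open(stderr_path, "w") as stderr_file:
            proc = subprocess.run(
                [sys.executable, "-c", CHILD_CODE], stderr=stderr_file
            )
        with open(stderr_path) as stderr_file:
            return proc.returncode, stderr_file.read()


if __name__ == "__main__":
    code, text = run_fake_kernel()
    print("exit code:", code)
    print("stderr file is non-empty:", bool(text.strip()))
```

The point of the sketch is that a frontend which treats any content in the stderr file as a startup error would report a failure here, even though the child exited cleanly and only logged an informational message.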

In this case, tensorflow prints a lot of info to stderr (which should be sent to stdout, I'd say) and that's why we show that info in the console and don't restart the kernel (as Jupyter does).
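
On the user side, one way to reduce how much TensorFlow writes to stderr is the TF_CPP_MIN_LOG_LEVEL environment variable, which has to be set before tensorflow is imported. The sketch below is only a possible workaround: nobody in this thread tested whether it avoids the error screen in Spyder, and the helper name quiet_tensorflow_logs is made up for illustration.

```python
import os


def quiet_tensorflow_logs(level="2"):
    """Set TF_CPP_MIN_LOG_LEVEL and return the value that was set.

    "0" shows all messages, "1" hides INFO, "2" hides INFO and WARNING,
    "3" hides ERROR as well. Call this before importing tensorflow.
    """
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = level
    return os.environ["TF_CPP_MIN_LOG_LEVEL"]


quiet_tensorflow_logs()

# Import tensorflow only after the variable is set, for example:
# import tensorflow
```

With level "2", the I and W lines shown in the logs above should no longer be printed, but genuine errors still would be.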

The solution is to skip our error reporting mechanism if we detect some tensorflow string patterns in the kernel stderr file. I'll leave this for 5.2.1 for now, to wait until the migration of the IPython console to the new API is finished.
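
A rough sketch of what such a check could look like is shown below. It is not the code that went into Spyder. The regular expression, the list of continuation prefixes and the function names are all assumptions derived from the log lines posted in this thread, and treating the I (info) and W (warning) levels as harmless is a design choice of this sketch.

```python
import re

# Assumed shape of TensorFlow's C++ log lines, based on the logs in this thread:
# "<timestamp>: <I|W|E|F> tensorflow/<path>.cc:<line>] <message>"
TF_LOG_LINE = re.compile(r"^.*?: (?P<level>[IWEF]) tensorflow/\S+\.cc:\d+\] ")

# Continuation lines seen in the logs above that carry no level prefix.
CONTINUATION_PREFIXES = (
    "To enable them in other operations",
    "pciBusID:",
    "coreClock:",
)


def is_benign_tf_line(line):
    """Return True for blank lines and TensorFlow info/warning log lines."""
    stripped = line.strip()
    if not stripped:
        return True
    if stripped.startswith(CONTINUATION_PREFIXES):
        return True
    match = TF_LOG_LINE.match(stripped)
    return bool(match) and match.group("level") in ("I", "W")


def stderr_has_real_error(stderr_text):
    """Return True if any line is something other than benign TensorFlow logging."""
    return any(not is_benign_tf_line(line) for line in stderr_text.splitlines())
```

A frontend could call stderr_has_real_error on the contents of the kernel's stderr file and only show the error screen when it returns True. Anything that does not match the assumed patterns, such as a Python traceback, still counts as a real error.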