Open Jiahu235 opened 4 months ago
Here are the packages in my environment:
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
absl-py 2.1.0 pypi_0 pypi
astunparse 1.6.3 pypi_0 pypi
ca-certificates 2024.3.11 h06a4308_0
cachetools 4.2.4 pypi_0 pypi
certifi 2022.12.7 py37h06a4308_0
charset-normalizer 3.3.2 pypi_0 pypi
configargparse 1.7 pypi_0 pypi
cudatoolkit 10.1.243 h6bb024c_0
cudnn 7.6.5 cuda10.1_0
gast 0.3.3 pypi_0 pypi
google-auth 1.35.0 pypi_0 pypi
google-auth-oauthlib 0.4.6 pypi_0 pypi
google-pasta 0.2.0 pypi_0 pypi
grpcio 1.62.2 pypi_0 pypi
h5py 2.10.0 pypi_0 pypi
idna 3.7 pypi_0 pypi
importlib-metadata 6.7.0 pypi_0 pypi
joblib 1.3.2 pypi_0 pypi
keras 2.3.1 pypi_0 pypi
keras-applications 1.0.8 pypi_0 pypi
keras-preprocessing 1.1.2 pypi_0 pypi
ld_impl_linux-64 2.38 h1181459_1
libffi 3.4.4 h6a678d5_1
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libstdcxx-ng 11.2.0 h1234567_1
markdown 3.4.4 pypi_0 pypi
markupsafe 2.1.5 pypi_0 pypi
mashumaro 3.9.1 pypi_0 pypi
ncurses 6.4 h6a678d5_0
numpy 1.16.4 pypi_0 pypi
oauthlib 3.2.2 pypi_0 pypi
openssl 1.1.1w h7f8727e_0
opt-einsum 3.3.0 pypi_0 pypi
orderedset 2.0.3 pypi_0 pypi
packaging 24.0 pypi_0 pypi
pandas 0.24.2 pypi_0 pypi
pip 22.3.1 py37h06a4308_0
protobuf 3.20.0 pypi_0 pypi
psutil 6.0.0 pypi_0 pypi
pyasn1 0.5.1 pypi_0 pypi
pyasn1-modules 0.3.0 pypi_0 pypi
python 3.7.16 h7a1cb2a_0
python-dateutil 2.9.0.post0 pypi_0 pypi
pytz 2024.1 pypi_0 pypi
pyyaml 6.0.1 pypi_0 pypi
readline 8.2 h5eee18b_0
requests 2.31.0 pypi_0 pypi
requests-oauthlib 2.0.0 pypi_0 pypi
rsa 4.9 pypi_0 pypi
scikit-learn 1.0.2 pypi_0 pypi
scipy 1.4.1 pypi_0 pypi
setuptools 65.6.3 py37h06a4308_0
six 1.16.0 pypi_0 pypi
sqlite 3.45.3 h5eee18b_0
tensorboard 2.2.2 pypi_0 pypi
tensorboard-plugin-wit 1.8.1 pypi_0 pypi
tensorboardx 2.6.2.2 pypi_0 pypi
tensorflow-estimator 2.2.0 pypi_0 pypi
tensorflow-gpu 2.2.0 pypi_0 pypi
termcolor 2.3.0 pypi_0 pypi
threadpoolctl 3.1.0 pypi_0 pypi
tk 8.6.14 h39e8969_0
typing-extensions 4.7.1 pypi_0 pypi
urllib3 2.0.7 pypi_0 pypi
werkzeug 2.2.3 pypi_0 pypi
wheel 0.38.4 py37h06a4308_0
wrapt 1.16.0 pypi_0 pypi
xz 5.4.6 h5eee18b_1
zipp 3.15.0 pypi_0 pypi
zlib 1.2.13 h5eee18b_1
Hi, thanks for the information. This error indicates that the model weights have grown too large. Does the error appear immediately, or only after some rounds?
One straightforward way to mitigate this might be to reduce the learning rate.
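To illustrate why the learning rate matters here, below is a minimal toy sketch, not the project's training loop. It runs plain gradient descent on f(w) = w**2 and reports the round at which the weight stops being finite. The function name `first_non_finite_round` and the chosen learning rates are made up for this illustration.

```python
import math


def first_non_finite_round(learning_rate, rounds=5000, w0=1.0):
    """Run gradient descent on f(w) = w**2 (a toy stand-in for a model).

    Returns the first 1-based round at which w becomes Inf or NaN,
    or None if w stays finite for all rounds.
    """
    w = w0
    for r in range(1, rounds + 1):
        grad = 2.0 * w                 # derivative of w**2
        w = w - learning_rate * grad   # plain gradient step
        if not math.isfinite(w):
            return r
    return None


if __name__ == "__main__":
    # Each step multiplies w by (1 - 2 * learning_rate). With 1.5 that
    # factor is -2, so |w| doubles every round until it overflows.
    print("lr=1.5 ->", first_non_finite_round(1.5))
    # With 0.1 the factor is 0.8, so w shrinks toward 0 and stays finite.
    print("lr=0.1 ->", first_non_finite_round(0.1))
```

If the NaN in the real training run shows up only after a number of rounds, that pattern is consistent with this kind of gradual blow-up, and a smaller learning rate is a reasonable first thing to try.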
Hello! I'm encountering an error when running the code, consistently across both the MNIST and CIFAR-10 datasets. Regardless of the configuration I use (including the config files in the `train_configs` directory), it reports "Layer xx is NaN!" for each layer. Additionally, I receive a warning that says "WARNING:tensorboardX.x2num: NaN or Inf found in input tensor."

Here is my `mnist_setup.yml` file for the MNIST dataset:

And this is my `mnist_setup.yml` file for the CIFAR-10 dataset:

I suspect that the issue might stem from an incorrect version of a package in my environment, but what confuses me is that the code runs correctly with the Shakespeare dataset.
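One way to test the package-version suspicion is to compare the installed versions against the versions the project expects. The sketch below is only an illustration: `check_versions` is a hypothetical helper, and the pins in the usage example are placeholders copied from my environment listing, not known-good versions. They would need to be replaced with the project's actual requirements.

```python
try:
    # Standard library on Python 3.8+
    from importlib import metadata
except ImportError:
    # Backport package for Python 3.7
    import importlib_metadata as metadata


def check_versions(expected):
    """Compare installed package versions against expected pins.

    `expected` maps a package name to a version string.
    Returns a dict of mismatches: name -> (expected, installed),
    where installed is None if the package is not installed.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    # Placeholder pins (assumption): swap in the project's requirements.
    pins = {"tensorflow-gpu": "2.2.0", "keras": "2.3.1", "numpy": "1.16.4"}
    for name, (wanted, installed) in check_versions(pins).items():
        print(f"{name}: expected {wanted}, installed {installed}")
```

An empty result only means the pins match. It would not rule out other causes, especially since the Shakespeare dataset runs correctly in the same environment.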