Closed lpc-eol closed 1 year ago
Hi Leo, Thanks for trying this out. Perhaps this issue could be related to your TF installation? What TF and CUDA versions are you using? I'm able to successfully train with the above command, which is why I'm thinking that it might be worth checking the details of your setup. Does your loss usually go up before you get NaN? Here's what the initial losses look like for me:
Epoch 1/1000
32/32 [==============================] - 31s 967ms/step - loss: 0.5310
Epoch 2/1000
32/32 [==============================] - 25s 776ms/step - loss: 0.2052
Epoch 3/1000
32/32 [==============================] - 25s 769ms/step - loss: 0.1407
Epoch 4/1000
32/32 [==============================] - 24s 757ms/step - loss: 0.1249
Epoch 5/1000
32/32 [==============================] - 24s 764ms/step - loss: 0.1138
Epoch 6/1000
32/32 [==============================] - 26s 797ms/step - loss: 0.1034
Epoch 7/1000
32/32 [==============================] - 24s 757ms/step - loss: 0.0967
Epoch 8/1000
32/32 [==============================] - 24s 765ms/step - loss: 0.0885
Epoch 9/1000
32/32 [==============================] - 24s 752ms/step - loss: 0.0814
Epoch 10/1000
32/32 [==============================] - 178s 6s/step - loss: 0.0776 - val_loss: 0.0773
I'm also not totally sure what's the best way to debug this. As a sanity check, you could verify your TF installation by training some simple model from one of the TF tutorials. You could also try to make the learning rate very low (like 1e-6 or 1e-7), just to see what happens.
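The sanity check suggested above could look something like the following: train a tiny Keras model on synthetic data and confirm the losses stay finite. This is a minimal sketch (the model, data, and hyperparameters are arbitrary, not taken from this repository); if even this produces NaNs, the problem is likely the TF/CUDA installation rather than this project's code.

```python
import numpy as np
import tensorflow as tf

# Synthetic binary-classification data.
x = np.random.rand(256, 8).astype("float32")
y = (x.sum(axis=1) > 4.0).astype("float32")

# A tiny model; nothing here should be numerically fragile.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="binary_crossentropy")
history = model.fit(x, y, epochs=3, batch_size=32, verbose=0)

# On a healthy install, every epoch loss should be a finite number.
print(all(np.isfinite(history.history["loss"])))
```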
Thank you so much for your reply! I am using TF 2.7.0, and here are the details:
cuda-version 10.2 h4767cc1_2 conda-forge
cudatoolkit 10.2.89 hdec6ad0_12 conda-forge
cudnn 7.6.5.32 h01f27c4_1 conda-forge
tensorflow 2.7.0 cuda102py38h32e99bf_0 conda-forge
tensorflow-addons 0.13.0 pypi_0 pypi
tensorflow-base 2.7.0 cuda102py38h021f141_0 conda-forge
tensorflow-estimator 2.7.0 cuda102py38h4357c17_0 conda-forge
tensorflow-gpu 2.7.0 cuda102py38hf05f184_0 conda-forge
tensorflow-probability 0.11.1 pypi_0 pypi
I tried setting the learning rate to 1e-6, i.e. `-lr 1e-6` and `-dwd 1e-6`. Here is the result:
2023-08-17 00:55:33.072818: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:380] Filling up shuffle buffer (this may take a while): 725 of 734
2023-08-17 00:55:38.301424: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:380] Filling up shuffle buffer (this may take a while): 728 of 734
2023-08-17 00:55:48.925498: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:405] Shuffle buffer filled.
2023-08-17 01:02:28.225712: I tensorflow/stream_executor/cuda/cuda_dnn.cc:366] Loaded cuDNN version 7605
32/32 [==============================] - 1884s 723ms/step - loss: 0.6924
Epoch 2/1000
32/32 [==============================] - 23s 738ms/step - loss: 0.6928
Epoch 3/1000
32/32 [==============================] - 24s 751ms/step - loss: 0.6920
Epoch 4/1000
32/32 [==============================] - 24s 742ms/step - loss: 0.6917
Epoch 5/1000
32/32 [==============================] - 24s 740ms/step - loss: 0.6919
Epoch 6/1000
32/32 [==============================] - 24s 744ms/step - loss: 0.6919
Epoch 7/1000
32/32 [==============================] - 24s 740ms/step - loss: 0.6919
Epoch 8/1000
32/32 [==============================] - 24s 744ms/step - loss: 0.6912
Epoch 9/1000
32/32 [==============================] - 24s 749ms/step - loss: 0.6916
Epoch 10/1000
32/32 [==============================] - 369s 12s/step - loss: 0.6903 - val_loss: 0.6903
Epoch 11/1000
32/32 [==============================] - ETA: 0s - loss: 0.6907
2023-08-17 01:12:38.409781: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
32/32 [==============================] - 35s 1s/step - loss: 0.6907
Epoch 12/1000
1/32 [..............................] - ETA: 15s - loss: 0.6922
<frozen importlib._bootstrap>:219: RuntimeWarning: scipy._lib.messagestream.MessageStream size changed, may indicate binary incompatibility. Expected 56 from C header, got 64 from PyObject
32/32 [==============================] - 24s 762ms/step - loss: 0.6897
Epoch 13/1000
32/32 [==============================] - 24s 760ms/step - loss: 0.6901
Epoch 14/1000
32/32 [==============================] - 24s 764ms/step - loss: 0.6899
Epoch 15/1000
32/32 [==============================] - 24s 762ms/step - loss: 0.6893
Epoch 16/1000
32/32 [==============================] - 24s 761ms/step - loss: 0.6881
Epoch 17/1000
32/32 [==============================] - 24s 760ms/step - loss: 0.6887
Epoch 18/1000
32/32 [==============================] - 24s 763ms/step - loss: 0.6882
Epoch 19/1000
32/32 [==============================] - 24s 766ms/step - loss: 0.6878
Epoch 20/1000
32/32 [==============================] - 50s 2s/step - loss: 0.6881 - val_loss: 0.6876
Epoch 21/1000
32/32 [==============================] - ETA: 0s - loss: 0.6876
INFO:root: *** Got new best validation metric (average_map_tight) of 0.002679356077683897
INFO:root: *** Done saving models and evaluation files.
32/32 [==============================] - 551s 18s/step - loss: 0.6816
Epoch 32/1000
32/32 [==============================] - 25s 790ms/step - loss: 0.6818
Epoch 33/1000
32/32 [==============================] - 25s 779ms/step - loss: 0.6813
Epoch 34/1000
32/32 [==============================] - 25s 773ms/step - loss: 0.6801
Epoch 35/1000
32/32 [==============================] - 25s 784ms/step - loss: 0.6794
Epoch 36/1000
32/32 [==============================] - 25s 778ms/step - loss: 0.6790
Epoch 37/1000
32/32 [==============================] - 25s 785ms/step - loss: 0.6786
Epoch 38/1000
32/32 [==============================] - 24s 770ms/step - loss: 0.6763
Epoch 39/1000
32/32 [==============================] - 24s 768ms/step - loss: 0.6755
Epoch 40/1000
32/32 [==============================] - 51s 2s/step - loss: 0.6735 - val_loss: 0.6683
Epoch 41/1000
32/32 [==============================] - ETA: 0s - loss: nan
INFO:root:Validation metric (average_map_tight): 0.002562228553085689
32/32 [==============================] - 811s 26s/step - loss: nan
Epoch 42/1000
32/32 [==============================] - 25s 799ms/step - loss: nan
Epoch 43/1000
32/32 [==============================] - 25s 794ms/step - loss: nan
Epoch 44/1000
32/32 [==============================] - 25s 787ms/step - loss: nan
Epoch 45/1000
32/32 [==============================] - 25s 789ms/step - loss: nan
Epoch 46/1000
32/32 [==============================] - 25s 784ms/step - loss: nan
Epoch 47/1000
32/32 [==============================] - 25s 792ms/step - loss: nan
Epoch 48/1000
32/32 [==============================] - 25s 776ms/step - loss: nan
Epoch 49/1000
32/32 [==============================] - 25s 775ms/step - loss: nan
Epoch 50/1000
32/32 [==============================] - 52s 2s/step - loss: nan - val_loss: nan
Epoch 51/1000
32/32 [==============================] - ETA: 0s - loss: nan
INFO:root:Validation metric (average_map_tight): 0.0
32/32 [==============================] - 781s 25s/step - loss: nan
Epoch 52/1000
32/32 [==============================] - 26s 814ms/step - loss: nan
Epoch 53/1000
32/32 [==============================] - 25s 783ms/step - loss: nan
I appreciate you taking the time to provide guidance on the environment settings. I will be sure to pay closer attention to properly configuring the environment going forward. Would you recommend using TensorFlow 2.3.0 instead of 2.7.0 for this project?
Thank you for sharing the details! I just tested this example with TF 2.7.0 and it works for me; however, my setup uses CUDA 11.2. The table at https://www.tensorflow.org/install/source#gpu suggests TF 2.7.0 may need a newer CUDA version than the one you have. We've also tested our code on TF 2.3.0, so you could try that instead, since the table indicates it is compatible with CUDA 10.1 (and hopefully also with 10.2, which is what you have now). You might want to run some sanity checks on your installation before changing anything, though.
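A small self-contained sketch of that compatibility check. The version pairs below are copied from the table at https://www.tensorflow.org/install/source#gpu (which remains the authoritative source); the `compatible` helper and its name are hypothetical, just for illustration.

```python
# Tested (TF version -> CUDA/cuDNN) build configurations, per the
# tensorflow.org "tested build configurations" GPU table.
TESTED_BUILDS = {
    "2.3.0": {"cuda": "10.1", "cudnn": "7.6"},
    "2.7.0": {"cuda": "11.2", "cudnn": "8.1"},
}

def compatible(tf_version: str, cuda_version: str) -> bool:
    """Return True if cuda_version matches the tested build for tf_version."""
    build = TESTED_BUILDS.get(tf_version)
    return build is not None and build["cuda"] == cuda_version

# TF 2.7.0 was tested against CUDA 11.2, not the CUDA 10.2 in the logs above.
print(compatible("2.7.0", "10.2"))  # False
print(compatible("2.7.0", "11.2"))  # True
```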
Thank you so much for your guidance. I updated the CUDA to 11.2, and the problem is fixed. The loss value looks good!
cuda-version 11.2 hb11dac2_2 conda-forge
cudatoolkit 11.2.2 hc23eb0c_12 conda-forge
cudnn 8.8.0.121 h0800d71_1 conda-forge
2023-08-18 11:14:20.794201: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:380] Filling up shuffle buffer (this may take a while): 729 of 734
2023-08-18 11:14:32.289327: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:405] Shuffle buffer filled.
2023-08-18 11:14:38.187317: I tensorflow/stream_executor/cuda/cuda_dnn.cc:366] Loaded cuDNN version 8800
2023-08-18 11:14:39.116424: I tensorflow/stream_executor/cuda/cuda_blas.cc:1774] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
32/32 [==============================] - 1554s 748ms/step - loss: 0.5153
Epoch 2/1000
32/32 [==============================] - 24s 741ms/step - loss: 0.1879
Epoch 3/1000
32/32 [==============================] - 24s 748ms/step - loss: 0.1343
Epoch 4/1000
32/32 [==============================] - 24s 741ms/step - loss: 0.1221
Epoch 5/1000
32/32 [==============================] - 24s 743ms/step - loss: 0.1153
Epoch 6/1000
32/32 [==============================] - 24s 740ms/step - loss: 0.1042
Epoch 7/1000
32/32 [==============================] - 24s 747ms/step - loss: 0.0982
Epoch 8/1000
32/32 [==============================] - 24s 750ms/step - loss: 0.0920
Epoch 9/1000
32/32 [==============================] - 24s 742ms/step - loss: 0.0835
Epoch 10/1000
32/32 [==============================] - ETA: 0s - loss: 0.0792
32/32 [==============================] - 512s 16s/step - loss: 0.0792 - val_loss: 0.0761
Thank you for the update. Glad it worked out!
Could you please guide me on debugging what is causing the NaN loss? I only modified `MEMORY_LIMIT_IN_MB = 13 * 1024` to leave some memory space for running the validation set, plus `-lr` and `-dwd` for debugging purposes. Before running the commands below, I followed the Feature pre-processing instructions and ran the commands for generating the baidu2.0 features. I am trying to reproduce results from your experiments (the Challenge Validated protocol in Experiments using Combination×2 features in SoccerNet-action-spotting-challenge-2022.md) and am getting loss = NaN after 1-4 epochs of training. Running the command above, I got loss = NaN in the second epoch (first-epoch loss = 0.685), and after setting `-lr` to `1e-5`, loss = NaN occurs after 4-5 epochs (over multiple attempts). Sample output:
Model summary (skipped)
I am more familiar with PyTorch and am trying my best to understand the code. Thank you for your assistance!