lindawangg / COVID-Net

COVID-Net Open Source Initiative

I ran COVIDNet-CXR-2 on Kaggle and the loss explodes after a few epochs #211

Open homerdiaz opened 2 years ago

homerdiaz commented 2 years ago

Quick question before I give the details: is the COVIDNet-CXR-2 model already trained? I'm asking because I trained the model on Kaggle for 3 epochs and the loss seems to explode. Thanks! Details below.

I ran the COVIDNet-CXR-2 model on Kaggle using the benchmark dataset hosted there: https://www.kaggle.com/andyczhao/covidx-cxr2

Before the first epoch I got the same results you guys reported:

Output: ./COVIDNet-lr0.0002
13992 16490
Sens Negative: 0.970, Positive: 0.955
PPV Negative: 0.956, Positive: 0.970

After 3 epochs the loss explodes, as you can see below:

Training started
1749/1749 [==============================] - 1538s 877ms/step
Epoch: 0001 Minibatch loss= 6443.208007812
[[ 1. 199.]
 [ 1. 199.]]
Sens Negative: 0.005, Positive: 0.995
PPV Negative: 0.500, Positive: 0.500
Saving checkpoint at epoch 1
1749/1749 [==============================] - 3029s 837ms/step
Epoch: 0002 Minibatch loss= 39658.683593750
[[ 0. 200.]
 [ 0. 200.]]
Sens Negative: 0.000, Positive: 1.000
PPV Negative: 0.000, Positive: 0.500
Saving checkpoint at epoch 2
1749/1749 [==============================] - 4489s 819ms/step
Epoch: 0003 Minibatch loss= 122188.039062500
[[ 0. 200.]
 [ 2. 198.]]
Sens Negative: 0.000, Positive: 0.990
PPV Negative: 0.000, Positive: 0.497
Saving checkpoint at epoch 3
Optimization Finished!
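For anyone reading the log above, here is a minimal sketch of how the Sens and PPV lines can be recomputed from the printed confusion matrices. It is not the repository's evaluation code: the function name `per_class_metrics` is hypothetical, rows are assumed to be true classes and columns predicted classes (this layout reproduces the reported numbers), and an empty denominator is assumed to yield 0.0, matching the `PPV Negative: 0.000` printed at epoch 2.

```python
def per_class_metrics(matrix):
    """Return (sensitivity, ppv) lists for a square confusion matrix.

    Assumption: matrix[i][j] counts samples of true class i predicted as class j.
    """
    n = len(matrix)
    sens, ppv = [], []
    for i in range(n):
        row_total = sum(matrix[i])                       # all samples of true class i
        col_total = sum(matrix[r][i] for r in range(n))  # all samples predicted as class i
        # Assumption: report 0.0 when a denominator is empty instead of dividing by zero.
        sens.append(matrix[i][i] / row_total if row_total else 0.0)
        ppv.append(matrix[i][i] / col_total if col_total else 0.0)
    return sens, ppv


# Confusion matrices copied from epochs 1 and 3 of the log.
sens1, ppv1 = per_class_metrics([[1, 199], [1, 199]])
sens3, ppv3 = per_class_metrics([[0, 200], [2, 198]])

print([round(v, 3) for v in sens1], [round(v, 3) for v in ppv1])
print([round(v, 3) for v in sens3], [round(v, 3) for v in ppv3])
```

The matrices show the model predicting almost every image as positive, which is why positive sensitivity stays near 1.0 while positive PPV sits at about 0.5 on this balanced 200/200 test split.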

haydengunraj commented 2 years ago

Hi @homerdiaz,

These models are in fact already trained. As for the loss instability, one thing you can try is reducing the learning rate. The default learning rate in our scripts may be too high, since these models are highly optimized versions of much larger baseline models and were not trained from scratch in their current forms.
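As a rough illustration of why a step size that is too large makes the loss grow instead of shrink, here is a toy sketch using plain gradient descent on a one-dimensional quadratic. It is not the COVID-Net training script; the function `gradient_descent`, the curvature value, and both learning rates are made up for the example.

```python
def gradient_descent(lr, steps=20, w0=1.0, curvature=10.0):
    """Minimize f(w) = 0.5 * curvature * w**2 and return the loss after each step."""
    w = w0
    losses = []
    for _ in range(steps):
        grad = curvature * w          # derivative of f at the current w
        w -= lr * grad                # each step scales w by (1 - lr * curvature)
        losses.append(0.5 * curvature * w * w)
    return losses


high = gradient_descent(lr=0.25)  # |1 - 0.25 * 10| = 1.5 > 1, so the loss grows every step
low = gradient_descent(lr=0.05)   # |1 - 0.05 * 10| = 0.5 < 1, so the loss shrinks every step

print("high lr:", high[0], "->", high[-1])
print("low lr: ", low[0], "->", low[-1])
```

A real network is far from a quadratic, but the same effect applies when fine-tuning already-trained weights. In practice this would mean rerunning training with a value below the 0.0002 that appears in the output directory name in the original post, and checking whether the minibatch loss stays bounded.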

homerdiaz commented 2 years ago

Thanks @haydengunraj !!