rolczynski / Automatic-Speech-Recognition

🎧 Automatic Speech Recognition: DeepSpeech & Seq2Seq (TensorFlow)
GNU Affero General Public License v3.0
223 stars 64 forks

Negative loss value #22

Open st-tomic opened 4 years ago

st-tomic commented 4 years ago

Hi,

I am trying to run the sample training on a librispeech clean 100h.

After a few hours of training with batch size 10, the printed loss value becomes negative. It happens within the first epoch.

The only thing I changed is the read_audio function: it now uses soundfile to read FLAC files instead of reading WAV files with wavfile.read. Both give the same output when reading files, so it shouldn't make a difference.

Are you familiar with this issue? The loss seems to decrease too fast. Any guess what is going wrong?

rolczynski commented 4 years ago

Hey! Super weird. Could you provide more details?

st-tomic commented 4 years ago

I agree :)

The only changes are stated above. I used the pipeline from basic.py and the README page. I have also tried it with TF v2.0-GPU, and the loss also decreases extremely fast within the first epoch, which doesn't seem right.

I only modified the audio loading part to use soundfile, and used future-fstrings to support f-strings on Python 3.5. Other than that, nothing is changed from your repo.

In the last try, the loss froze at -0.6921.

Data loaded from librispeech:

```python
dataset = asr.dataset.Audio.from_csv('examples/libri-100.csv', batch_size=10)
dev_dataset = asr.dataset.Audio.from_csv('examples/dev-clean.csv', batch_size=10)
test_dataset = asr.dataset.Audio.from_csv('examples/test-clean.csv')
```
HunbeomBak commented 4 years ago

I have the same problem.

I tried training on my own dataset.

```
475/476 [============================>.] - ETA: 1s - loss: 2.6582
476/476 [==============================] - 802s 2s/step - loss: 2.6511 - val_loss: -0.6931
Epoch 2/5
119/476 [======>.......................] - ETA: 5:47 - loss: -0.6931
```

After the first epoch, val_loss was negative.

The second epoch also had a negative loss. The loss does not change; it remains at -0.6931 throughout the second epoch.
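A side note that may help with debugging: -0.6931 is, to four decimal places, exactly -ln(2). A loss frozen at a round constant like this usually points at a degenerate, collapsed prediction rather than a loss that is still being optimized. A quick check:

```python
import math

# The frozen loss value reported above is -0.6931; to four decimal
# places this is exactly -ln(2), a telltale round constant hinting
# at a collapsed prediction rather than real learning.
print(round(-math.log(2), 4))  # -0.6931
```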

I used your environment-gpu.yml to create a conda environment.

My English is not great, but I did my best.

ValeriyMarkin commented 4 years ago

Hi, I had the same problem. For me, it turned out that pipeline.fit() was being fed an empty string instead of the correct transcript, so the model learns to predict it. I used the following code and it works:

```python
dataset = pipeline.wrap_preprocess(dataset, False, None)
y = tf.keras.layers.Input(name='y', shape=[None], dtype='int32')
loss = pipeline.get_loss()
pipeline._model.compile(pipeline._optimizer, loss, target_tensors=[y])
pipeline._model.fit(dataset, epochs=20)
```
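If the root cause is empty transcripts reaching fit(), a generic sanity check over the batches can catch it early. This is only a sketch under the assumption that each batch yields an (audio, transcripts) pair; it is not part of the repo's API:

```python
def check_transcripts(batches):
    """Raise ValueError if any transcript in the (audio, transcripts) batches is empty."""
    for i, (audio, transcripts) in enumerate(batches):
        for j, text in enumerate(transcripts):
            if not text.strip():
                raise ValueError(f'Empty transcript in batch {i}, item {j}')
    return True

# Toy usage with fake batches; replace with your dataset iterator.
good = [([b'...'], ['hello world']), ([b'...'], ['good transcript'])]
print(check_transcripts(good))  # True
```

Running this over the wrapped dataset before training would have surfaced the empty-string targets immediately.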

askinucuncu commented 3 years ago

I guess there has been no improvement in this regard, because the system still produces negative loss values.

wenjingyang commented 3 years ago

I had a similar issue even when just running the example (basic.py) on TF v2.1. The negative loss value on its own might be acceptable: https://github.com/keras-team/keras/issues/9369

But when I used predict on test.csv (the same file as the training data), the output was empty: ['']. That doesn't look reasonable. The code in basic.py: pipeline.predict(data)

```
Epoch 1/5
1/1 [==============================] - 3s 3s/step - loss: 303.5161
1/1 [==============================] - 19s 19s/step - loss: 610.6132 - val_loss: 303.5161
Epoch 2/5
Epoch 1/5
1/1 [==============================] - 1s 1s/step - loss: 61.1079
1/1 [==============================] - 7s 7s/step - loss: 76.0996 - val_loss: 61.1079
Epoch 3/5
Epoch 1/5
1/1 [==============================] - 1s 1s/step - loss: 9.9619
1/1 [==============================] - 7s 7s/step - loss: 4.4410 - val_loss: 9.9619
Epoch 4/5
Epoch 1/5
1/1 [==============================] - 1s 1s/step - loss: 2.4088
1/1 [==============================] - 7s 7s/step - loss: 0.6229 - val_loss: 2.4088
Epoch 5/5
Epoch 1/5
1/1 [==============================] - 1s 1s/step - loss: 0.5115
1/1 [==============================] - 7s 7s/step - loss: -0.3944 - val_loss: 0.5115
```

wenjingyang commented 3 years ago

[screenshot of the code in question]

Hmm. I think this line in automatic_speech_recognition/pipeline/ctc_pipeline.py should read:

```python
dev_dataset = self.wrap_preprocess(dev_dataset, prepared_features, augmentation)
```

Right? Let me know if I am wrong. Thanks.