TylerGubala opened this issue 5 years ago (Open)
https://machinelearningmastery.com/exploding-gradients-in-neural-networks
My first thought was this. I'm not entirely sure, though. In my experience, training for too long leads either to gibberish or to quoting the dataset word for word (exploding gradients or overfitting, respectively).
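For what it's worth, the fix that article leans on is gradient clipping. A minimal sketch in plain Keras (illustrative only, not textgenrnn's actual optimizer setup; the layer sizes and learning rate here are made-up values):

from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.optimizers import RMSprop

# toy stand-in for a char-level LSTM like textgenrnn's
model = Sequential([LSTM(128, input_shape=(40, 100)),
                    Dense(100, activation='softmax')])

# clipnorm rescales any gradient whose L2 norm exceeds 1.0, so one bad
# batch can't blow up the weights (the standard exploding-gradient fix)
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(lr=0.001, clipnorm=1.0))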
@doneforaiur Thanks for the link! I am relatively new to the deep learning field, though I toy around with projects like this from time to time.
Would you mind correcting your hyperlink? The link text is fine, but the URL itself just points back to the "issues" page with the tag "url".
Do you think that this is potentially an issue where training for too long should simply be avoided, or is there some instability of the model itself?
As far as I know, LSTMs help with the exploding gradients problem. Sadly, I'm not sure. Did you tinker with the "keep probabilities"? Maybe the model's neuron count dips too low? :(
@doneforaiur Sorry, I'm not sure what that means. Is that a parameter that I can feed into the train_from_file function?
https://github.com/karpathy/char-rnn#best-models-strategy
I meant "dropout". I'm sure it's implemented in textgenrnn too.
Ahhhh, I think I got it. The scheduler decays the learning rate linearly toward zero, so it drops too low once the current epoch number gets close to num_epochs. I think it might be something to do with that.
textgenrnn.py -> line 214:

# scheduler function must be defined inline.
def lr_linear_decay(epoch):
    return (base_lr * (1 - (epoch / num_epochs)))
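Plugging hypothetical numbers into that schedule shows how quickly it starves the optimizer (base_lr and num_epochs below are made up for illustration):

base_lr, num_epochs = 0.001, 10

def lr_linear_decay(epoch):
    return base_lr * (1 - (epoch / num_epochs))

# epoch 0 -> 0.001, epoch 5 -> 0.0005, epoch 9 -> 0.0001
# by the final epochs the step size is nearly zero, so later training
# barely changes the weights at all
print([round(lr_linear_decay(e), 6) for e in range(num_epochs)])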
That is quite strange, I guess I'll try looking around and seeing how others do learning rates.
I have the same issue. How do we set the learning rate to be static?
Or even change the base learning rate?
I guess you can just alter the function. Alas, as Max himself has said, tinkering with the base lr doesn't necessarily bring about any improvement :/.
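If you do want it static, one option (a sketch against the snippet quoted above, not tested against the current source) is to make the scheduler ignore the epoch entirely:

from keras.callbacks import LearningRateScheduler

base_lr = 0.001  # hypothetical value; textgenrnn picks its own default

# drop-in replacement for lr_linear_decay: same rate every epoch
def lr_constant(epoch):
    return base_lr

lr_callback = LearningRateScheduler(lr_constant, verbose=1)
# textgenrnn wires up its callbacks internally, so editing the decay
# function in textgenrnn.py is the simpler route; the callback above is
# just what that replacement looks like in plain Keras terms.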
More text than needed means more loss; more training than needed means more loss.
I've been loving this utility and have been amused with the results so far.
Something strange that I've been noticing, though: I've given it various texts and it seems like it starts choking after several training epochs.
Example code:
Output:
It seems like it goes a bit off the rails. Interestingly, I was watching it for a while as it trained, and it seemed like the loss was increasing as it worked through the text, which was odd to me.
The training document that I used is attached; I'm not sure if it needs to follow some rule. It's around 16 MB, so I figured that's large enough?
Thanks in advance!
quoteraw.txt