minimaxir / textgenrnn

Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code.

How do I load models and continue training on them #74

Open sovseck3000 opened 6 years ago

sovseck3000 commented 6 years ago

Whenever I've tried to load my model, it starts training a new model instead. I can't find a way to load a model and continue training it. Is that even possible at the moment?

flocko-motion commented 6 years ago

It works for me, but only for char-level networks, not word-level...

Example for char level:

# the first program trains a network, saves that net and exits..
from textgenrnn import textgenrnn
textgen = textgenrnn()
textgen.train_from_file('filename_training.txt',
                        num_epochs=5,
                        gen_epochs=5)
textgen.save('filename_trained_network.hdf5')
exit()

# the second program loads a trained net and generates output
from textgenrnn import textgenrnn
textgen = textgenrnn()
textgen.load('filename_trained_network.hdf5')
textgen.generate()

All that works great! But when I try the same with a word-based network, I get this error:

ValueError: Layer #1 (named "embedding"), weight <tf.Variable 'embedding_1/embeddings:0' shape=(465, 100) dtype=float32_ref> has shape (465, 100), but the saved weight has shape (2873, 100).

Am I doing something wrong?
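The mismatch suggests that a fresh textgenrnn() builds the default 465-entry char-level vocab, while the saved weights expect the 2873-entry word-level vocab, so the word-level model presumably has to be rebuilt from its own saved vocab and config before the weights can load. A minimal sketch, assuming the vocab and config filenames that training writes alongside the weights (the exact names are an assumption here):

from textgenrnn import textgenrnn

# sketch, not confirmed: rebuild the model from the saved vocab and config so
# the embedding layer's shape matches the saved weights before loading them;
# 'textgenrnn_vocab.json' and 'textgenrnn_config.json' are assumed filenames
textgen = textgenrnn(weights_path='filename_trained_network.hdf5',
                     vocab_path='textgenrnn_vocab.json',
                     config_path='textgenrnn_config.json')
textgen.generate()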

xoryouyou commented 5 years ago

Same issue here.

This works fine:

from textgenrnn import textgenrnn

textgen = textgenrnn(config_path="./config.json")
textgen.train_from_largetext_file('../foo.txt', num_epochs=1)

textgen.save("./foo.hdf5")

but this:

from textgenrnn import textgenrnn

textgen = textgenrnn(config_path="./config.json")
textgen.load("./foo.hdf5")

textgen.generate()

Fails with:

ValueError: Layer #1 (named "embedding"), weight <tf.Variable 'embedding_1/embeddings:0' shape=(465, 100) dtype=float32_ref> has shape (465, 100), but the saved weight has shape (68, 100).

It would be great to have a way to resume training a network.

xoryouyou commented 5 years ago

What seems to work is filling in all the constructor arguments and setting the new_model flag to False:

from textgenrnn import textgenrnn

textgen = textgenrnn(config_path="./textgenrnn_config.json",
                     weights_path="./textgenrnn_weights.hdf5",
                     vocab_path="./textgenrnn_vocab.json")
textgen.train_from_largetext_file('../data/text.txt',new_model=False,num_epochs=1)
textgen.save("./textgenrnn_weights.hdf5")
textgen.generate()
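
Presumably this works because passing vocab_path and config_path rebuilds the network with the saved vocabulary size, so the embedding layer's shape matches the stored weights, and new_model=False then continues training from those weights instead of initializing a fresh model.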
xoryouyou commented 5 years ago

@minimaxir, can you confirm that this is the way to continue training a network, or maybe provide a minimal example of how to do it?

janelleshane commented 5 years ago

I wrote a wrapper that handles loading and saving models in the current version of textgenrnn. I still get myself tangled up with it every once in a while, but I'm finding it a much easier way to keep track of the models.

https://github.com/janelleshane/textgenrnn-wrapper

If nothing else, it has some examples on how to save/load models in textgenrnn.

Adattilio commented 5 years ago

I'm still getting this error whether I use Janelle's wrapper or textgenrnn by itself. @xoryouyou, I tried your method, but I still get the error. My code was as follows:

from textgenrnn import textgenrnn

textgen = textgenrnn()

textgen.train_from_file(r'A:\Downloads\cornell_movie_dialogs_corpus\cornell movie-dialogs corpus\CurseWordSentences.txt',
                        num_epochs=3, new_model=False)

(At this point, I manually copied the updated output files, i.e. config, weights, and vocab, into a new folder, "CurseModelTest".)

textgen.reset()

textgen = textgenrnn(config_path="A:/Downloads/textgenrnn-master/WeightBackups/CurseModelTest/textgenrnn_config.json",
                     weights_path="A:/Downloads/textgenrnn-master/WeightBackups/CurseModelTest/textgenrnn_weights.hdf5",
                     vocab_path="A:/Downloads/textgenrnn-master/WeightBackups/CurseModelTest/textgenrnn_vocab.json")

And from there, I get the error when it loads the files.

Update: I am able to load a model when I put "new_model=True" in the "train_from_file" command. From there I can generate text fine; however, trying to continue training on the model after loading it that way causes it to hang. I had a small file with 19 lines (132 character sequences for training), and it just sat on "Epoch 1/2" doing nothing (which is probably another issue I should post about). So, as far as I can tell, this issue resides in continuing training on the default model and then attempting to load that newly updated model. Somewhere in that process, I assume, the "shape" of the network is not recorded correctly. Perhaps someone who got past these loading issues can clarify further.

Update 2 - Possibly solved: After hours of close inspection, I think I have found the actual cause of the error. The source is the vocab file.

Let's begin the explanation with how you start textgenrnn. In your command prompt, you type "python" and proceed with "from textgenrnn import textgenrnn". Before entering Python, you are in a directory on your computer; for Windows users like myself, this is "Windows/System32". This directory, let's call it the Base-Directory, is where your weights file will be written after each epoch. It is also where the default, pre-trained model puts its vocab and config files when you set up textgenrnn initially. (Note: you can change this location by going to your desired directory in the command prompt before starting Python.) If you save a weights file manually, it also ends up here.

The unfortunate thing, and this is important, is that IF you ever run a command with "new_model=True" while in your Base-Directory and do not give that new model a name, it will OVERWRITE the default vocab file, which has the 465 key/value pairs noted in the errors we receive. The other value in the error that is not 100 (68 in @xoryouyou's case) is the newly trained model's vocab size! So the default model is looking for its vocab in the one place it was created, and that vocab is no longer the 465 key/value pairs it was, but the key/value pairs of the new model.

There are a few ways to fix this, however. You could download the repo zip and find the vocab file in there, or you could find the subdirectory of your Python installation that contains it (my Windows path for this directory was "C:\Users\[your user name]\AppData\Local\Programs\Python\Python36\Lib\site-packages\textgenrnn"). Once you find the file, copy it back into the Base-Directory and overwrite the existing vocab file. I am unsure what to do if the config file in the Base-Directory was also overwritten by your new model, as it is not in the Python library directory nor in the repo; however, I would guess that the config is created and updated in the Base-Directory when textgenrnn is initialized to the variable "textgen".

I'm hoping this answer solves the issue a few of us are seeing, but there is another issue with saving/loading I should briefly mention.
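If that diagnosis is right, the simplest prevention may be to give every new model its own name so its output files never collide with the defaults. A minimal sketch, assuming the constructor's name argument sets the output file prefix as described in the textgenrnn README ('curse_model' is a hypothetical name):

from textgenrnn import textgenrnn

# hypothetical name: with name='curse_model', training should write
# curse_model_weights.hdf5, curse_model_vocab.json, and curse_model_config.json,
# leaving the default textgenrnn_vocab.json in the Base-Directory untouched
textgen = textgenrnn(name='curse_model')
textgen.train_from_file('CurseWordSentences.txt',
                        new_model=True,
                        num_epochs=3)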

If you try to use the pretrained model for transfer learning on your own text files, it works while the model is alive in your command prompt. However, if you save the model (by either copying the latest output weights file in your Base-Directory or using textgen.save), reset textgen, and then load the model along with the original vocab and config files as @xoryouyou posted, textgen = textgenrnn(config_path="./textgenrnn_config.json", weights_path="./textgenrnn_weights.hdf5", vocab_path="./textgenrnn_vocab.json"), then when you generate, each character comes out with a space between it and its neighbors. The output will likely still be words, but again, there are spaces between the characters. I double-checked to ensure it wasn't just me accidentally training word_level on top of the pretrained character-level model (which does work within the lifespan of the textgen object), but it happened both when word_level was False and when it was True (for further clarification, I also tried it without setting the word_level variable at all, thus defaulting to False).

I will post a ticket for the above issue in the hope that I or someone else can solve it (perhaps it's just something on my computer). I did find another issue regarding transfer learning freezing with a new model, but at the moment I am no longer experiencing it, and I will make a ticket if I encounter it again 👍