pretrained model? - Githubissues

dribnet commented 7 years ago

Looking for the pretrained model as the sampling notebook indicates I can "Look on this project's Github page for instructions on how to download a pretrained model." Is this available somewhere?

I'd like to try to replicate your results on sampling. I've trained my own model, but my results thus far are lackluster:

notebook screenshot

I'm guessing that I might need to train with different hyperparameters (I used the run.py defaults), and having a model that worked would help me figure out what's going on.

greydanus commented 7 years ago

Ok, I just put several trained models in my public directory. You can wget them with

wget http://caligari.dartmouth.edu/~sgreydan/scribe/models/$FILENAME

where filename could be any of: [checkpoint model.ckpt-129000 model.ckpt-129000.meta model.ckpt-40000 model.ckpt-40000.meta model.ckpt-66500 model.ckpt-66500.meta]
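For anyone who prefers to script the download, here is a minimal Python sketch of the same wget step. It assumes the server above still hosts the files; the `models` target directory and the helper names `model_urls` and `download_all` are hypothetical, not part of the project.

```python
import os
import urllib.request

# Base URL and file names are copied from the comment above.
BASE_URL = "http://caligari.dartmouth.edu/~sgreydan/scribe/models/"
FILENAMES = [
    "checkpoint",
    "model.ckpt-129000", "model.ckpt-129000.meta",
    "model.ckpt-40000", "model.ckpt-40000.meta",
    "model.ckpt-66500", "model.ckpt-66500.meta",
]


def model_urls(filenames=FILENAMES, base_url=BASE_URL):
    """Return the full download URL for each file name."""
    return [base_url + name for name in filenames]


def download_all(dest_dir="models"):
    """Fetch every file into dest_dir (a hypothetical default), skipping files already present."""
    os.makedirs(dest_dir, exist_ok=True)
    for url in model_urls():
        target = os.path.join(dest_dir, url.rsplit("/", 1)[1])
        if not os.path.exists(target):
            urllib.request.urlretrieve(url, target)
```

Calling `download_all()` performs the actual network requests; `model_urls()` alone just lists what would be fetched.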

Make sure you set model_checkpoint_path: "model.ckpt-XXXX" in the checkpoint file, where XXXX is the checkpoint you want to load.
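As a rough illustration, the `checkpoint` index file can also be rewritten from Python. This sketch assumes the usual two-line layout TensorFlow uses for that file; the helper name `write_checkpoint_file` is hypothetical, and the demo writes to a throwaway temporary directory rather than a real models folder.

```python
import os
import tempfile


def write_checkpoint_file(model_dir, checkpoint_name):
    """Write a `checkpoint` index file that points at a single checkpoint."""
    path = os.path.join(model_dir, "checkpoint")
    with open(path, "w") as f:
        f.write('model_checkpoint_path: "{}"\n'.format(checkpoint_name))
        f.write('all_model_checkpoint_paths: "{}"\n'.format(checkpoint_name))
    return path


# Demo: point the index at model.ckpt-66500 inside a temporary directory.
demo_dir = tempfile.mkdtemp()
index_path = write_checkpoint_file(demo_dir, "model.ckpt-66500")
with open(index_path) as f:
    contents = f.read()
print(contents)
```

In practice `model_dir` would be the directory holding the downloaded `model.ckpt-XXXX` files.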

I can’t remember which of these trained models is the best. Also, the "You know nothing Jon Snow" example was generated before I made some significant edits to the code (renaming scopes, etc.) so that trained model won't work. However, you should get fairly legible results with some of these.

dribnet commented 7 years ago

Thanks for uploading this. I was able to download all the files successfully, and updated model_checkpoint_path in the checkpoint file as suggested. However, I get errors when loading the checkpoints:

attempt to load saved model...
W tensorflow/core/framework/op_kernel.cc:975] Not found: Tensor name "cell1/LSTMCell/W_0" not found in checkpoint files fetched/model.ckpt-66500
         [[Node: save/RestoreV2_12 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_12/tensor_names, save/RestoreV2_12/shape_and_slices)]]
no saved model to load. starting new session

My guess is some subtle difference in the code or tensorflow from your version. But this is no longer an issue for me as I'm now able to train my own model and sample it with reasonable results as shown. So feel free to close for now or I'm happy to try other checkpoints if you'd like to debug further.
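One way to investigate a "Tensor name ... not found in checkpoint" error is to compare the variable names the graph expects with the names actually stored in the checkpoint. The sketch below assumes a TensorFlow 1.x install for the reader call (`tf.train.NewCheckpointReader`); the helper names are hypothetical, and every name in the demo except `cell1/LSTMCell/W_0` is made up for illustration.

```python
def checkpoint_variable_names(checkpoint_path):
    """List the variable names stored in a checkpoint.

    Assumes TensorFlow 1.x. The import is done lazily so the pure-Python
    helper below can be used without TensorFlow installed.
    """
    import tensorflow as tf
    reader = tf.train.NewCheckpointReader(checkpoint_path)
    return sorted(reader.get_variable_to_shape_map())


def missing_from_checkpoint(expected_names, stored_names):
    """Return the expected names that the checkpoint does not contain."""
    stored = set(stored_names)
    return sorted(name for name in expected_names if name not in stored)


# Demo with placeholder data. Only "cell1/LSTMCell/W_0" comes from the log
# above; the other names are hypothetical.
expected = ["cell1/LSTMCell/W_0", "cell1/LSTMCell/B"]
stored = ["cell1/some_other_scope/W_0", "cell1/LSTMCell/B"]
missing = missing_from_checkpoint(expected, stored)
print(missing)
```

With a real model, `expected` could come from `[v.op.name for v in tf.global_variables()]` after building the graph, and `stored` from `checkpoint_variable_names("fetched/model.ckpt-66500")`. A non-empty result usually points to a scope rename or a TensorFlow version difference between training and loading.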

iter-61500-l-you_know_n

iter-61500-g-you_know_n

iter-61500-w-you_know_n

greydanus commented 7 years ago

@dribnet I'm glad it's working. Again, I'm interested to see what you come up with. I'm currently experimenting with my dnc implementation (https://github.com/greydanus/dnc) to see if I can use it to generate handwriting.

mxzhao commented 7 years ago

It seems that it doesn't work on TensorFlow 1.0.0. @greydanus

greydanus commented 7 years ago

It does now @mxzhao ; I just merged

greydanus commented 7 years ago

@dribnet Can I have the model that you most recently trained? We know for sure that it works with the merged version. Just put it in your public folder and I can wget it - thanks!

dribnet commented 7 years ago

Sure - I now have both a TF 1.0 and a 0.2 model trained locally. Let me dig up the 1.0 one, verify that it works, and try to make it available.

dribnet commented 7 years ago

I found a model that works, but in the process discovered that I had pushed a handful of breaking changes to the branch that got merged in. Pull request #4 gets master back to a stable state and also has details on how to download and use the pre-trained model.

slcott commented 7 years ago

Sam, I've been trying to replicate your results after reading through your tutorial.

I can't seem to load the checkpoint file. I'm using Python 3 and TF 1.0, by the way, and converted your files with 2to3, which does not seem to be the source of the error.

I downloaded the checkpoint files you provided:

Scotts-MBP:scribe slcott$ 
Scotts-MBP:scribe slcott$ 
Scotts-MBP:scribe slcott$ ls models
checkpoint      model.ckpt-129000   train_scribe.txt
checkpoint.BACKUP   model.ckpt-129000.meta
Scotts-MBP:scribe slcott$ 
Scotts-MBP:scribe slcott$ more models/checkpoint
model_checkpoint_path: "model.ckpt-129000"
all_model_checkpoint_paths: "model.ckpt-129000"
Scotts-MBP:scribe slcott$ 
Scotts-MBP:scribe slcott$

.restore() is the initial source of the error.

        try:
            # Raises NotFoundError when a variable in the graph is missing from the checkpoint
            self.saver.restore(self.sess, './models/model.ckpt-129000')
        except Exception as e:
            print(e)
            self.logger.write("no saved model to load. starting new session")
            load_was_success = False

Do you see what might be the root cause of my error?

Tensor name "cell0/lstm_cell/biases" not found in checkpoint files ./models/model.ckpt-129000
     [[Node: save/RestoreV2_3 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_3/tensor_names, save/RestoreV2_3/shape_and_slices)]]

Caused by op 'save/RestoreV2_3', defined at:
  File "run.py", line 164, in <module>
    main()
  File "run.py", line 65, in main
    train_model(args) if args.train else sample_model(args)
  File "run.py", line 136, in sample_model
    model = Model(args, logger)
  File "/Users/slcott/Desktop/scribe/model.py", line 213, in __init__
    self.saver = tf.train.Saver(tf.global_variables())
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1051, in __init__
    self.build()
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1081, in build
    restore_sequentially=self._restore_sequentially)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 675, in build
    restore_sequentially, reshape)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 402, in _AddRestoreOps
    tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 242, in restore_op
    [spec.tensor.dtype])[0])
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 668, in restore_v2
    dtypes=dtypes, name=name)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2395, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1264, in __init__
    self._traceback = _extract_stack()

NotFoundError (see above for traceback): Tensor name "cell0/lstm_cell/biases" not found in checkpoint files ./models/model.ckpt-129000
     [[Node: save/RestoreV2_3 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_3/tensor_names, save/RestoreV2_3/shape_and_slices)]]

no saved model to load. starting new session
load failed, sampling canceled
^CTraceback (most recent call last):
  File "run.py", line 164, in <module>
    main()
  File "run.py", line 65, in main
    train_model(args) if args.train else sample_model(args)
  File "run.py", line 160, in sample_model
    time.sleep(args.sleep_time)
KeyboardInterrupt
Scotts-MBP:scribe slcott$
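When a long traceback like the one above buries the key line, a small helper can pull out which tensor was missing and from which checkpoint. This is only a sketch: it assumes the message wording shown in this log, and `parse_not_found` is a hypothetical name.

```python
import re

# Matches the "not found" line as it appears in the log above.
NOT_FOUND = re.compile(r'Tensor name "([^"]+)" not found in checkpoint files (\S+)')


def parse_not_found(message):
    """Return (tensor_name, checkpoint_path), or None if the message does not match."""
    match = NOT_FOUND.search(message)
    return match.groups() if match else None


line = ('Tensor name "cell0/lstm_cell/biases" not found in checkpoint files '
        './models/model.ckpt-129000')
result = parse_not_found(line)
print(result)
```

The extracted tensor name can then be checked against the names stored in the checkpoint to see whether the variable was saved under a different scope.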

greydanus commented 7 years ago

Thanks for the trained TF 1.0 model @dribnet. @slcott I just added a "Getting started" section to the README which includes instructions for downloading a pretrained model.

greydanus commented 7 years ago

@mxzhao latest commit is now fully compatible with tf 1.0 and includes a pretrained model

greydanus / scribe

pretrained model? #2