prokopevaleksey opened 7 years ago
Hi,
Having difficulty getting Penseur to work. I'm running neural_storyteller and neural_rewriter with the recommended models and the UofT BookCorpus on Ubuntu 16.04 with a GTX 1070 GPU.
Penseur is set up as per the instructions, but when I try to encode a text file 'X' I receive the following error:
```
>>> import skipthoughts
Using gpu device 0: GeForce GTX 1070
Loading model parameters...
Compiling encoders...
Loading tables...
Packing up...
>>> vectors = skipthoughts.encode(model, X)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'X' is not defined
```
I would like to generate similar model sets for other genres (e.g. adventure) for neural_storyteller and neural_rewriter, like the romance set used in https://github.com/pshah123/neural-rewriter:

romance_dictionary.pkl
romance.npz
romance.npz.pkl
romance_style.npy
Any help is greatly appreciated.
Cheers
You have to read the text in before it can be encoded. The project page shows an example.
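For reference, a minimal sketch of that step, using the same skipthoughts calls as in the traceback above (the file name is just an example):

```python
import skipthoughts

model = skipthoughts.load_model()

# Read the corpus in first, one sentence per line, so that X is a
# list of strings rather than an undefined name.
with open('sample.txt', 'r') as f:
    X = [line.strip() for line in f if line.strip()]

vectors = skipthoughts.encode(model, X)
```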
Thanks for the quick reply.
I have most of your Penseur examples from the project page working now. What I'm trying to do is read an entire text file in through Penseur and then create a dictionary and encoder from X.
I'm hoping the two dictionary files produced are compatible with skip-thoughts decoding and training.
Is it possible to read an entire text file into Penseur? I would like to open text files of 100+ MB for processing, as in the original BookCorpus examples.
After loading penseur.py, I've tried variations of the 'f = open...' code below, but it throws errors.
I would prefer solving this problem through penseur if that's possible.
The code below runs in IPython (Python 2.7) as an imported script until it hits the ValueError at the end :(
```python
import nltk
import vocab
import train
import skipthoughts

model = skipthoughts.load_model()
encoder = skipthoughts.Encoder(model)

# Tokenize each line, then rejoin the tokens into space-separated sentences.
f = open('sample1.txt', 'r')
sentences = [nltk.word_tokenize(r.lower()) for r in f.readlines()]
X = [' '.join(s) for s in sentences]

f = open('sample2.txt', 'r')
sentences2 = [nltk.word_tokenize(r.lower()) for r in f.readlines()]
C = [' '.join(s) for s in sentences2]

worddict, wordcount = vocab.build_dictionary(X)
print len(worddict)

# see note below re: 'sample1.pkl'
vocab.save_dictionary(worddict, wordcount, '/home/pixelhead/Desktop/neural-rewriter/skip-thoughts-master/decoding/dict/sample1.pkl')

train.trainer(X, C, model)
```
It loads the model and trainer parameters and begins training, then fails with:

```
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```
I needed to add 'sample1.pkl' to the path to create the pickle. I would appreciate it if you could suggest code that will write both dictionary files to the 'dict' folder.
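Roughly along these lines is what I'm after (a sketch only; the paths and file names are examples):

```python
import cPickle as pkl

dict_dir = '/home/pixelhead/Desktop/neural-rewriter/skip-thoughts-master/decoding/dict/'

# Write each structure built by vocab.build_dictionary above
# to its own pickle in the 'dict' folder.
with open(dict_dir + 'sample1_dictionary.pkl', 'wb') as f:
    pkl.dump(worddict, f)
with open(dict_dir + 'sample1_wordcount.pkl', 'wb') as f:
    pkl.dump(wordcount, f)
```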
Cheers,
Aaron
If the 'open' command is throwing errors, that has nothing to do with Penseur. If you're having trouble saving things, know that pickle has a hard time saving dictionaries. Again, it is evident that the problems you're having are not related to the Penseur codebase. Please use StackOverflow to answer your questions. Rarely will you find an issue that hasn't been addressed already on that site. Good luck!
I have it sorted out now, using Penseur commands in a Python script with:

```python
f = open('genre_corpus.txt', 'r')
sentences = [r.lower() for r in f.readlines()]
```
This generates a decoder file that loads in Neural Storyteller/Rewriter, but the quality and length of the output are worse than with the default "romance.npz".
From what I've read on the forum, the encoding and decoding training here is not exactly the same as the process used to generate the BookCorpus model and the 'romance.npz' decoder. Examining the generated .npz archives shows two files missing (hidden state?) compared to the 'romance.npz' decoder. The file size is also 2-3 times larger, even though the corpus used was only 150 MB compared to the 2+ GB used for 'romance.npz'. Weird.
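For anyone curious, comparing the two archives is straightforward (a minimal sketch; 'genre.npz' is a placeholder for the generated decoder):

```python
import numpy

# List the parameter arrays stored in each decoder archive.
mine = numpy.load('genre.npz')
ref = numpy.load('romance.npz')
print sorted(set(ref.files) - set(mine.files))  # keys in romance.npz but missing from mine
```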
I guess this is the "lite" version compared to the version described in the paper. Too bad.
Everything else working fine in skip-thoughts/penseur.
The last thing to figure out is how to generate longer stories, based on Ryan Kiros' advice:
"length (if you bias by really long passages, it will decode really long stories)"
I have changed many values, including k and the beam width. The content varies somewhat, but the length (100-130 words) stays the same.
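If I understand the advice, the bias means recomputing the style vector from only long passages. A sketch of what I'm trying, using the same skipthoughts calls as earlier (the word-count threshold and file names are guesses on my part):

```python
import numpy
import skipthoughts

model = skipthoughts.load_model()

# Keep only the long passages; biasing the style vector toward these
# should push the decoder toward longer stories.
with open('genre_corpus.txt', 'r') as f:
    passages = [line.strip() for line in f if line.strip()]
long_passages = [p for p in passages if len(p.split()) > 60]

# The style vector is the mean of the encoded long passages,
# analogous to 'romance_style.npy' in the default model set.
vectors = skipthoughts.encode(model, long_passages)
numpy.save('genre_style.npy', vectors.mean(axis=0))
```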
Cheers
I don't know of any other pre-trained decoders, but you can use the code to generate your own. Alternatively, I've written a simplified version here: https://github.com/danielricks/penseur