openai / generating-reviews-discovering-sentiment

Code for "Learning to Generate Reviews and Discovering Sentiment"
https://arxiv.org/abs/1704.01444
MIT License

Only last 64 chars? #32

Closed fedorzh closed 7 years ago

fedorzh commented 7 years ago

Is it true that, using the current transform() function, we only get features for the last 64 chars of the review rather than for the whole review? smb[:, offset+start:offset+end, :] = batch_smb seems to overwrite the previous features.

yairf11 commented 7 years ago

If I understand correctly, no. From the paper:

The hidden state of the model serves as an online summary of the sequence which encodes all information the model has learned to preserve that is relevant to predicting the future bytes of the sequence.

Thus, each sequence has only one set of final states. These final states capture information from the whole sequence, so the features you get are for the entire sequence. This is why we can safely overwrite the previous states, as you pointed out: the new states already subsume the information they carry.

You can also see that the new states (i.e., the ones computed for the last 64 chars) depend on the previous states, so information flows between batches. This is also stated in the paper:

States were initialized to zero at the beginning of each shard and persisted across updates to simulate full-backpropagation and allow for the forward propagation of information outside of a given subsequence.
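To illustrate the idea, here is a minimal sketch with a toy vanilla RNN (not the repo's 4096-unit mLSTM; all names are illustrative). Overwriting the state buffer each chunk loses nothing, because each new state is computed from the old one:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, chunk = 8, 64

# toy parameters; the real model is a 4096-unit multiplicative LSTM
Wx = rng.normal(scale=0.1, size=(256, hidden_size))
Wh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))

def run_chunk(byte_chunk, h):
    # advance the hidden state one byte at a time across the chunk
    for b in byte_chunk:
        h = np.tanh(Wx[b] + h @ Wh)
    return h

text = b"this toy review is much longer than sixty-four characters, " * 4
h = np.zeros(hidden_size)                       # states initialized to zero
for start in range(0, len(text), chunk):
    h = run_chunk(text[start:start + chunk], h)  # overwrites h each chunk

features = h  # the final state still depends on every byte of the sequence
```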

Newmu commented 7 years ago

The encoder TensorFlow op is hardcoded to process 64-character chunks for efficiency reasons. The code you're looking at updates the model's states in place (they start as all zeros) as it proceeds left to right over the whole sequence in 64-character chunks.
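In other words, the slice assignment fedorzh flagged is just the state buffer being advanced. A rough sketch of that control flow, with hypothetical names (encode_chunk stands in for the hardcoded op; this is not the repo's actual code):

```python
import numpy as np

NSTEPS = 64  # the op is hardcoded to 64-character chunks

def transform(texts, encode_chunk, hidden_size=4096):
    """Sketch of the scanning loop; `encode_chunk` is a hypothetical
    stand-in for the hardcoded 64-step TensorFlow op."""
    xs = [np.frombuffer(t.encode(), dtype=np.uint8) for t in texts]
    maxlen = max(len(x) for x in xs)
    # states start at zero and are updated in place, left to right
    states = np.zeros((len(xs), hidden_size), dtype=np.float32)
    for start in range(0, maxlen, NSTEPS):
        chunk = [x[start:start + NSTEPS] for x in xs]
        # each call consumes the previous states and returns states
        # advanced by up to 64 characters; earlier values are overwritten
        states = encode_chunk(chunk, states)
    return states  # one feature vector per text, covering the whole review
```

So the returned features come from the final states, which have seen the entire review, not just its last 64 characters.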