Closed pgrandinetti closed 4 years ago
`Prob(sequence_so_far + w)` -- Is this the probability of observing the sequence `[sequence_so_far, w]`?
- Here's my interpretation of that section in the paper.
- It's given the text generated so far: `text_so_far`.
- For each word `w` in the BoW, your objective is to compute `Prob(text_so_far + w | modified model)`, in order to compute \Delta H iteratively.
- To compute `Prob(text_so_far + w | modified model)`, you use the modified model in a standard way, e.g. `model.predict(text_so_far + w)`, which returns a probability. This is also why I thought you would simply (and brutally) modify the weights, so that you could run `model.predict` after changing the weights. I can see now that I am missing something; can you please give more details?
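In code, what I have in mind is something like the following (a toy NumPy sketch of the BoW attribute probability; the function and variable names here are my own illustration, not from the PPLM repo):

```python
import numpy as np

def bow_attribute_prob(next_word_probs, bow_indices):
    """Hypothetical sketch: under the BoW attribute model, p(a | text_so_far)
    is the total probability mass the model assigns to the words in the bag."""
    return float(np.sum(next_word_probs[bow_indices]))

# Toy example: vocabulary of 5 words, bag-of-words = indices {1, 3}.
# In practice next_word_probs would come from model.predict(text_so_far).
next_word_probs = np.array([0.1, 0.4, 0.2, 0.2, 0.1])
print(bow_attribute_prob(next_word_probs, [1, 3]))  # sums the mass on words 1 and 3
```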
I suggest you read the source code, especially the function `perturb_past` in `run_pplm.py` (https://github.com/uber-research/PPLM/blob/dc58121277570ae85ee1e114188036f52bc37fe7/run_pplm.py#L117). The code shows clearly that before generating the next word, the losses are calculated (including the BoW loss, the discriminator loss, and the KL loss), and gradients are computed and applied to the hidden state for `num_iterations` loops, which is the implementation of Equation (4) in the PPLM paper (https://arxiv.org/abs/1912.02164).
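For intuition, the structure of that loop can be sketched roughly like this (a toy NumPy stand-in for the actual PyTorch code in `perturb_past`; the names and the loss are illustrative, not the repo's):

```python
import numpy as np

def perturb_hidden_toy(h, loss_grad, num_iterations=3, step_size=0.02):
    """Toy illustration of the PPLM update: accumulate a perturbation delta_h
    by repeatedly stepping against the gradient of the attribute loss, then
    return the perturbed hidden state H + Delta H."""
    delta_h = np.zeros_like(h)
    for _ in range(num_iterations):
        # In PPLM this gradient comes from the combined BoW / discrim / KL loss.
        grad = loss_grad(h + delta_h)
        delta_h -= step_size * grad  # gradient descent step on the loss
    return h + delta_h

# Toy quadratic loss pulling the hidden state toward a target vector.
target = np.array([1.0, -1.0])
h0 = np.zeros(2)
h_perturbed = perturb_hidden_toy(h0, lambda x: 2 * (x - target),
                                 num_iterations=50, step_size=0.1)
print(np.round(h_perturbed, 3))
```

The key point the code makes: the weights of the language model are never changed; only the hidden state (the `past` activations) gets a perturbation `delta_h` before the next word is sampled.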
Great, thanks!
Trying to understand the attribute model in Equation (4) in your paper, `p(a | H_t + \Delta H_t)`. I have two general questions.

1. Given the modified model (which is actually my second question), you want to compute the probability that the model would generate a sequence that contains attribute `a`. Let's consider the BoW approach. For each word `w` in the bag, and given the current sentence `sequence_so_far`, you compute `Prob(sequence_so_far + w)`. Is it correct so far? How do you compute that last term? Is it like `model.predict(sequence_so_far + w)`?
2. I get how it's computed, but not how the model is modified in practice. Is it something like `model.layers[i].set_weights(H[i] + DeltaH[i])`, for the specific layers corresponding to the whole H?

Thanks!