Open thadunge2 opened 4 years ago
I'll take a stab at it. I have experience with tensor math, and also debugging tensors in Pytorch.
I'm getting the pytorch code up and running. Aside from having to manually run selected lines from setup.sh (namely the part installing transformers as an editable library), play.py doesn't run because there is no function called convert_gpt2_checkpoint_to_pytorch in transformers; it lives in the sub-module transformers.convert_gpt2_original_tf_checkpoint_to_pytorch. Is this a typo or did I somehow end up with the wrong version?
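For reference, the fix is just importing from the sub-module (a minimal sketch; the module path is what I see in my checkout of transformers, so adjust if yours differs):

    # Corrected import: the function lives in the conversion sub-module,
    # not at the top level of the transformers package.
    from transformers.convert_gpt2_original_tf_checkpoint_to_pytorch import (
        convert_gpt2_checkpoint_to_pytorch,
    )
    # ...instead of the non-existent:
    # from transformers import convert_gpt2_checkpoint_to_pytorch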
Also here is the result when I correct that function name and try to run play.py:
INFO:filelock:Lock 139742871501624 acquired on /homes/grail/jamesn8/.cache/torch/transformers/eb2d31fb18c927045d8ccc07cace8bf1c10458bf171a5ad4cb1cbe0b75773425.1512018be4ba4e8726e41b9145129dc30651ea4fec86aa61f4b9f40bf94eac71.lock
INFO:transformers.file_utils:https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-xl-vocab.json not found in cache or force_download set to True, downloading to /homes/grail/jamesn8/.cache/torch/transformers/tmpee6b9v4i
INFO:transformers.file_utils:storing https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-xl-vocab.json in cache at /homes/grail/jamesn8/.cache/torch/transformers/eb2d31fb18c927045d8ccc07cace8bf1c10458bf171a5ad4cb1cbe0b75773425.1512018be4ba4e8726e41b9145129dc30651ea4fec86aa61f4b9f40bf94eac71
INFO:transformers.file_utils:creating metadata file for /homes/grail/jamesn8/.cache/torch/transformers/eb2d31fb18c927045d8ccc07cace8bf1c10458bf171a5ad4cb1cbe0b75773425.1512018be4ba4e8726e41b9145129dc30651ea4fec86aa61f4b9f40bf94eac71
INFO:filelock:Lock 139742871501624 released on /homes/grail/jamesn8/.cache/torch/transformers/eb2d31fb18c927045d8ccc07cace8bf1c10458bf171a5ad4cb1cbe0b75773425.1512018be4ba4e8726e41b9145129dc30651ea4fec86aa61f4b9f40bf94eac71.lock
INFO:filelock:Lock 139742887696984 acquired on /homes/grail/jamesn8/.cache/torch/transformers/18d7ac53606f670f979f24836b00f5dfee1c58d79bdbcc58411265f194d88ac0.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda.lock
INFO:transformers.file_utils:https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-xl-merges.txt not found in cache or force_download set to True, downloading to /homes/grail/jamesn8/.cache/torch/transformers/tmp6mj2i0gq
INFO:transformers.file_utils:storing https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-xl-merges.txt in cache at /homes/grail/jamesn8/.cache/torch/transformers/18d7ac53606f670f979f24836b00f5dfee1c58d79bdbcc58411265f194d88ac0.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
INFO:transformers.file_utils:creating metadata file for /homes/grail/jamesn8/.cache/torch/transformers/18d7ac53606f670f979f24836b00f5dfee1c58d79bdbcc58411265f194d88ac0.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
INFO:filelock:Lock 139742887696984 released on /homes/grail/jamesn8/.cache/torch/transformers/18d7ac53606f670f979f24836b00f5dfee1c58d79bdbcc58411265f194d88ac0.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda.lock
INFO:transformers.tokenization_utils:loading file https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-xl-vocab.json from cache at /homes/grail/jamesn8/.cache/torch/transformers/eb2d31fb18c927045d8ccc07cace8bf1c10458bf171a5ad4cb1cbe0b75773425.1512018be4ba4e8726e41b9145129dc30651ea4fec86aa61f4b9f40bf94eac71
INFO:transformers.tokenization_utils:loading file https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-xl-merges.txt from cache at /homes/grail/jamesn8/.cache/torch/transformers/18d7ac53606f670f979f24836b00f5dfee1c58d79bdbcc58411265f194d88ac0.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
Traceback (most recent call last):
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/libraries/transformers/src/transformers/configuration_utils.py", line 179, in from_pretrained
resume_download=resume_download,
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/libraries/transformers/src/transformers/file_utils.py", line 220, in cached_path
raise EnvironmentError("file {} not found".format(url_or_filename))
OSError: file generator/gpt2/models/model_v5/pytorch-convert/config.json not found
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "play.py", line 680, in <module>
prepare_aidungeon_2()
File "play.py", line 676, in prepare_aidungeon_2
play_aidungeon_2()
File "play.py", line 212, in play_aidungeon_2
generator = GPT2Generator(no_cuda=not cuda)
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/generator/gpt2/gpt2_generator.py", line 29, in __init__
self.model = GPT2LMHeadModel.from_pretrained(self.checkpoint_path)
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/libraries/transformers/src/transformers/modeling_utils.py", line 358, in from_pretrained
**kwargs
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/libraries/transformers/src/transformers/configuration_utils.py", line 200, in from_pretrained
raise EnvironmentError(msg)
OSError: Model name 'generator/gpt2/models/model_v5/pytorch-convert' was not found in model name list (gpt2, gpt2-medium, gpt2-large, gpt2-xl, distilgpt2). We assumed 'generator/gpt2/models/model_v5/pytorch-convert/config.json' was a path or url to a configuration file named config.json or a directory containing such a file but couldn't find any such file at this path or url.
Just to be clear, I have model_v5 installed in the right location, and it's what I've been using all this time. It seems to expect an existing pytorch-convert/config.json to be present? It looks like the conversion script half-ran once and then didn't run again because the directory was already present... let's try again...
I got this almost running, but now I'm seeing that CPU and GPU tensors are getting mixed, and causing errors... did you by any chance test this on a CPU only build?
Traceback (most recent call last):
File "play.py", line 680, in <module>
prepare_aidungeon_2()
File "play.py", line 676, in prepare_aidungeon_2
play_aidungeon_2()
File "play.py", line 253, in play_aidungeon_2
prompt, context=context, upload_story=upload_story
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/story/story_manager.py", line 131, in start_new_story
block = self.generator.generate(context + story_prompt)
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/generator/gpt2/gpt2_generator.py", line 98, in generate
text = self.generate_raw(prompt)
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/generator/gpt2/gpt2_generator.py", line 85, in generate_raw
repetition_penalty = 1.15)[0].tolist()
File "/local1/jamesn8/anaconda3/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 43, in decorate_no_grad
return func(*args, **kwargs)
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/libraries/transformers/src/transformers/modeling_utils.py", line 692, in generate
effective_batch_size,
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/libraries/transformers/src/transformers/modeling_utils.py", line 724, in _generate_no_beam_search
outputs = self(**model_inputs)
File "/local1/jamesn8/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/libraries/transformers/src/transformers/modeling_gpt2.py", line 580, in forward
inputs_embeds=inputs_embeds,
File "/local1/jamesn8/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/libraries/transformers/src/transformers/modeling_gpt2.py", line 456, in forward
inputs_embeds = self.wte(input_ids)
File "/local1/jamesn8/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/local1/jamesn8/anaconda3/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 118, in forward
self.norm_type, self.scale_grad_by_freq, self.sparse)
File "/local1/jamesn8/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1454, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected object of backend CUDA but got backend CPU for argument #3 'index'
EDIT: I have now fixed this, and ran into one last error which seemed to be due to a typo in the transformers library (calling Tensor.scatter() with named argument "src" instead of "source"); correcting that fixed it. Now on to the actual task...
At long last I have made some headway. Here's my progress so far:
https://github.com/ShnitzelKiller/transformers/tree/gpt2-past-2
First off, I added the past state back into the pipeline, making sure it's in the right format (a tuple of tensors, one for each layer, with dimension -2 the sequence dimension along which they're concatenated in time). Next I found that the Attention module was using a different method to generate the b matrix multiplied with w in self._attn(), and ported that over to pytorch (instead of the static matrix, which was causing inconsistent dimensions to crash the game).
For an overview: the unused past variable in the function you linked was supposed to be a list of n_layers tensors with dimension 2 x batch x head x sequence_n x head_features. Each of these tensors gets used in src/transformers/modeling_gpt2.py by a separate layer in Attention.forward(), where it is passed in as layer_past. In that function, the tensor is split apart, processed, joined back together, and that's the new present.
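Roughly, each layer consumes its layer_past like this (a simplified sketch of the shapes described above, not the exact HF code, which also transposes the cached key):

    import torch

    def attn_with_past(query, key, value, layer_past=None):
        # layer_past: tensor of shape (2, batch, head, past_seq, head_features)
        if layer_past is not None:
            past_key, past_value = layer_past[0], layer_past[1]
            key = torch.cat((past_key, key), dim=-2)        # grow the sequence dimension
            value = torch.cat((past_value, value), dim=-2)
        present = torch.stack((key, value))                 # returned upward and reassembled into the new past
        # ... attention over (query, key, value) happens here ...
        return present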
I THINK the past is being handled properly now, and there's probably a bit of code that depends on something having fixed dimensions that's acting up now that I'm feeding the past through (that's what happened with the attention_mask).
Currently it gets farther than before, crashing with this error:
Traceback (most recent call last):
File "play.py", line 680, in <module>
prepare_aidungeon_2()
File "play.py", line 676, in prepare_aidungeon_2
play_aidungeon_2()
File "play.py", line 253, in play_aidungeon_2
prompt, context=context, upload_story=upload_story
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/story/story_manager.py", line 131, in start_new_story
block = self.generator.generate(context + story_prompt)
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/generator/gpt2/gpt2_generator.py", line 99, in generate
text = self.generate_raw(prompt)
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/generator/gpt2/gpt2_generator.py", line 86, in generate_raw
repetition_penalty = 1.15)[0].tolist()
File "/local1/jamesn8/anaconda3/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 43, in decorate_no_grad
return func(*args, **kwargs)
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/libraries/transformers/src/transformers/modeling_utils.py", line 692, in generate
effective_batch_size,
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/libraries/transformers/src/transformers/modeling_utils.py", line 723, in _generate_no_beam_search
outputs = self(**model_inputs)
File "/local1/jamesn8/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/libraries/transformers/src/transformers/modeling_gpt2.py", line 591, in forward
inputs_embeds=inputs_embeds,
File "/local1/jamesn8/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/libraries/transformers/src/transformers/modeling_gpt2.py", line 468, in forward
position_embeds = self.wpe(position_ids)
File "/local1/jamesn8/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/local1/jamesn8/anaconda3/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 118, in forward
self.norm_type, self.scale_grad_by_freq, self.sparse)
File "/local1/jamesn8/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1454, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: index out of range at /opt/conda/conda-bld/pytorch_1544176307774/work/aten/src/TH/generic/THTensorEvenMoreMath.cpp:191
I think it's close to working, but I've got to take a break. Hope this helps.
Nice work. But I'm confused: why not just use the hugging face past argument? In their comments they imply that they have got hidden state caching to work. E.g. you can just take the present variable that's returned and pass that in on the next iteration (while cropping the input)?
I did try that, and it runs fine (make sure you don't pass the whole input, just the next token), but no change in generation time. They do say it only speeds up decoding, which in GPT2 is just the final layer I think? So perhaps we can't expect much of a speedup.
@thadunge2 Have you compared speed on the same machine? If so how much slower, roughly, is it? And did you have pytorch gpu installed?
For me a single generation of 60 chars took 6.34 seconds on a 2080 ti.
What do you mean? I am using their past argument; it was just set to None in their code, with a TODO, and I've modified the spots where it is either not set or not used properly (that I've found so far).
That is, _generate_no_beam_search() calls GPT2LMHeadModel.forward() with a past argument of None, because prepare_inputs_for_generation doesn't set the past argument. What was missing is code for concatenating the new presents onto the previous past.
And the current issue is that the embedding, self.wpe, is getting indexed out of bounds by position_ids in the line
position_embeds = self.wpe(position_ids)
Oh I see. Didn't realise they had a sampling function in modeling_utils. I was looking at examples.
It looks like you might have forgotten to crop the input ids? You only need to pass in the tokens which have not been given past states (new ones); otherwise you get too many hidden states and an index error.
At least I and other people in the issues had the same error because we didn't realise that we should only pass in new tokens.
**past**:
list of ``torch.FloatTensor`` (one for each layer) of shape ``(2, batch_size, num_heads, sequence_length, embed_size_per_head)``:
that contains pre-computed hidden-states (key and values in the attention blocks).
Can be used (see `past` input) to speed up sequential decoding. The token ids which have their past given to this model
should not be passed as input ids as they have already been computed.
As far as I can see you also don't need to concat the past and present because each new present includes the previous past - it's already done.
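In other words, the sampling side should only need something like this (a hedged sketch against transformers ~2.3, where GPT-2 returns the logits first and the presents second; the function and variable names are mine):

    import torch

    def sample_with_past(model, input_ids, num_new_tokens):
        # Greedy sampling that feeds each step's presents straight back in as `past`.
        generated = input_ids          # full prompt, shape (batch, seq)
        past = None
        next_token = None
        for _ in range(num_new_tokens):
            if past is None:
                outputs = model(generated)                  # first step: the whole context
            else:
                outputs = model(next_token, past=past)      # later steps: only the newest token
            logits, past = outputs[0], outputs[1]           # presents come back as the new past
            next_token = torch.argmax(logits[:, -1, :], dim=-1, keepdim=True)
            generated = torch.cat([generated, next_token], dim=-1)
        return generated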
Can you be more specific? Where in _generate_no_beam_search() can the past possibly accumulate? It isn't set at all. Compare to the sample_sequence() function in sample.py in the Tensorflow version. It is a while loop, where the past is set as

next_outputs["presents"]
if past is None
else tf.concat([past, next_outputs["presents"]], axis=-2),

(it's a little hard to parse that this is the output for the past, since they use a tf.while_loop, but it is the first element of the return tensor, which gets mapped back to past.)
This snippet is analogous to the while loop in _generate_no_beam_search(), where the past just stays None. Here, outputs[1] is the equivalent of next_outputs["presents"].
Well, 1) it's accumulated inside the model; e.g. in attention, the new hidden states and the past are concatenated.
And each time presents is returned from the model afresh and includes all hidden states for all sequences. E.g. if the context was 60 tokens, then the past is 60 tokens long. If you concatenate it with the previous cached hidden states then it's 120 tokens long. So you don't need to keep track of the past, just take each new one from the model outputs.
Also you only need to pass in new ids, since hidden states will be generated from them and concatenated with the past inside each layer.
Hmm, in the tensorflow code, though, that same snippet also exists, concatenating the key and values to the past values. And yet they also concatenate the past in the while loop.
with tf.variable_scope(scope):
    c = conv1d(x, "c_attn", n_state * 3)
    q, k, v = map(split_heads, tf.split(c, 3, axis=2))
    present = tf.stack([k, v], axis=1)   # note: present is stacked *before* the past is concatenated in
    if past is not None:
        pk, pv = tf.unstack(past, axis=1)
        k = tf.concat([pk, k], axis=-2)
        v = tf.concat([pv, v], axis=-2)
    a = multihead_attn(q, k, v)
    a = merge_heads(a)
    a = conv1d(a, "c_proj", n_state)
    return a, present
To be even more clear, here is the transformers issue where multiple people got confused, making what I think is a similar mistake and passing in all the input tokens: https://github.com/huggingface/transformers/issues/1749
At the very bottom is a simple example of all we need to do, I think. Only passing in the next token, no accumulation of pasts in the sample function.
I've been reading the pytorch code exclusively since I have a treasured and irrational hate for tensorflow.
And yet they also concatenate the past in the while loop.
Where's that, sorry? I mean the while loop.
See my earlier post. It's in sample_sequence() in sample.py. It's a tf.while_loop().
I still get index out of range when I just make past = outputs[1]
Oh right, but that's in the original GPT2 code from OpenAI. The transformers repo makes some changes in attempts to clean things up; not sure about this choice, but it is what it is.
Hmm did you crop inputs as well?
I got this almost running, but now I'm seeing that CPU and GPU tensors are getting mixed, and causing errors... did you by any chance test this on a CPU only build?
However did you guess?
@thadunge2 Have you compared speed on the same machine? If so how much slower, roughly, is it? And did you have pytorch gpu installed?
I can't fit the model on my GPU. It was so much slower than TF in my tests that it's completely unplayable.
I have only modified the transformer library functions. If the current pytorch branch of AIDungeon is missing some kind of input processing, I haven't touched it. And it is the same error as I posted earlier: the wpe embedding is being indexed out of range by position_ids.
As for the model, it does fit in my GPU using just 6.8 GB of VRAM. It works when not using the past, and isn't unplayably slow for me.
@thadunge2 I would have thought the tensorflow version was too slow on a cpu too. How long does the tensorflow version take on your cpu?
One way to get it working on a cpu, as a longer-term project, is to distill it. It's worked quite well and is something many people have used to deploy transformer models. You can get a 100x speedup for 5% less accuracy, at least on classification tasks; not sure about generation.
If the current pytorch branch of AIDungeon is missing some kind of input processing, I haven't touched it.
Not sure what you mean?
And it is the same error as I posted earlier, the wpe indices of the embedding sampled by position_ids are out of range.
@ShnitzelKiller try cropping your inputs tho, I think that's what's giving you the error. Same as this https://github.com/huggingface/transformers/issues/1749#issuecomment-554386357 where context = next_token only.
Another example of cropping inputs is in my code. I can confirm this runs with no wpe error (although I did encounter that initially, doing what you're doing now).
I don't fully understand the change you made to attention tbh. Is that needed or are you following the openai tf version?
I am following the openai version because I got dimension mismatch errors with the current way of doing things. It was sampling from a static array instead of generating the triangular matrix, and apparently that array couldn't accommodate some sizes, creating a matrix with the wrong shape when it was subsequently multiplied as w * b. I get that creating the matrix on-demand is probably less desirable than using something precomputed, though...
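For concreteness, this is roughly the on-demand version I ported (a sketch following OpenAI's attention_mask() in model.py; the function name and how it's wired into _attn() here are mine):

    import torch

    def causal_mask(nd, ns, dtype=torch.float32, device=None):
        # Lower-triangular mask of shape (nd, ns), built on demand instead of
        # slicing a fixed precomputed bias buffer.
        i = torch.arange(nd, device=device)[:, None]
        j = torch.arange(ns, device=device)
        return (i >= j - ns + nd).to(dtype)

    # Inside _attn(), something along the lines of:
    #   b = causal_mask(w.size(-2), w.size(-1), dtype=w.dtype, device=w.device)
    #   w = w * b - 1e4 * (1 - b)   # push future positions to a large negative value before softmax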
Ah I see, nice!
@thadunge2 I would have thought the tensorflow version was too slow on a cpu too. How long does the tensorflow version take on your cpu?
A minute or so. I never run into the 2-minute hang timeout unless it has to generate twice.
Yeah, pytorch is much slower than that on cpu for me too; I'm not sure why. I'm not sure hidden state caching will help that much, but we will see.
https://github.com/ShnitzelKiller/transformers/commit/b3bd8ee53a1c3081cd6f3214563924dff09dc63a is this what you had in mind for cropping?
Yup! Let me know how that works.
Well, it gets stuck in a loop, I gave up after a minute of this:
Generating story...
******DEBUG******
Prompt is: "You are Galuswen, a rogue from the realm of Zurito. You have a length of rope and a cloak.\nToday you decided that simple mischief isn't enough anymore. You long for a more ambitious goal: taking for yourself gold grail of Memas. You notice"
Prompt: You are Galuswen, a rogue from the realm of Zurito. You have a length of rope and a cloak.
Today you decided that simple mischief isn't enough anymore. You long for a more ambitious goal: taking for yourself gold grail of Memas. You notice
Generated result is: ''
******END DEBUG******
******DEBUG******
Prompt is: "You are Galuswen, a rogue from the realm of Zurito. You have a length of rope and a cloak.\nToday you decided that simple mischief isn't enough anymore. You long for a more ambitious goal: taking for yourself gold grail of Memas. You notice"
Prompt: You are Galuswen, a rogue from the realm of Zurito. You have a length of rope and a cloak.
Today you decided that simple mischief isn't enough anymore. You long for a more ambitious goal: taking for yourself gold grail of Memas. You notice
Generated result is: ''
******END DEBUG******
@thadunge2 if you are interested in distillation: it's the approach huggingface is using to deploy GPT2 to mobile apps. It's teaching a smaller model to do the job of a larger model. The tradeoffs are not 100% clear, but it's a decent candidate for making it faster on cpu for anons. Some links:
Well, it gets stuck in a loop, I gave up after a minute of this:
I got that when I was setting the length to generate_num instead of the length of the prompt + generate_num.
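In other words (a sketch with hypothetical names; the point is just that the cap has to cover the prompt, because generate() counts cur_len from the prompt length):

    def generate_block(model, tokenizer, prompt, generate_num=60):
        input_ids = tokenizer.encode(prompt, return_tensors="pt")
        max_length = input_ids.shape[-1] + generate_num   # not just generate_num
        output = model.generate(input_ids, max_length=max_length, do_sample=True)
        return tokenizer.decode(output[0][input_ids.shape[-1]:].tolist())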
Oh, that's because you are returning input_ids, which you set to only 1 token.
I think you need to return all the ids.
That shows it's generating though; the problem is just that you're returning only the latest token.
e.g.
generated = input_ids
... while loop...
# whether using past or not
generated = torch.cat([generated, tokens_to_add.unsqueeze(-1)], dim=-1)
...
return generated
new error equal good :p
actually this might be a nicer way to write it
def _generate_no_beam_search(
    self,
    input_ids,
    cur_len,
    max_length,
    do_sample,
    temperature,
    top_k,
    top_p,
    repetition_penalty,
    pad_token_id,
    eos_token_ids,
    batch_size,
):
    """ Generate sequences for each example without beam search (num_beams == 1).
    All returned sequences are generated independently.
    """
    # current position / max lengths / length of generated sentences / unfinished sentences
    unfinished_sents = input_ids.new(batch_size).fill_(1)
    past = None

    while cur_len < max_length:
        if past is not None:
            model_inputs = self.prepare_inputs_for_generation(tokens_to_add.unsqueeze(-1), past=past)
        else:
            model_inputs = self.prepare_inputs_for_generation(input_ids, past=past)
        outputs = self(**model_inputs)
        next_token_logits = outputs[0][:, -1, :]
        past = outputs[1]

        # repetition penalty from CTRL paper (https://arxiv.org/abs/1909.05858)
        if repetition_penalty != 1.0:
            for i in range(batch_size):
                for previous_tokens in set(input_ids[i].tolist()):
                    next_token_logits[i, previous_tokens] /= repetition_penalty

        if do_sample:
            # Temperature (higher temperature => more likely to sample low probability tokens)
            if temperature > 0 and temperature != 1.0:
                next_token_logits = next_token_logits / temperature
            # Top-p/top-k filtering
            next_token_logits = top_k_top_p_filtering(next_token_logits, top_k=top_k, top_p=top_p)
            # Sample
            next_token = torch.multinomial(F.softmax(next_token_logits, dim=-1), num_samples=1).squeeze(1)
        else:
            # Greedy decoding
            next_token = torch.argmax(next_token_logits, dim=-1)

        # update generations and finished sentences
        tokens_to_add = next_token * unfinished_sents + pad_token_id * (1 - unfinished_sents)
        input_ids = torch.cat([input_ids, tokens_to_add.unsqueeze(-1)], dim=-1)

        for eos_token_id in eos_token_ids:
            unfinished_sents.mul_(tokens_to_add.ne(eos_token_id).long())

        cur_len = cur_len + 1

        # stop when there is a </s> in each sentence, or if we exceed the maximum length
        if unfinished_sents.max() == 0:
            break

    # add eos_token_ids to unfinished sentences
    if cur_len == max_length:
        input_ids[:, -1].masked_fill_(unfinished_sents.to(dtype=torch.bool), eos_token_ids[0])

    return input_ids
Also you could subclass the GPT2 model instead of forking the transformers repo. I'll make a PR to show what I mean using past...
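Something like this is what I mean by subclassing instead of forking (a hedged sketch; overriding prepare_inputs_for_generation so it keeps the past and crops the ids is my suggestion, not existing library behaviour):

    from transformers import GPT2LMHeadModel

    class GPT2LMHeadModelWithPast(GPT2LMHeadModel):
        # Same model, but generation inputs keep the cached past instead of discarding it.
        def prepare_inputs_for_generation(self, input_ids, **kwargs):
            past = kwargs.get("past")
            if past is not None:
                # Once we have cached hidden states, only the newest token needs to go in.
                input_ids = input_ids[:, -1].unsqueeze(-1)
            return {"input_ids": input_ids, "past": past}

The generation loop still has to set past = outputs[1] each step, as in the function above, for this to do anything.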
Will that actually use the pasts?
I dont' see why not? They are being passed into the model as past?
Doesn't self.prepare_inputs_for_generation just discard them?
Oh, I was assuming we were using your custom one. Either in the transformers fork, or add it to the subclass.
Looping for me now too, what...
Working! It was because it was discarding past. I'll push my fork for you to try/check if that's ok.
https://github.com/AccidentallyOnPurpose/AIDungeon/tree/pytorch-model-aop (you don't need a transformer fork)
@ShnitzelKiller could you check this please and make sure it works and I'm not missing anything?
Hey @thadunge2 FYI you can use this torrent for the converted model. I just checked it works in your fork. That way all pytorch forks share a torrent, speeding them all up and pooling seeds.
There's no need for a torrent at all. You can just convert the model yourself. It doesn't take very long.
But when this is deployed to colab you would need to
I suggest (eventually)
Your fork works for me. Is it just me, or is the generation length longer now? Not that I'm complaining, I just had the duke explain the plan he hatched with his groundskeepers to get rich enough to buy an island. I guess when we merge this, we'll have to make it work with the current way of selecting from installed models, which means detecting if a model is already in Pytorch form and converting those that aren't. As for the pre-installed one, might as well include the converted one, or eventually keep the pytorch one only.
EDIT: Actually, why bother with this conversion business if you need to have TF 2.0 installed just to do it, when conversion can be done elsewhere as part of deploying a new model and distributing it? People should be glad to use Pytorch only, for training too if possible.
Cool thanks for checking it.
Hmm I wonder if I used the wrong parameter for length of something.
Yeah, might as well just use the pytorch torrent; the only problem is not many people are seeding it, and my internet is very shit and people keep complaining. If you guys have torrent boxes, please seed.
One thing you guys might find interesting, considering the effort you've put into parsing, quotes, full stops, etc., is using new special tokens.
There is a similar project to make a chat bot, and what they do is register special tokens and train with them. In Nick's project he uses ">" for starting an action and "\n" for ending it; I think the problems there are obvious.
The alternative is to register special tokens to separate actions and results. Here's an example repo: https://github.com/huggingface/transfer-learning-conv-ai
While this sounds good, I have found that you need a bit more training data to make the model learn how to use these special tokens.
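For reference, registering special tokens with the huggingface tokenizer looks roughly like this (a sketch; the token strings are made up, and the new embeddings only become useful after fine-tuning on data that contains them):

    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    # Hypothetical markers for separating actions from results.
    tokenizer.add_special_tokens({"additional_special_tokens": ["<|action|>", "<|result|>"]})

    # Grow the embedding matrix so the new ids have (randomly initialised) vectors.
    model.resize_token_embeddings(len(tokenizer))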
That sounds like a good project for the anon training that 774M model on the original data.
I opened a branch for running AIDungeon on PyTorch: https://github.com/thadunge2/AIDungeon/tree/pytorch-model/generator/gpt2
It's plug-and-play, just run play.py and it should install everything it needs to (unless you're on Windows, in which case it will tell you what to do). However, it's unusably slow until we rework the generate method to use hidden past states. This is beyond my ken, so if one of you wants to step up and do it, be my guest.
Here's the generate function we use: https://github.com/huggingface/transformers/blob/ce50305e5b8c8748b81b0c8f5539a337b6a995b9/src/transformers/modeling_utils.py#L699
outputs = self(**model_inputs)
needs to take a "past" parameter and change like so:outputs, pasts = self(**model_inputs)
I don't have the time or knowledge to make it do this, since it turns the 3D matrix into a 2D one and fucks everything up. So drop a PR on the pytorch-model branch fixing that and we can roll this feature out.