Open thadunge2 opened 4 years ago
I'll take a stab at it. I have experience with tensor math, and also debugging tensors in Pytorch.
I'm getting the pytorch code up and running. Aside from having to manually run selected lines from setup.sh (namely the part installing transformers as an editable library), play.py doesn't run because there is no function called convert_gpt2_checkpoint_to_pytorch in transformers; it lives in the sub-module transformers.convert_gpt2_original_tf_checkpoint_to_pytorch. Is this a typo or did I somehow end up with the wrong version?
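For reference, the fix is just importing from the sub-module (a minimal sketch; the module path is what I see in my checkout of transformers, so adjust if yours differs):

    # Corrected import: the function lives in the conversion sub-module,
    # not at the top level of the transformers package.
    from transformers.convert_gpt2_original_tf_checkpoint_to_pytorch import (
        convert_gpt2_checkpoint_to_pytorch,
    )
    # ...instead of the non-existent:
    # from transformers import convert_gpt2_checkpoint_to_pytorch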
Also here is the result when I correct that function name and try to run play.py:
INFO:filelock:Lock 139742871501624 acquired on /homes/grail/jamesn8/.cache/torch/transformers/eb2d31fb18c927045d8ccc07cace8bf1c10458bf171a5ad4cb1cbe0b75773425.1512018be4ba4e8726e41b9145129dc30651ea4fec86aa61f4b9f40bf94eac71.lock
INFO:transformers.file_utils:https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-xl-vocab.json not found in cache or force_download set to True, downloading to /homes/grail/jamesn8/.cache/torch/transformers/tmpee6b9v4i
INFO:transformers.file_utils:storing https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-xl-vocab.json in cache at /homes/grail/jamesn8/.cache/torch/transformers/eb2d31fb18c927045d8ccc07cace8bf1c10458bf171a5ad4cb1cbe0b75773425.1512018be4ba4e8726e41b9145129dc30651ea4fec86aa61f4b9f40bf94eac71
INFO:transformers.file_utils:creating metadata file for /homes/grail/jamesn8/.cache/torch/transformers/eb2d31fb18c927045d8ccc07cace8bf1c10458bf171a5ad4cb1cbe0b75773425.1512018be4ba4e8726e41b9145129dc30651ea4fec86aa61f4b9f40bf94eac71
INFO:filelock:Lock 139742871501624 released on /homes/grail/jamesn8/.cache/torch/transformers/eb2d31fb18c927045d8ccc07cace8bf1c10458bf171a5ad4cb1cbe0b75773425.1512018be4ba4e8726e41b9145129dc30651ea4fec86aa61f4b9f40bf94eac71.lock
INFO:filelock:Lock 139742887696984 acquired on /homes/grail/jamesn8/.cache/torch/transformers/18d7ac53606f670f979f24836b00f5dfee1c58d79bdbcc58411265f194d88ac0.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda.lock
INFO:transformers.file_utils:https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-xl-merges.txt not found in cache or force_download set to True, downloading to /homes/grail/jamesn8/.cache/torch/transformers/tmp6mj2i0gq
INFO:transformers.file_utils:storing https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-xl-merges.txt in cache at /homes/grail/jamesn8/.cache/torch/transformers/18d7ac53606f670f979f24836b00f5dfee1c58d79bdbcc58411265f194d88ac0.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
INFO:transformers.file_utils:creating metadata file for /homes/grail/jamesn8/.cache/torch/transformers/18d7ac53606f670f979f24836b00f5dfee1c58d79bdbcc58411265f194d88ac0.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
INFO:filelock:Lock 139742887696984 released on /homes/grail/jamesn8/.cache/torch/transformers/18d7ac53606f670f979f24836b00f5dfee1c58d79bdbcc58411265f194d88ac0.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda.lock
INFO:transformers.tokenization_utils:loading file https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-xl-vocab.json from cache at /homes/grail/jamesn8/.cache/torch/transformers/eb2d31fb18c927045d8ccc07cace8bf1c10458bf171a5ad4cb1cbe0b75773425.1512018be4ba4e8726e41b9145129dc30651ea4fec86aa61f4b9f40bf94eac71
INFO:transformers.tokenization_utils:loading file https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-xl-merges.txt from cache at /homes/grail/jamesn8/.cache/torch/transformers/18d7ac53606f670f979f24836b00f5dfee1c58d79bdbcc58411265f194d88ac0.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
Traceback (most recent call last):
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/libraries/transformers/src/transformers/configuration_utils.py", line 179, in from_pretrained
resume_download=resume_download,
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/libraries/transformers/src/transformers/file_utils.py", line 220, in cached_path
raise EnvironmentError("file {} not found".format(url_or_filename))
OSError: file generator/gpt2/models/model_v5/pytorch-convert/config.json not found
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "play.py", line 680, in <module>
prepare_aidungeon_2()
File "play.py", line 676, in prepare_aidungeon_2
play_aidungeon_2()
File "play.py", line 212, in play_aidungeon_2
generator = GPT2Generator(no_cuda=not cuda)
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/generator/gpt2/gpt2_generator.py", line 29, in __init__
self.model = GPT2LMHeadModel.from_pretrained(self.checkpoint_path)
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/libraries/transformers/src/transformers/modeling_utils.py", line 358, in from_pretrained
**kwargs
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/libraries/transformers/src/transformers/configuration_utils.py", line 200, in from_pretrained
raise EnvironmentError(msg)
OSError: Model name 'generator/gpt2/models/model_v5/pytorch-convert' was not found in model name list (gpt2, gpt2-medium, gpt2-large, gpt2-xl, distilgpt2). We assumed 'generator/gpt2/models/model_v5/pytorch-convert/config.json' was a path or url to a configuration file named config.json or a directory containing such a file but couldn't find any such file at this path or url.
Just to be clear, I have model_v5 installed in the right location, and it's what I've been using all this time. It seems to expect an existing pytorch-convert/config.json to be present? It looks like the conversion script half-ran once and then didn't run again because the directory was already present... let's try again...
I got this almost running, but now I'm seeing that CPU and GPU tensors are getting mixed, and causing errors... did you by any chance test this on a CPU only build?
Traceback (most recent call last):
File "play.py", line 680, in <module>
prepare_aidungeon_2()
File "play.py", line 676, in prepare_aidungeon_2
play_aidungeon_2()
File "play.py", line 253, in play_aidungeon_2
prompt, context=context, upload_story=upload_story
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/story/story_manager.py", line 131, in start_new_story
block = self.generator.generate(context + story_prompt)
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/generator/gpt2/gpt2_generator.py", line 98, in generate
text = self.generate_raw(prompt)
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/generator/gpt2/gpt2_generator.py", line 85, in generate_raw
repetition_penalty = 1.15)[0].tolist()
File "/local1/jamesn8/anaconda3/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 43, in decorate_no_grad
return func(*args, **kwargs)
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/libraries/transformers/src/transformers/modeling_utils.py", line 692, in generate
effective_batch_size,
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/libraries/transformers/src/transformers/modeling_utils.py", line 724, in _generate_no_beam_search
outputs = self(**model_inputs)
File "/local1/jamesn8/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/libraries/transformers/src/transformers/modeling_gpt2.py", line 580, in forward
inputs_embeds=inputs_embeds,
File "/local1/jamesn8/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/libraries/transformers/src/transformers/modeling_gpt2.py", line 456, in forward
inputs_embeds = self.wte(input_ids)
File "/local1/jamesn8/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/local1/jamesn8/anaconda3/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 118, in forward
self.norm_type, self.scale_grad_by_freq, self.sparse)
File "/local1/jamesn8/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1454, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected object of backend CUDA but got backend CPU for argument #3 'index'
EDIT: I have now fixed this, and ran into one last error which seemed to be due to a typo in the transformers library (calling Tensor.scatter() with named argument "src" instead of "source"); correcting that fixed it. Now on to the actual task...
At long last I have made some headway. Here's my progress so far:
https://github.com/ShnitzelKiller/transformers/tree/gpt2-past-2
First off, I added the past state back into the pipeline, making sure it's in the right format (a tuple of tensors, one for each layer, with dimension -2 the sequence dimension along which they're concatenated in time). Next I found that the Attention module was using a different method to generate the b matrix multiplied with w in self._attn(), and ported that over to pytorch (instead of the static matrix, which was causing inconsistent dimensions to crash the game).
For an overview: the unused past variable in the function you linked was supposed to be a list of n_layers tensors with dimension 2 x batch x head x sequence_n x head_features. Each of these tensors gets used in src/transformers/modeling_gpt2.py by a separate layer in Attention.forward(), where it is passed in as layer_past. In that function, the tensor is split apart, processed, joined back together, and that's the new present.
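Roughly, each layer consumes its layer_past like this (a simplified sketch of the shapes described above, not the exact HF code, which also transposes the cached key):

    import torch

    def attn_with_past(query, key, value, layer_past=None):
        # layer_past: tensor of shape (2, batch, head, past_seq, head_features)
        if layer_past is not None:
            past_key, past_value = layer_past[0], layer_past[1]
            key = torch.cat((past_key, key), dim=-2)        # grow the sequence dimension
            value = torch.cat((past_value, value), dim=-2)
        present = torch.stack((key, value))                 # returned upward and reassembled into the new past
        # ... attention over (query, key, value) happens here ...
        return present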
I THINK the past is being handled properly now, and there's probably a bit of code that depends on something having fixed dimensions that's acting up now that I'm feeding the past through (that's what happened with the attention_mask).
Currently it gets farther than before, crashing with this error:
Traceback (most recent call last):
File "play.py", line 680, in <module>
prepare_aidungeon_2()
File "play.py", line 676, in prepare_aidungeon_2
play_aidungeon_2()
File "play.py", line 253, in play_aidungeon_2
prompt, context=context, upload_story=upload_story
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/story/story_manager.py", line 131, in start_new_story
block = self.generator.generate(context + story_prompt)
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/generator/gpt2/gpt2_generator.py", line 99, in generate
text = self.generate_raw(prompt)
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/generator/gpt2/gpt2_generator.py", line 86, in generate_raw
repetition_penalty = 1.15)[0].tolist()
File "/local1/jamesn8/anaconda3/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 43, in decorate_no_grad
return func(*args, **kwargs)
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/libraries/transformers/src/transformers/modeling_utils.py", line 692, in generate
effective_batch_size,
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/libraries/transformers/src/transformers/modeling_utils.py", line 723, in _generate_no_beam_search
outputs = self(**model_inputs)
File "/local1/jamesn8/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/libraries/transformers/src/transformers/modeling_gpt2.py", line 591, in forward
inputs_embeds=inputs_embeds,
File "/local1/jamesn8/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/projects/grail/jamesn8/projects/NLP/AIDungeon/libraries/transformers/src/transformers/modeling_gpt2.py", line 468, in forward
position_embeds = self.wpe(position_ids)
File "/local1/jamesn8/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/local1/jamesn8/anaconda3/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 118, in forward
self.norm_type, self.scale_grad_by_freq, self.sparse)
File "/local1/jamesn8/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1454, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: index out of range at /opt/conda/conda-bld/pytorch_1544176307774/work/aten/src/TH/generic/THTensorEvenMoreMath.cpp:191
I think it's close to working, but I've got to take a break. Hope this helps.
Nice work. But I'm confused: why not just use the hugging face past argument? In their comments they imply that they have got hidden state caching to work. E.g. you can just take the present variable that's returned and pass that in on the next iteration (while cropping the input)?
I did try that, and it runs fine (make sure you don't pass the whole input, just the next token), but no change in generation time. They do say it only speeds up decoding, which in GPT2 is just the final layer I think? So perhaps we can't expect much of a speedup.
@thadunge2 Have you compared speed on the same machine? If so how much slower, roughly, is it? And did you have pytorch gpu installed?
For me a single generation of 60 chars took 6.34 seconds on a 2080 ti.
What do you mean? I am using their past argument; it was just set to None in their code, with a TODO, and I've modified the spots where it is either not set or not used properly (that I've found so far).
That is, _generate_no_beam_search() calls GPT2LMHeadModel.forward() with a past argument of None, because prepare_inputs_for_generation doesn't set the past argument. What was missing is code for concatenating the new presents onto the previous past.
And the current issue is that the embedding, self.wpe, is getting indexed out of bounds by position_ids in the line
position_embeds = self.wpe(position_ids)
Oh I see. Didn't realise they had a sampling function in modeling_utils. I was looking at examples.
It looks like you might have forgotten to crop the input ids? You only need to pass in the tokens which have not been given past states (new ones); otherwise you get too many hidden states and an index error.
At least I and other people in the issues had the same error because we didn't realise that we should only pass in new tokens.
**past**:
list of ``torch.FloatTensor`` (one for each layer) of shape ``(2, batch_size, num_heads, sequence_length, embed_size_per_head)``:
that contains pre-computed hidden-states (key and values in the attention blocks).
Can be used (see `past` input) to speed up sequential decoding. The token ids which have their past given to this model
should not be passed as input ids as they have already been computed.
As far as I can see you also don't need to concat the past and present because each new present includes the previous past - it's already done.
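In other words, the sampling side should only need something like this (a hedged sketch against transformers ~2.3, where GPT-2 returns the logits first and the presents second; the function and variable names are mine):

    import torch

    def sample_with_past(model, input_ids, num_new_tokens):
        # Greedy sampling that feeds each step's presents straight back in as `past`.
        generated = input_ids          # full prompt, shape (batch, seq)
        past = None
        next_token = None
        for _ in range(num_new_tokens):
            if past is None:
                outputs = model(generated)                  # first step: the whole context
            else:
                outputs = model(next_token, past=past)      # later steps: only the newest token
            logits, past = outputs[0], outputs[1]           # presents come back as the new past
            next_token = torch.argmax(logits[:, -1, :], dim=-1, keepdim=True)
            generated = torch.cat([generated, next_token], dim=-1)
        return generated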
Can you be more specific? Where in _generate_no_beam_search() can the past possibly accumulate? It isn't set at all. Compare to the sample_sequence() function in sample.py in the Tensorflow version. It is a while loop, where the past is set as

next_outputs["presents"]
if past is None
else tf.concat([past, next_outputs["presents"]], axis=-2),

(it's a little hard to parse that this is the output for the past, since they use a tf.while_loop, but it is the first element of the return tensor, which gets mapped back to past.)
This snippet is analogous to the while loop in _generate_no_beam_search(), where the past just stays None. Here, outputs[1] is the equivalent of next_outputs["presents"].
Well, 1) it's accumulated inside the model; e.g. in attention, the new hidden states and the past are concatenated.
And each time presents is returned from the model afresh and includes all hidden states for all sequences. E.g. if the context was 60 tokens, then the past is 60 tokens long. If you concatenate it with the previous cached hidden states then it's 120 tokens long. So you don't need to keep track of the past, just take each new one from the model outputs.
Also you only need to pass in new ids, since hidden states will be generated from them and concatenated with the past inside each layer.
Hmm, in the tensorflow code, though, that same snippet also exists, concatenating the key and values to the past values. And yet they also concatenate the past in the while loop.
with tf.variable_scope(scope):
    c = conv1d(x, "c_attn", n_state * 3)
    q, k, v = map(split_heads, tf.split(c, 3, axis=2))
    present = tf.stack([k, v], axis=1)   # note: present is stacked *before* the past is concatenated in
    if past is not None:
        pk, pv = tf.unstack(past, axis=1)
        k = tf.concat([pk, k], axis=-2)
        v = tf.concat([pv, v], axis=-2)
    a = multihead_attn(q, k, v)
    a = merge_heads(a)
    a = conv1d(a, "c_proj", n_state)
    return a, present
To be even more clear, here is the transformers issue where multiple people got confused, making what I think is a similar mistake and passing in all the input tokens: https://github.com/huggingface/transformers/issues/1749
At the very bottom is a simple example of all we need to do, I think. Only passing in the next token, no accumulation of pasts in the sample function.
I've been reading the pytorch code exclusively since I have a treasured and irrational hate for tensorflow.
And yet they also concatenate the past in the while loop.
Where's that, sorry? I mean the while loop.
See my earlier post. It's in sample_sequence() in sample.py. It's a tf.while_loop().
I still get index out of range when I just make past = outputs[1]
Oh right, but that's in the original GPT2 code from OpenAI. The transformers repo makes some changes in attempts to clean things up; not sure about this choice, but it is what it is.
Hmm did you crop inputs as well?
I got this almost running, but now I'm seeing that CPU and GPU tensors are getting mixed, and causing errors... did you by any chance test this on a CPU only build?
However did you guess?
@thadunge2 Have you compared speed on the same machine? If so how much slower, roughly, is it? And did you have pytorch gpu installed?
I can't fit the model on my GPU. It was so much slower than TF in my tests that it's completely unplayable.
I have only modified the transformer library functions. If the current pytorch branch of AIDungeon is missing some kind of input processing, I haven't touched it. And it is the same error as I posted earlier: the wpe embedding is being indexed out of range by position_ids.
As for the model, it does fit in my GPU using just 6.8 GB of VRAM. It works when not using the past, and isn't unplayably slow for me.
@thadunge2 I would have thought the tensorflow version was too slow on a cpu too. How long does the tensorflow version take on your cpu?
One way to get it working on a cpu, as a longer-term project, is to distill it. It's worked quite well and is something many people have used to deploy transformer models. You can get a 100x speedup for 5% less accuracy, at least on classification tasks; not sure about generation.
If the current pytorch branch of AIDungeon is missing some kind of input processing, I haven't touched it.
Not sure what you mean?
And it is the same error as I posted earlier, the wpe indices of the embedding sampled by position_ids are out of range.
@ShnitzelKiller try cropping your inputs tho, I think that's what's giving you the error. Same as this https://github.com/huggingface/transformers/issues/1749#issuecomment-554386357 where context = next_token only.
Another example of cropping inputs is in my code. I can confirm this runs with no wpe error (although I did encounter that initially, doing what you're doing now).
I don't fully understand the change you made to attention tbh. Is that needed or are you following the openai tf version?
I am following the openai version because I got dimension mismatch errors with the current way of doing things. It was sampling from a static array instead of generating the triangular matrix, and apparently that array couldn't accommodate some sizes, creating a matrix with the wrong shape when it was subsequently multiplied as w * b. I get that creating the matrix on-demand is probably less desirable than using something precomputed, though...
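For concreteness, this is roughly the on-demand version I ported (a sketch following OpenAI's attention_mask() in model.py; the function name and how it's wired into _attn() here are mine):

    import torch

    def causal_mask(nd, ns, dtype=torch.float32, device=None):
        # Lower-triangular mask of shape (nd, ns), built on demand instead of
        # slicing a fixed precomputed bias buffer.
        i = torch.arange(nd, device=device)[:, None]
        j = torch.arange(ns, device=device)
        return (i >= j - ns + nd).to(dtype)

    # Inside _attn(), something along the lines of:
    #   b = causal_mask(w.size(-2), w.size(-1), dtype=w.dtype, device=w.device)
    #   w = w * b - 1e4 * (1 - b)   # push future positions to a large negative value before softmax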
Ah I see, nice!
@thadunge2 I would have thought the tensorflow version was too slow on a cpu too. How long does the tensorflow version take on your cpu?
A minute or so. I never run into the 2-minute hang timeout unless it has to generate twice.
Yeah, pytorch is much slower than that on cpu for me too; I'm not sure why. I'm not sure hidden state caching will help that much, but we will see.
https://github.com/ShnitzelKiller/transformers/commit/b3bd8ee53a1c3081cd6f3214563924dff09dc63a is this what you had in mind for cropping?
Yup! Let me know how that works.
Well, it gets stuck in a loop, I gave up after a minute of this:
Generating story...
******DEBUG******
Prompt is: "You are Galuswen, a rogue from the realm of Zurito. You have a length of rope and a cloak.\nToday you decided that simple mischief isn't enough anymore. You long for a more ambitious goal: taking for yourself gold grail of Memas. You notice"
Prompt: You are Galuswen, a rogue from the realm of Zurito. You have a length of rope and a cloak.
Today you decided that simple mischief isn't enough anymore. You long for a more ambitious goal: taking for yourself gold grail of Memas. You notice
Generated result is: ''
******END DEBUG******
******DEBUG******
Prompt is: "You are Galuswen, a rogue from the realm of Zurito. You have a length of rope and a cloak.\nToday you decided that simple mischief isn't enough anymore. You long for a more ambitious goal: taking for yourself gold grail of Memas. You notice"
Prompt: You are Galuswen, a rogue from the realm of Zurito. You have a length of rope and a cloak.
Today you decided that simple mischief isn't enough anymore. You long for a more ambitious goal: taking for yourself gold grail of Memas. You notice
Generated result is: ''
******END DEBUG******
@thadunge2 if you are interested in distillation: it's the approach huggingface is using to deploy GPT2 to mobile apps. It's teaching a smaller model to do the job of a larger model. The tradeoffs are not 100% clear, but it's a decent candidate for making it faster on cpu for anons. Some links:
Well, it gets stuck in a loop, I gave up after a minute of this:
I got that when I was setting the length to generate_num instead of the length of the prompt + generate_num.
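In other words (a sketch with hypothetical names; the point is just that the cap has to cover the prompt, because generate() counts cur_len from the prompt length):

    def generate_block(model, tokenizer, prompt, generate_num=60):
        input_ids = tokenizer.encode(prompt, return_tensors="pt")
        max_length = input_ids.shape[-1] + generate_num   # not just generate_num
        output = model.generate(input_ids, max_length=max_length, do_sample=True)
        return tokenizer.decode(output[0][input_ids.shape[-1]:].tolist())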
Oh, that's because you are returning input_ids, which you set to only 1 token.
I think you need to return all the ids.
That shows it's generating though; the problem is just that you're returning only the latest token.
e.g.
generated = input_ids
... while loop...
# whether using past or not
generated = torch.cat([generated, tokens_to_add.unsqueeze(-1)], dim=-1)
...
return generated
new error equal good :p
actually this might be a nicer way to write it
def _generate_no_beam_search(
    self,
    input_ids,
    cur_len,
    max_length,
    do_sample,
    temperature,
    top_k,
    top_p,
    repetition_penalty,
    pad_token_id,
    eos_token_ids,
    batch_size,
):
    """ Generate sequences for each example without beam search (num_beams == 1).
    All returned sequences are generated independently.
    """
    # current position / max lengths / length of generated sentences / unfinished sentences
    unfinished_sents = input_ids.new(batch_size).fill_(1)
    past = None

    while cur_len < max_length:
        if past is not None:
            model_inputs = self.prepare_inputs_for_generation(tokens_to_add.unsqueeze(-1), past=past)
        else:
            model_inputs = self.prepare_inputs_for_generation(input_ids, past=past)
        outputs = self(**model_inputs)
        next_token_logits = outputs[0][:, -1, :]
        past = outputs[1]

        # repetition penalty from CTRL paper (https://arxiv.org/abs/1909.05858)
        if repetition_penalty != 1.0:
            for i in range(batch_size):
                for previous_tokens in set(input_ids[i].tolist()):
                    next_token_logits[i, previous_tokens] /= repetition_penalty

        if do_sample:
            # Temperature (higher temperature => more likely to sample low probability tokens)
            if temperature > 0 and temperature != 1.0:
                next_token_logits = next_token_logits / temperature
            # Top-p/top-k filtering
            next_token_logits = top_k_top_p_filtering(next_token_logits, top_k=top_k, top_p=top_p)
            # Sample
            next_token = torch.multinomial(F.softmax(next_token_logits, dim=-1), num_samples=1).squeeze(1)
        else:
            # Greedy decoding
            next_token = torch.argmax(next_token_logits, dim=-1)

        # update generations and finished sentences
        tokens_to_add = next_token * unfinished_sents + pad_token_id * (1 - unfinished_sents)
        input_ids = torch.cat([input_ids, tokens_to_add.unsqueeze(-1)], dim=-1)

        for eos_token_id in eos_token_ids:
            unfinished_sents.mul_(tokens_to_add.ne(eos_token_id).long())

        cur_len = cur_len + 1

        # stop when there is a </s> in each sentence, or if we exceed the maximum length
        if unfinished_sents.max() == 0:
            break

    # add eos_token_ids to unfinished sentences
    if cur_len == max_length:
        input_ids[:, -1].masked_fill_(unfinished_sents.to(dtype=torch.bool), eos_token_ids[0])

    return input_ids
Also you could subclass the GPT2 model instead of forking the transformers repo. I'll make a PR to show what I mean using past...
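Something like this is what I mean by subclassing instead of forking (a hedged sketch; overriding prepare_inputs_for_generation so it keeps the past and crops the ids is my suggestion, not existing library behaviour):

    from transformers import GPT2LMHeadModel

    class GPT2LMHeadModelWithPast(GPT2LMHeadModel):
        # Same model, but generation inputs keep the cached past instead of discarding it.
        def prepare_inputs_for_generation(self, input_ids, **kwargs):
            past = kwargs.get("past")
            if past is not None:
                # Once we have cached hidden states, only the newest token needs to go in.
                input_ids = input_ids[:, -1].unsqueeze(-1)
            return {"input_ids": input_ids, "past": past}

The generation loop still has to set past = outputs[1] each step, as in the function above, for this to do anything.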
Will that actually use the pasts?
I dont' see why not? They are being passed into the model as past?
Doesn't self.prepare_inputs_for_generation just discard them?
Oh, I was assuming we were using your custom one. Either in the transformers fork, or add it to the subclass.
Looping for me now too, what...
Working! It was because it was discarding past. I'll push my fork for you to try/check if that's ok.
https://github.com/AccidentallyOnPurpose/AIDungeon/tree/pytorch-model-aop (you don't need a transformer fork)
@ShnitzelKiller could you check this please and make sure it works and I'm not missing anything?
Hey @thadunge2 FYI you can use this torrent for the converted model. I just checked it works in your fork. That way all pytorch forks share a torrent, speeding them all up and pooling seeds.
There's no need for a torrent at all. You can just convert the model yourself. It doesn't take very long.
But when this is deployed to colab you would need to
I suggest (eventually)
Your fork works for me. Is it just me, or is the generation length longer now? Not that I'm complaining, I just had the duke explain the plan he hatched with his groundskeepers to get rich enough to buy an island. I guess when we merge this, we'll have to make it work with the current way of selecting from installed models, which means detecting if a model is already in Pytorch form and converting those that aren't. As for the pre-installed one, might as well include the converted one, or eventually keep the pytorch one only.
EDIT: Actually, why bother with this conversion business if you need to have TF 2.0 installed just to do it, when conversion can be done elsewhere as part of deploying a new model and distributing it? People should be glad to use Pytorch only, for training too if possible.
Cool thanks for checking it.
Hmm I wonder if I used the wrong parameter for length of something.
Yeah, might as well just use the pytorch torrent; the only problem is not many people are seeding it, and my internet is very shit and people keep complaining. If you guys have torrent boxes, please seed.
One thing you guys might find interesting, considering the effort you've put into parsing, quotes, full stops, etc., is using new special tokens.
There is a similar project to make a chat bot, and what they do is register special tokens and train with them. In Nick's project he uses ">" for starting an action and "\n" for ending it; I think the problems there are obvious.
The alternative is to register special tokens to separate actions and results. Here's an example repo: https://github.com/huggingface/transfer-learning-conv-ai
While this sounds good, I have found that you need a bit more training data to make the model learn how to use these special tokens.
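For reference, registering special tokens with the huggingface tokenizer looks roughly like this (a sketch; the token strings are made up, and the new embeddings only become useful after fine-tuning on data that contains them):

    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    # Hypothetical markers for separating actions from results.
    tokenizer.add_special_tokens({"additional_special_tokens": ["<|action|>", "<|result|>"]})

    # Grow the embedding matrix so the new ids have (randomly initialised) vectors.
    model.resize_token_embeddings(len(tokenizer))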
That sounds like a good project for the anon training that 774M model on the original data.
I opened a branch for running AIDungeon on PyTorch: https://github.com/thadunge2/AIDungeon/tree/pytorch-model/generator/gpt2
It's plug-and-play, just run play.py and it should install everything it needs to (unless you're on Windows, in which case it will tell you what to do). However, it's unusably slow until we rework the generate method to use hidden past states. This is beyond my ken, so if one of you wants to step up and do it, be my guest.
Here's the generate function we use: https://github.com/huggingface/transformers/blob/ce50305e5b8c8748b81b0c8f5539a337b6a995b9/src/transformers/modeling_utils.py#L699
outputs = self(**model_inputs)
needs to take a "past" parameter and change like so:outputs, pasts = self(**model_inputs)
I don't have the time or knowledge to make it do this, since it turns the 3D matrix into a 2D one and fucks everything up. So drop a PR on the pytorch-model branch fixing that and we can roll this feature out.