openai / gpt-2

Code for the paper "Language Models are Unsupervised Multitask Learners"
https://openai.com/blog/better-language-models/

GPT-2 on Wordpress #284

Open raitman005 opened 3 years ago

raitman005 commented 3 years ago

I've been wondering: is it possible to make GPT-2 work inside WordPress?

DaveXanatos commented 3 years ago

Sure. I've got it running on a Raspberry Pi providing some commentary for my robot to speak. If you have full access & perms on your server, your server supports scripting in Python, you have the space to host the model you'll use, and the processor is fast enough, then yes. You should be able to embed GPT-2 into almost anything. You could even use it as an email autoresponder if you're adventurous.

raitman005 commented 3 years ago

Could you point me to some articles I can read about this? I don't have any leads on how to start. Thank you.

DaveXanatos commented 3 years ago

Before I get into embedding GPT-2: do you have a working, standalone implementation of GPT-2 running yet? For most people, that's the hurdle to overcome. Getting your versions of TensorFlow, Python, and GPT-2 all working together comes first. If you have that running, let me know. Once that's up, embedding it into other stuff is very simple scripting. Do you know Python well?

raitman005 commented 3 years ago

I have a working implementation of GPT-2 running on my localhost using Flask via cmd.

DaveXanatos commented 3 years ago

Sweet - then integrating GPT-2 with almost anything else should be a breeze. Below is a script I wrote (edited heavily from the original interactive_conditional_samples script) that takes a prompt and prints the output, as well as piping it via ZeroMQ to my SpeechCenter, which speaks the output. As long as you can script your WordPress page to take a user input, pipe it to your GPT-2 script, and then pipe the output back to the page (probably via some sort of HTTP request), it should be easy...
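To give you a rough idea of the web-facing side, here's a quick sketch - just an illustration, not from my setup; the `/generate` route and the `generate_text()` wrapper are made-up names - of a Flask endpoint your WordPress page could POST a prompt to:

```python
# Sketch of a Flask endpoint that a WordPress page could POST a prompt to.
# generate_text() is a placeholder: wire it to your own GPT-2 sampling code.
from flask import Flask, request, jsonify

app = Flask(__name__)

def generate_text(prompt):
    # Placeholder: replace with a call into your GPT-2 session.
    return "GPT-2 output for: " + prompt

@app.route('/generate', methods=['POST'])
def generate():
    prompt = (request.get_json(silent=True) or {}).get('prompt', '')
    if not prompt:
        return jsonify({'error': 'Prompt should not be empty!'}), 400
    return jsonify({'text': generate_text(prompt)})

if __name__ == '__main__':
    app.run(host='127.0.0.1', port=5000)
```

On the WordPress side you'd POST JSON to that endpoint (e.g. wp_remote_post() in PHP, or fetch() in the page's JavaScript) and render the returned text field.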

Take a look at my code and see how the input is set up, then replace that with a variable that comes from the web page. If you're not familiar with ZeroMQ messaging, you'll thank yourself for the rest of your life for learning it: it makes interscript message passing easy, and it means you don't have to stuff everything into one big "CGI" script lol. Keeping your parts separate makes things much easier.
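If you want a feel for how simple the other end is, a bare-bones stand-in for my SpeechCenter (which I'm not posting here) is just a REP socket that echoes the text back; that echo is what the message == text check in my script below is looking for:

```python
# Bare-bones ZeroMQ REP server: a stand-in for the SpeechCenter that the
# script below talks to. It echoes each message back as an acknowledgment.
import zmq

context = zmq.Context()
socket = context.socket(zmq.REP)
socket.bind("tcp://*:5555")  # Matches the REQ socket's connect address

while True:
    text = socket.recv().decode('utf-8')  # Block until a sample arrives
    print("Received:", text)              # This is where you'd trigger TTS, etc.
    socket.send_string(text)              # Echo it back so the sender can verify
```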

Here's the code - it worked with GPT-2 on TF 1.13.1, and (after running the compatibility script and replacing hparams in the GPT-2 folders) it now runs on TF 2.2. Good luck!


```python
#!/usr/bin/env python3

import fire
import json
import os
import numpy as np
import tensorflow as tf
import zmq
import time

startTime = time.time()
context = zmq.Context()

# Connect to the SpeechCenter over ZeroMQ (REQ socket, paired with its REP socket).
print("Connecting to Speech Center")
socket = context.socket(zmq.REQ)
socket.connect("tcp://localhost:5555")  # 5556 is visual, 5558 is language, 5554 is Motor

import model, sample, encoder

def interact_model(
    model_name='774M',  # 345M on Pi4B4 or 8 only (memory allocation issue); 1558M too big for Pi4B8G reliably.
    seed=None,
    nsamples=1,
    batch_size=1,
    length=140,
    temperature=1.2,
    top_k=48,
    top_p=0.7,
    models_dir='models',
):
    models_dir = "/home/pi/Desktop/HOSTCORE/gpt-2/models"
    if batch_size is None:
        batch_size = 1
    assert nsamples % batch_size == 0

    enc = encoder.get_encoder(model_name, models_dir)
    hparams = model.default_hparams()
    with open(os.path.join(models_dir, model_name, 'hparams.json')) as f:
        hparams.override_from_dict(json.load(f))

    if length is None:
        length = hparams.n_ctx // 2
    elif length > hparams.n_ctx:
        raise ValueError("Can't get samples longer than window size: %s" % hparams.n_ctx)

    with tf.compat.v1.Session(graph=tf.Graph()) as sess:
        context = tf.compat.v1.placeholder(tf.int32, [batch_size, None])
        np.random.seed(seed)
        tf.compat.v1.set_random_seed(seed)
        output = sample.sample_sequence(
            hparams=hparams, length=length,
            context=context,
            batch_size=batch_size,
            temperature=temperature, top_k=top_k, top_p=top_p
        )

        saver = tf.compat.v1.train.Saver()
        ckpt = tf.train.latest_checkpoint(os.path.join(models_dir, model_name))
        saver.restore(sess, ckpt)

        while True:
            executionTime = (time.time() - startTime)
            print('Time from script start: ' + str(executionTime))
            raw_text = input("\n\nModel prompt >>> ")
            while not raw_text:
                print('Prompt should not be empty!')
                raw_text = input("\n\nModel prompt >>> ")
            context_tokens = enc.encode(raw_text)
            generated = 0
            for _ in range(nsamples // batch_size):
                out = sess.run(output, feed_dict={
                    context: [context_tokens for _ in range(batch_size)]
                })[:, len(context_tokens):]
                for i in range(batch_size):
                    generated += 1
                    text = enc.decode(out[i])
                    stripFrag = text.rsplit(".", 1)
                    text = stripFrag[0] + "."  # Truncate after the last "." to strip sentence fragments.
                    killString = "<|endoftext|>"
                    if killString in text:
                        keepFrag = text.split(killString)
                        text = keepFrag[0]
                    print("=" * 40 + " SAMPLE " + str(generated) + " " + "=" * 40)
                    print(text)
                    socket.send_string(text)  # Pipe the sample to the SpeechCenter
                    message = socket.recv()   # SpeechCenter echoes the text back as an ack
                    message = message.decode('utf-8')
                    if message == text:
                        print("1")
                    else:
                        print("0")
            print("=" * 80)

if __name__ == '__main__':
    fire.Fire(interact_model)
```
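One usage note: because the entry point is wrapped in fire.Fire(), the keyword arguments become command-line flags, so something like `python3 gpt2_speak.py --model_name=345M --length=100` (filename hypothetical) overrides the defaults without editing the script.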