lopuhin / transformer-lm

Transformer language model (GPT-2) with sentencepiece tokenizer

I would like a longer text result #31

Closed r23 closed 3 years ago

r23 commented 3 years ago

Hello,

Thank you very much for the wonderful project. It works great! I just have one question about the text length.

I use the following script, from https://github.com/lopuhin/transformer-lm/issues/2#issuecomment-511742139:


```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-

from pathlib import Path

import numpy as np

from lm import inference

MODEL_PATH = Path('/..../pytorch_models/de345-root/')
TOKENS_TO_GENERATE = 38
TOP_K = 8

mw = inference.ModelWrapper.load(MODEL_PATH)

txt = "Die Forschung an der künstlichen Intelligenz"
tokens = mw.tokenize(txt)

for i in range(TOKENS_TO_GENERATE):
    # generate TOP_K potential next tokens
    ntk = mw.get_next_top_k(tokens, TOP_K)

    # convert log probs to real probs
    logprobs = np.array([logprob for logprob, token in ntk])
    probs = np.exp(logprobs) / np.exp(logprobs).sum()

    # pick the next token randomly according to the probs distribution
    next_token = ntk[np.random.choice(TOP_K, p=probs)][1]
    tokens.append(next_token)

print(mw.sp_model.DecodePieces(tokens))
```
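The sampling step inside the loop, renormalizing the top-k log probabilities into a distribution and drawing from it, can be exercised in isolation. The `ntk` values below are made up for illustration; only the numpy math mirrors the script:

```python
import numpy as np

# Illustrative stand-in for mw.get_next_top_k(tokens, TOP_K):
# a list of (log_prob, token) pairs for the top-k candidates.
ntk = [(-0.5, "▁die"), (-1.2, "▁der"), (-2.0, "▁und"), (-3.1, "▁zu")]

# softmax over the truncated candidate set: exponentiate the
# log probs, then renormalize so they sum to 1
logprobs = np.array([lp for lp, _ in ntk])
probs = np.exp(logprobs) / np.exp(logprobs).sum()

# sample one candidate index according to that distribution
next_token = ntk[np.random.choice(len(ntk), p=probs)][1]
```

Because the probabilities are renormalized over only the k candidates, the draw always lands on one of them, no matter how much mass the full vocabulary distribution had elsewhere.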

The result

Die Forschung an der künstlichen Intelligenz, die sich mit der künstlichen Intelligenz befassen und die Entwicklung der künstlichen Intelligenz vorantreiben will, soll in der Zukunft fortgesetzt werden. Das berichtet Technology Review in seiner aktuellen Ausgabe (online zu bestellen). Das

Great

However, I'd like a longer text, comparable to what `python3 src/interactive_conditional_samples.py` produces: https://github.com/openai/gpt-2/blob/master/src/interactive_conditional_samples.py

There, GPT-2 generates sample texts with 4 paragraphs, about 2338 characters / 406 words.

What do I have to change in the above script for a longer result text?

I look forward to any hints and tips. Thank you in advance!

Ralf

r23 commented 3 years ago

Hello,

you only have to increase `TOKENS_TO_GENERATE`, the value used in the line `for i in range(TOKENS_TO_GENERATE):` — each loop iteration appends exactly one token.

:lol:

sorry
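Concretely, the only change to the script above is the one constant (400 here is an arbitrary example; note these are sentencepiece tokens, not words, so the character count grows somewhat more slowly than the token count):

```python
TOKENS_TO_GENERATE = 400  # was 38; one token is appended per loop iteration
```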

lopuhin commented 3 years ago

Nice, glad you figured it out! Fixed formatting in the original post to make the script render.