johnnymcmike / Gravital

A Discord AI Chatbot that uses GPT-2 and aitextgen for fast, believable responses that you can train on your own discord server's message history
MIT License
34 stars 5 forks

Bot will occasionally repeat what it saw ~9 messages ago #6

Open johnnymcmike opened 2 years ago

johnnymcmike commented 2 years ago

Marking this as a bug because I'm assuming for now that it's my fault. Every now and then the bot will get stuck in a loop of repeating, verbatim, what it saw 9 messages up from the one that triggered it (though I've also seen it do this with 8 and 7). This happens too often (and 9 is too specific, since that's the number of messages I use for context) for me to think it's some quirk of aitextgen or of GPT-2. Basically, it repeats one of the messages it's already sent exactly, and sometimes it gets caught in a loop of this until you spam 9 normal-looking messages in a row to "reset" it. Currently I have no idea what the cause is.

GitYing commented 2 years ago

I fixed the issue after hours of debugging. Sadly my code is too potato to paste in here, but here's pretty much what I did.

inside ai.py

```python
def get_bot_response(self, message: str) -> str:
    """Get a processed response to a given message using the GPT model."""
    # this is very hacky and bad, but it seems to prevent the input from
    # overflowing the output max_length, please see
    # numtokens = len(message.split()) + 70 + 5*self.maxlines
    old_msgs = message.split('\n')
    oldmsg = []
    for m in old_msgs:
        if not m.endswith(":") and len(m) > 2 and "http" not in m:
            oldmsg.append(m)

    numtokens = len(self.gpt2.tokenizer(message)["input_ids"])
    print("Num Tokens:", numtokens)
    while numtokens >= 1000:
        message = ' '.join(message.split(' ')[20:])  # pretty arbitrary
        numtokens = len(self.gpt2.tokenizer(message)["input_ids"])

    text = self.gpt2.generate(
        max_length=numtokens + 150 + 5*self.maxlines,
        prompt=message + "\n",
        temperature=0.9,
        return_as_list=True,
    )
    res_split = random.choice(text).split('\n')
    ok = []
    for r in res_split:
        if not r.endswith(":") and len(r) > 2 and "http" not in r and r not in oldmsg:
            ok.append(r)
    if len(ok) > 0:
        return random.choice(ok)
```

I deleted a few lines I didn't understand, but this pretty much solves the issue above.

What tended to happen was that during generation it wasn't that we were generating old messages; it's just that our results included the context of the 9 messages. So what I did was create a filter: store our old messages again in the same clean format, then after generation run the results against that old-message list and add any newly generated responses into a new array. From there we pick a random reply.
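The filtering idea described above can be sketched as a standalone function (the name `pick_new_line` and its shape are illustrative, not the actual Gravital code):

```python
import random
from typing import Optional

def pick_new_line(generated: str, context: str) -> Optional[str]:
    """Return a random generated line that did not appear in the prompt context."""
    def clean(lines):
        # same filter applied to both sides: drop bare "name:" lines,
        # very short lines, and anything containing a link
        return [l for l in lines
                if not l.endswith(":") and len(l) > 2 and "http" not in l]

    old = set(clean(context.split("\n")))
    fresh = [l for l in clean(generated.split("\n")) if l not in old]
    return random.choice(fresh) if fresh else None

print(pick_new_line("alice: hi\nsome new reply here", "alice: hi\nbob: yo man"))
```

Because the context lines are filtered the same way as the generated output, any echoed context line is guaranteed to match and be excluded.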

Apologies if anything is a mess; I worked on this till 5am, so I'll look over the rest once I get some sleep.

johnnymcmike commented 2 years ago

Thanks a lot for the interest, I'll test this out when I get home :)

I'm not at my computer right now, so I don't have the rest of the code in front of me to fully grasp this, but it looks like you're returning a random line out of a list of "OK" lines (that is, lines we generated that weren't in the 9-message context and aren't links). While this would solve the repetition issue, I don't think it's what I want to do. The way the training data is formatted by the datacleaner.py script, it's just one huge wall of text, with every message separated by a newline and no empty lines anywhere. This makes it easy for GPT-2 to guess what it thinks the next line in that series should be, and it also helps with gathering that series (of 9 messages) in a clean way.
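For illustration, the prompt format described above (the last nine messages joined by newlines, with no blank lines, mirroring the training data) might be assembled like this (`build_prompt` is a hypothetical name, not a function in the repo):

```python
def build_prompt(history, n=9):
    """Join the last n non-empty messages, one per line, no blank lines."""
    recent = [m.strip() for m in history[-n:] if m.strip()]
    return "\n".join(recent)

msgs = ["user: message %d" % i for i in range(12)]
prompt = build_prompt(msgs)
print(prompt.count("\n"))  # 8 newlines separate the 9 messages
```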

So I would rather have it be able to pick the first newly generated line every time instead of picking a random one, because that's made it a lot more conversational and accurate and funny in my experience thus far.
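Under that reasoning, the final selection in the snippet above would change from a random pick to the earliest surviving line, something like:

```python
# ok holds the generated lines that survived the filter, in generation order
ok = ["first new line", "second new line"]

# instead of random.choice(ok), take the first newly generated line, since
# GPT-2's immediate next-line continuation tends to stay closest to the context
reply = ok[0] if ok else None
print(reply)  # -> first new line
```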

johnnymcmike commented 2 years ago

Also, if you didn't understand the "maxlines" bit: that's a command-line arg; you can have the bot send messages with up to X lines of text in them (but it's set to only 1 by default).
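A rough sketch of how such an option would be declared with argparse (the exact flag name and wiring in Gravital are assumptions here, only the default of 1 comes from the comment above):

```python
import argparse

parser = argparse.ArgumentParser(description="Gravital bot options (sketch)")
parser.add_argument("--maxlines", type=int, default=1,
                    help="max number of lines per sent message (default: 1)")

# parse a sample command line instead of sys.argv for demonstration
args = parser.parse_args(["--maxlines", "3"])
print(args.maxlines)  # 3
```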

GitYing commented 2 years ago

> So I would rather have it be able to pick the first newly generated line every time instead of picking a random one, because that's made it a lot more conversational and accurate and funny in my experience thus far.

You're absolutely right, haha. I had assumed it was creating multiple batches of sample text when I did the random at the end. I removed the random and it's been working perfectly since.

I did, however, train my data another 15k times, so it might be different for some others.