kingoflolz / mesh-transformer-jax

Model parallel transformers in JAX and Haiku
Apache License 2.0

Fine-tuning on conversations (format of conversations) #248

Open Eichhof opened 1 year ago

Eichhof commented 1 year ago

Hello

I have a dataset consisting of dialogues between two people which I would like to use for fine-tuning GPT-J. Please see below for two example dialogues. The dialogues vary in length and can be longer than the examples.

Is the format of the conversations ok? For fine-tuning, should I just concatenate all conversations into one big file or do I have to use a separator between the conversations (if yes, which separator)?

First Dialogue:

user1: Hey there. What’s up?

user2: Not much, just hanging out. What about you?

user1: Just thinking about what I’m going to do this weekend. You?

user2: Probably just relaxing. What do you have planned?

user1: I’m thinking about going to the beach. It’s supposed to be nice this weekend.

user2: That sounds like a great plan! Have you been to the beach recently?

user1: Not in a while. It would be nice to get out and enjoy the sun.

user2: Definitely! I’m sure it’ll be a great time. Do you have any other ideas for the weekend?

Second Dialogue:

user1: Good morning. What is your profession?

user2: Good morning. I’m an accountant. What about you?

user1: I’m a software engineer. How long have you been an accountant?

user2: I’ve been an accountant for about five years now. What about you? How long have you been a software engineer?

user1: I’ve been a software engineer for three years. What do you like most about accounting?

user2: I like how challenging it can be. There’s always something to learn or something new to figure out. What do you like most about software engineering?

user1: I like how creative it can be. I get to come up with new ideas and new ways of solving problems. It’s a great feeling when you can come up with something that works.

mosmos6 commented 1 year ago

@Eichhof I use GPT-J as a chatbot, but I haven't needed to fine-tune it with dialogue templates. What I can say is:

  1. Spaces and newlines can confuse GPT-J, so a compact single-line format might work better, e.g.

user1:Hey there. What’s up? user2:Not much, just hanging out. What about you?

  2. GPT-J can have difficulty distinguishing pronouns from proper nouns, e.g. it can think "you" is someone's name, so giving the speakers specific names could be better.

  3. GPT-J has a tendency to try to provoke us (the users, humans). It recognizes clichés and tries to jump to an utterly unexpected context.
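The first two suggestions above can be sketched in a few lines of Python. This is only an illustration, assuming each turn in the raw data sits on its own line in the form `speaker: utterance`. The helper names `parse_dialogue` and `format_dialogue` and the speaker names "Alice" and "Bob" are hypothetical and are not part of mesh-transformer-jax.

```python
def parse_dialogue(raw):
    """Split raw text with one 'speaker: utterance' turn per line into pairs."""
    turns = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue  # skip the blank lines between turns
        # Split at the first colon only, so colons inside the utterance survive.
        speaker, _, text = line.partition(":")
        turns.append((speaker.strip(), text.strip()))
    return turns


def format_dialogue(turns, names=None):
    """Join turns into one line, with no space after the speaker label."""
    names = names or {}
    parts = []
    for speaker, text in turns:
        label = names.get(speaker, speaker)  # optionally swap in a specific name
        # Collapse any run of whitespace inside the utterance to a single space.
        parts.append(f"{label}:{' '.join(text.split())}")
    return " ".join(parts)


raw = """user1: Hey there. What’s up?

user2: Not much, just hanging out. What about you?
"""

turns = parse_dialogue(raw)
print(format_dialogue(turns))
# Hypothetical speaker names, chosen only for illustration.
print(format_dialogue(turns, names={"user1": "Alice", "user2": "Bob"}))
```

Whether this compact format actually improves fine-tuning results is not established in this thread; it is worth comparing against the original layout on a small sample.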

I wish you the best of luck, and if possible, I hope you can share some of your results, as long as that doesn't clutter this thread.

krisbianprabowo commented 1 year ago

Hello, I'm still relatively new to GPT-J. I tried running the Colab demo to do some inference, specifically for a chatbot use case. I don't know how to stop the model from generating new tokens after the bot has finished its answer. With GPT-3 we can simply supply stop sequences, or the model is already good enough to know when to stop. Setting the "gen_len" parameter does not seem to work either.

Do you have any idea for this?

I included my example prompt and the result below: [image]
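One possible workaround, sketched here as an assumption rather than a feature of the Colab demo, is to let the model generate up to its length limit and then cut the returned text at the first stop sequence. The function name `truncate_at_stop` and the sample completion are hypothetical. This only trims the output after the fact; it does not stop the model early or save any compute.

```python
def truncate_at_stop(text, stop_sequences):
    """Return text cut just before the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        if not stop:
            continue  # an empty stop string would match at position 0
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# Hypothetical completion that runs on past the bot's answer.
completion = " I'm fine, thanks!\nuser1: Great.\nuser2: Yes."

# Treat the start of any new speaker turn as a stop sequence.
reply = truncate_at_stop(completion, ["\nuser1:", "\nuser2:"])
print(repr(reply))
```

The stop sequences should match how turns begin in the prompt, so with a single-line prompt format they would be the speaker labels without the leading newline.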