lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Apache License 2.0

Fine-tuning Falcon #1588

Open ehartford opened 1 year ago

ehartford commented 1 year ago

When I try to train Vicuna against Falcon, it fails.

The big thing is the padding, since Falcon's tokenizer has no padding token. (That is the part I can't figure out myself.)

Also, Falcon doesn't have flash attention, so there needs to be a new monkey patch to swap in flash attention for Falcon. (I could probably figure this part out if the padding part were working.)
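For context, the existing LLaMA patch in FastChat works by reassigning the attention class's `forward` method at import time; a Falcon patch would presumably follow the same pattern. A minimal sketch of just that mechanism, using a stand-in `Attention` class since the real Falcon modules aren't shown here:

```python
# Sketch of the monkey-patch pattern (stand-in classes, not the real
# Falcon attention): replace the attention class's forward with a
# drop-in implementation before the model is instantiated.

class Attention:
    """Stand-in for a model's attention module."""
    def forward(self, x):
        # original (non-flash) attention
        return [v * 2 for v in x]

def flash_forward(self, x):
    # stand-in for a flash-attention forward; it must keep the same
    # signature and return the same result as the original forward
    return [v * 2 for v in x]

def replace_falcon_attn_with_flash_attn():
    # patching the class (not an instance) affects every existing
    # and future instance of the module
    Attention.forward = flash_forward

replace_falcon_attn_with_flash_attn()
attn = Attention()
print(attn.forward([1, 2, 3]))  # [2, 4, 6]
```

The function name `replace_falcon_attn_with_flash_attn` is hypothetical; the real patch would also need to handle Falcon's specific attention signature (alibi, rotary embeddings, etc.).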

ericzhou571 commented 1 year ago

You can utilize the Falcon special token, denoted as >>SUFFIX<< (token ID: 9), as a padding token, which has proven to be effective in my experience.

Furthermore, I am eagerly awaiting the addition of a Flash Attention Monkey script by someone.
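In `transformers` terms, the suggestion above amounts to assigning `>>SUFFIX<<` as `tokenizer.pad_token` before tokenizing. What actually matters downstream is that padded positions are masked out; a pure-Python sketch of right-padding a batch with that token id and building the matching attention mask (the other ids are made up for illustration):

```python
PAD_ID = 9  # Falcon's >>SUFFIX<< special token, per the comment above

def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad variable-length id sequences to a common length and
    build an attention mask (1 = real token, 0 = padding) so the
    model ignores the pad positions."""
    max_len = max(len(s) for s in sequences)
    input_ids = [s + [pad_id] * (max_len - len(s)) for s in sequences]
    attention_mask = [[1] * len(s) + [0] * (max_len - len(s))
                      for s in sequences]
    return input_ids, attention_mask

ids, mask = pad_batch([[5, 6, 7], [8]])
print(ids)   # [[5, 6, 7], [8, 9, 9]]
print(mask)  # [[1, 1, 1], [1, 0, 0]]
```

For training, the loss labels at those pad positions would also need to be set to the ignore index (-100 in PyTorch's cross entropy), so the pad token choice never leaks into the loss.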

jercas commented 1 year ago

Hi guys. I notice you're talking about padding. I ran into a problem while doing Vicuna batch inference: tensor padding is required for batched inference, but no matter what token I use to pad, the generation quality is very bad. I have tried bos_token, "", and "[pad]" as the pad_token.
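For batched generation with a decoder-only model, the usual fix is not which pad token you pick but (a) padding on the *left*, so the last position of every row is a real token that generation continues from, and (b) passing the attention mask so pads are ignored. A pure-Python sketch of the left-padding half (the pad id of 0 is arbitrary here):

```python
PAD_ID = 0  # arbitrary for illustration; masking matters more than the id

def left_pad(sequences, pad_id=PAD_ID):
    """Left-pad for decoder-only generation: every row ends on a real
    token, so the next generated token follows real context instead
    of a run of pads."""
    max_len = max(len(s) for s in sequences)
    input_ids = [[pad_id] * (max_len - len(s)) + s for s in sequences]
    attention_mask = [[0] * (max_len - len(s)) + [1] * len(s)
                      for s in sequences]
    return input_ids, attention_mask

ids, mask = left_pad([[11, 12], [13, 14, 15]])
print(ids)   # [[0, 11, 12], [13, 14, 15]]
print(mask)  # [[0, 1, 1], [1, 1, 1]]
```

With a `transformers` tokenizer, this corresponds to setting `tokenizer.padding_side = "left"` and passing the returned `attention_mask` into `model.generate`.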

ehartford commented 1 year ago

> You can utilize the Falcon special token, denoted as >>SUFFIX<< (token ID: 9), as a padding token, which has proven to be effective in my experience.
>
> Furthermore, I am eagerly awaiting the addition of a Flash Attention Monkey script by someone.

If you could please provide a gist or at least a pointer to where in the code I would make this change (the padding change not the flash attention monkey patch), I would greatly appreciate it.

I have tried, and failed, twice (and I tried hard, and I am good at this kind of thing) to modify the FastChat trainer to work with Falcon's padding (or rather, lack thereof), and I would need further guidance to move forward. I would much rather use FastChat than other tuning solutions, because the quality I get from FastChat is very high.

AWESOME new API by the way!! Much needed, and much appreciated.

ericzhou571 commented 1 year ago

I have already written a Falcon version that includes training, inference, and conversation capabilities. Additionally, I have trained a Falcon 7B model compatible with Vicuna13B (not fully tested) using the Wizard ShareGPT dataset with my code for Falcon. I will be making a pull request later tonight.