johnsmith0031 / alpaca_lora_4bit

MIT License

flash attention #55

Closed ehartford closed 1 year ago

ehartford commented 1 year ago

Is it possible to use flash attention in order to fine-tune on longer, multi-turn conversations instead of just question-and-answer pairs?

ehartford commented 1 year ago

https://github.com/lm-sys/FastChat/search?q=flash

johnsmith0031 commented 1 year ago

Thanks! I'll try it later

dpyneo commented 1 year ago

@johnsmith0031 Thank you. May I ask whether multi-round conversations can be trained by directly adapting the GPT4All-like data format, i.e. `class TrainGPT4All(ATrainData): ... newline_tokens = self.tokenizer("\n", return_tensors="pt")["input_ids"][0, 1:]` — can a good conversational model be fine-tuned with line breaks as turn separators? The conversations are admittedly rather short. I briefly tried truncating to a maximum length of 700, which slows down training. By fine-tuning on top of my existing Q&A LoRA, a simple two-round conversation is still possible, although greedy decoding at inference time does not seem as good as for plain Q&A (it may simply be under-trained). At the time I tried running multi-round conversation data such as `{"topic": "Identify the odd one out.", "input": "The conversation between human and AI assistant.\n[|Human|] Identify the odd one out.\nTwitter, Instagram, Telegram\n[|AI|] Telegram\n[|Human|]"}` with `\r\n\r\n` as a record separator. I don't know whether flash attention would allow the context to be longer and reach 2048?
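For reference, here is a minimal sketch of how a multi-round sample in that `[|Human|]`/`[|AI|]` style could be assembled and tokenized with newline-separated turns, loosely following the quoted `TrainGPT4All` pattern. The helper names, the record layout, and the `decapoda-research/llama-7b-hf` tokenizer path are illustrative assumptions, not the repo's actual training code:

```python
# Minimal sketch (assumptions, not the repo's TrainGPT4All implementation):
# build a multi-round [|Human|]/[|AI|] prompt and tokenize it with newlines
# as turn separators, truncating at a fixed cutoff length.
from transformers import LlamaTokenizer

def build_prompt(turns):
    """Join (speaker, text) turns into one newline-separated conversation string."""
    header = "The conversation between human and AI assistant.\n"
    body = "\n".join(f"[|{speaker}|] {text}" for speaker, text in turns)
    return header + body + "\n[|Human|]"

def tokenize_conversation(tokenizer, turns, cutoff_len=2048):
    prompt = build_prompt(turns)
    result = tokenizer(
        prompt,
        truncation=True,
        max_length=cutoff_len,
        padding=False,
        return_tensors=None,
    )
    # Plain causal-LM fine-tuning: labels mirror the input ids.
    result["labels"] = list(result["input_ids"])
    return result

if __name__ == "__main__":
    tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
    sample = [
        ("Human", "Identify the odd one out. Twitter, Instagram, Telegram"),
        ("AI", "Telegram"),
    ]
    print(tokenize_conversation(tokenizer, sample)["input_ids"][:32])
```

Raising `cutoff_len` toward 2048 is exactly where standard attention starts to dominate VRAM usage, which is the motivation for the flash-attention patch discussed below.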

ehartford commented 1 year ago

It's possible, but without flash attention it will consume too much VRAM.
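For intuition on why VRAM is the bottleneck: standard attention materializes a (heads × seq × seq) score matrix per layer, so memory grows quadratically with sequence length, whereas flash attention computes the same result block-wise without storing those scores. A rough back-of-envelope estimate, assuming LLaMA-7B-like shapes:

```python
# Back-of-envelope estimate of the attention score matrices that standard
# attention materializes (and flash attention avoids). Assumes LLaMA-7B-like
# shapes (32 layers, 32 heads) and fp16 scores kept for all layers at once,
# e.g. for the backward pass without recomputation.
def score_matrix_bytes(seq_len, n_layers=32, n_heads=32, bytes_per_el=2):
    return n_layers * n_heads * seq_len * seq_len * bytes_per_el

for seq_len in (700, 2048):
    gib = score_matrix_bytes(seq_len) / 2**30
    print(f"seq_len={seq_len:>4}: ~{gib:.1f} GiB of attention scores per sample")

# seq_len= 700: ~0.9 GiB;  seq_len=2048: ~8.0 GiB
# i.e. roughly (2048 / 700)^2 ≈ 8.6x more, on top of weights and optimizer state.
```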

gururise commented 1 year ago

Has anyone gotten this working? I saw the monkey patch in the README; has anyone tried training multi-round conversations with it?
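For context, that kind of monkey patch essentially replaces the explicitly materialized softmax(QKᵀ/√d)V step inside the LLaMA attention module with a fused flash-attention kernel. The standalone sketch below shows only that substitution, using flash-attn 2.x's `flash_attn_func`; this is an assumption for illustration and not this repo's actual patch code, so check the README's monkeypatch module for the real integration:

```python
# Standalone sketch of what a flash-attention patch substitutes: a fused
# kernel in place of an explicitly materialized softmax(QK^T)V.
# Assumes the flash-attn 2.x API and a CUDA device; not this repo's exact code.
import math
import torch
from flash_attn import flash_attn_func

bsz, seq_len, n_heads, head_dim = 1, 2048, 32, 128
q, k, v = (torch.randn(bsz, seq_len, n_heads, head_dim,
                       dtype=torch.float16, device="cuda") for _ in range(3))

# Standard attention: materializes a (bsz, heads, seq, seq) score matrix.
qh, kh, vh = (t.transpose(1, 2) for t in (q, k, v))
scores = (qh @ kh.transpose(-2, -1)) / math.sqrt(head_dim)
causal_mask = torch.triu(
    torch.ones(seq_len, seq_len, device="cuda", dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(causal_mask, float("-inf"))
out_ref = (scores.softmax(dim=-1) @ vh).transpose(1, 2)

# Flash attention: same result, computed block-wise without storing the scores.
out_flash = flash_attn_func(q, k, v, causal=True)

print((out_ref - out_flash).abs().max().item())  # small, up to fp16 tolerance
```

The patch itself then just assigns a forward function built around a call like `flash_attn_func(q, k, v, causal=True)` onto the model's attention class before the model is instantiated.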

ehartford commented 1 year ago

Yamashi got it working

https://github.com/yamashi/alpaca_lora_4bit/

donflopez commented 1 year ago

Is a model trained this way available anywhere to use?

ehartford commented 1 year ago

Well, this is what Vicuna does. If you want to see it in action, check out Vicuna's published weights. (It's quite impressive compared to Alpaca.)

ehartford commented 1 year ago

https://github.com/johnsmith0031/alpaca_lora_4bit/pull/59