microsoft / LMOps

General technology for enabling AI capabilities w/ LLMs and MLLMs
https://aka.ms/GeneralAI
MIT License

Paper: ADAPTING LARGE LANGUAGE MODELS VIA READING COMPREHENSION #149

Closed J-G-Y closed 6 months ago

J-G-Y commented 6 months ago

Hello, I'm a student who has just entered the NLP field. I'd like to ask about a technical detail of this paper. After you constructed the data, did you continue training the original model with the next-token prediction task, or did you use the QA data for supervised fine-tuning to achieve domain adaptation? Looking forward to your response.

cdxeve commented 6 months ago

Hi, after constructing the data, we did continue with the standard next-token prediction task on the original model.
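
For concreteness, here is a minimal sketch of continued pre-training with the next-token prediction (causal LM) objective using Hugging Face `transformers`. The model name, data file, and hyperparameters below are illustrative placeholders, not the configuration actually used in the paper.

```python
# Sketch only (not the authors' code): continued pre-training with the standard
# next-token prediction objective on the constructed reading-comprehension data.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "meta-llama/Llama-2-7b-hf"   # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Assumption: each line of train.txt is one constructed reading-comprehension text.
dataset = load_dataset("text", data_files={"train": "train.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# mlm=False -> labels are the input ids, i.e. a standard next-token prediction loss.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ckpt", per_device_train_batch_size=1,
                           gradient_accumulation_steps=16, num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```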

J-G-Y commented 6 months ago

I'm sorry to disturb you, but I'd like to ask one more question. If I follow the approach in your article, with the difference that I use the constructed data during supervised fine-tuning (with LoRA or another PEFT method), will it still be effective? (Constrained by hardware limitations, I cannot perform continued pre-training, so I intend to use your approach to inject the domain data through supervised training.)

cdxeve commented 6 months ago

Hi, I think PEFT methods should be effective, as our approach can be seen as a form of supervised fine-tuning. Just make sure to compute the next-token prediction loss on all the tokens.
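
To illustrate the point about the loss, here is a minimal LoRA sketch with the `peft` library; the model name, LoRA hyperparameters, and data handling are illustrative assumptions, not the authors' setup. The key detail from the reply above is that the labels cover every token (no `-100` masking of the context/question), so the loss is the plain next-token prediction loss over the whole sequence.

```python
# Sketch only (illustrative, not the authors' code): LoRA fine-tuning where the
# next-token prediction loss is computed on ALL tokens, i.e. labels = input_ids.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"   # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

lora_cfg = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora_cfg)

def build_batch(text):
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
    # Do NOT mask the context/question with -100: compute the loss on every token.
    enc["labels"] = enc["input_ids"].clone()
    return enc

batch = build_batch("...one constructed reading-comprehension example...")
loss = model(**batch).loss   # standard causal-LM (next-token prediction) loss
loss.backward()
```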