Closed: J-G-Y closed this issue 10 months ago
Hi, after constructing the data, we did continue with the standard next-token prediction task on the original model.
I'm sorry to disturb you, but I'd like to ask one more question. Following the approach in your article, the difference on my side is that I would use the data during the supervised fine-tuning stage (with LoRA or other PEFT methods). Would that still be effective? (Constrained by hardware limitations, I am unable to perform continued pre-training, so I intend to use your approach to inject the domain data through supervised training.)
Hi, I think PEFT methods should be effective, as our approach can be seen as a form of supervised fine-tuning. Just make sure to compute the next-token prediction loss on all the tokens.
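For reference, here is a minimal sketch (not the authors' code) of what that setup could look like with Hugging Face `transformers` and `peft`: LoRA adapters are attached and the causal-LM loss is kept on every token by letting the collator copy `input_ids` into `labels`, instead of masking the prompt with -100 as in typical instruction tuning. The model name, LoRA hyperparameters, and placeholder texts are assumptions, not values from the paper.

```python
import torch
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model = get_peft_model(
    model,
    LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
               task_type="CAUSAL_LM"),  # illustrative LoRA settings
)

# Placeholder: the constructed domain texts (e.g. reading-comprehension style data).
texts = ["<constructed domain text 1>", "<constructed domain text 2>"]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
    remove_columns=["text"],
)

# mlm=False makes the collator set labels = input_ids, so the next-token
# prediction loss is computed on every non-padding token, not just an "answer" span.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lora-domain-adapt",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()
```

With this setup the PEFT run behaves like continued pre-training on the constructed data, which is the property the reply above is pointing at.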
Hello, I'm a student who has just entered the NLP field. I'd like to ask about the technical details of this article. After you constructed the data, did you continue with the next-token prediction task on the original model? Or did you use the QA data for supervised fine-tuning to achieve domain adaptation? Looking forward to your response.