Open Trapper4888 opened 1 month ago
Hello,
A new paper has been published that presents a method for better handling long sequences during fine-tuning: https://wdlctc.github.io/mst.html. The authors have also integrated their code into an unsloth fork: https://github.com/wdlctc/unsloth.
I believe this would be of interest to you, @danielhanchen. If it works as intended, it could be a valuable addition to unsloth.
Otherwise, I will close this issue.
Oh yes, I saw the PR. Thanks for your contribution!