imoneoi opened 1 year ago
A simple fine-tune (LoRA is enough) with a stretched RoPE would be sufficient; see e.g. https://github.com/ggerganov/llama.cpp/discussions/1965
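For context, "stretching" RoPE usually means linear position interpolation: positions are compressed so that a longer sequence stays within the position range the model was trained on. Below is a minimal standalone sketch of that idea (not taken from llama.cpp or any specific library; function names, dimensions, and the scale factor are illustrative).

```python
# Minimal sketch of "stretched" RoPE (linear position interpolation).
# Illustrative only: real implementations interleave channels differently,
# but the key idea is multiplying positions by scale = trained_ctx / target_ctx.
import torch

def rope_angles(positions: torch.Tensor, dim: int, base: float = 10000.0,
                scale: float = 1.0) -> torch.Tensor:
    """Rotary angles of shape (len(positions), dim // 2)."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    # Linear interpolation: shrink positions by `scale` before rotation.
    return torch.outer(positions.float() * scale, inv_freq)

def apply_rope(x: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
    """Rotate channel pairs of x (seq_len, dim) by the given angles."""
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# Example: a 2048-token model stretched to 8192 tokens -> scale = 2048 / 8192.
q = torch.randn(8192, 64)
angles = rope_angles(torch.arange(8192), dim=64, scale=2048 / 8192)
q_rot = apply_rope(q, angles)
print(q_rot.shape)  # torch.Size([8192, 64])
```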
@Green-Sky We observed that fine-tuning may still cause performance degradation. It is better to have a model natively pretrained with an 8192 context.
Sounds like you are not using RoPE scaling. Some RoPE scaling variants can get away without fine-tuning.
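As a hedged sketch of trying RoPE scaling without any fine-tuning: recent Hugging Face transformers releases expose a `rope_scaling` config option for LLaMA-style models, with "linear" and "dynamic" (NTK-aware) variants. The checkpoint name and factor below are placeholders, and availability of the option depends on the transformers version you have installed.

```python
# Sketch: load a LLaMA-style checkpoint with dynamic (NTK-aware) RoPE scaling.
# Placeholder checkpoint and scaling factor; requires a transformers version
# that supports the `rope_scaling` config option for LLaMA models.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "openlm-research/open_llama_3b",  # placeholder checkpoint
    # Dynamic NTK scaling is one of the variants that tends to degrade
    # short-context quality less than plain linear interpolation.
    rope_scaling={"type": "dynamic", "factor": 4.0},
)
```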
You can try LongLLaMA, which is a long-context (8192 and beyond) fine-tune of OpenLLaMA: https://github.com/CStanKonrad/long_llama https://huggingface.co/syzymon/long_llama_3b
It uses a different method from PI (Position Interpolation); see https://arxiv.org/abs/2307.03170 for details. There is no degradation on short context compared to the original 3B checkpoint, and we are working to release larger models soon.
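For anyone who wants to try it, the following roughly follows the loading snippet on the Hugging Face model card for syzymon/long_llama_3b (the exact code may differ; check the card). `trust_remote_code=True` is needed because the long-context attention lives in the repository's custom modeling code; the prompt and generation settings are just examples.

```python
# Sketch of loading and sampling from LongLLaMA 3B, roughly per its model card.
import torch
from transformers import LlamaTokenizer, AutoModelForCausalLM

tokenizer = LlamaTokenizer.from_pretrained("syzymon/long_llama_3b")
model = AutoModelForCausalLM.from_pretrained(
    "syzymon/long_llama_3b",
    torch_dtype=torch.float32,
    trust_remote_code=True,  # custom long-context attention implementation
)

inputs = tokenizer("My favourite animal is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```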
StarCoderPlus uses StarCoder + the RefinedWeb dataset for training, but with a longer context length. Are there plans to release a version with a longer context length, such as 8192?