cksac opened this issue 1 year ago
Are there any plans to train with a larger context length? That would make open_llama better than the original llama.
We really need 8k, if not 32k, context. Is there a way to adjust the architecture to permit that?
I see that MPT-7B supports up to 64k context using ALiBi.
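For reference, ALiBi replaces learned positional embeddings with a per-head linear penalty on query-key distance added to the attention logits, which is what lets a model extrapolate past its training context. Below is a minimal illustrative sketch of that bias computation (not MPT-7B's or OpenLLaMA's actual code; `alibi_slopes` and `alibi_bias` are hypothetical helper names, and the slope formula assumes a power-of-two head count as in the ALiBi paper):

```python
import torch

def alibi_slopes(n_heads: int) -> torch.Tensor:
    # Geometric sequence of per-head slopes: 2^(-8/n), 2^(-16/n), ...
    # (simplified to the power-of-two head-count case from the ALiBi paper).
    start = 2.0 ** (-8.0 / n_heads)
    return torch.tensor([start ** (i + 1) for i in range(n_heads)])

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    pos = torch.arange(seq_len)
    # distance[i, j] = j - i: negative for past keys, so the bias
    # penalizes keys in proportion to how far back they are.
    distance = pos[None, :] - pos[:, None]           # (seq, seq)
    slopes = alibi_slopes(n_heads)                   # (heads,)
    bias = slopes[:, None, None] * distance[None]    # (heads, seq, seq)
    # Causal mask: queries cannot attend to future keys.
    mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
    return bias + mask

# Usage: add the bias to the attention logits before softmax, e.g.
#   logits = q @ k.transpose(-1, -2) / d ** 0.5 + alibi_bias(n_heads, seq_len)
```

Because the penalty is linear in distance rather than tied to learned positions, the same weights can be run at sequence lengths longer than those seen in training, which is how MPT-7B reaches 64k.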