Is your feature request related to a problem? Please describe.
When editing the beginning of a long file, prompt evaluation takes a lot of time.
The reason for that is explained in Additional context below.
Currently we send a similar number of lines from the top and from the bottom. I believe there are reasons to make the bottom part smaller:
- It takes a long time to reevaluate the bottom lines.
- The bottom lines often aren't as important (IMO), so shrinking them leaves more of the context window for the top lines.
Describe the solution you'd like
I want separate Context Length options for the 'before' and 'after' parts.
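A minimal sketch of what this could look like inside the extension, assuming hypothetical setting names twinny.contextLengthBefore and twinny.contextLengthAfter (these are not Twinny's actual configuration keys):

```typescript
import * as vscode from "vscode";

// Hypothetical sketch: gather the FIM context with independent line budgets
// above and below the cursor. The setting names and defaults are illustrative,
// not Twinny's actual configuration keys.
function getFimContext(
  document: vscode.TextDocument,
  position: vscode.Position
): { prefix: string; suffix: string } {
  const config = vscode.workspace.getConfiguration("twinny");
  const linesBefore = config.get<number>("contextLengthBefore", 100); // generous budget above the cursor
  const linesAfter = config.get<number>("contextLengthAfter", 20); // smaller budget below the cursor

  const firstLine = Math.max(0, position.line - linesBefore);
  const lastLine = Math.min(document.lineCount - 1, position.line + linesAfter);

  // 'before' part: from the top of the window up to the cursor.
  const prefix = document.getText(
    new vscode.Range(new vscode.Position(firstLine, 0), position)
  );
  // 'after' part: from the cursor down to the bottom of the window.
  const suffix = document.getText(
    new vscode.Range(
      position,
      new vscode.Position(lastLine, document.lineAt(lastLine).text.length)
    )
  );

  return { prefix, suffix };
}
```

With a smaller 'after' budget, the suffix that follows the FIM hole stays short, so less of the prompt has to be reevaluated when it changes (see Additional context below).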
Describe alternatives you've considered
Alternatively, leave the current Twinny: Context Length setting as is, but add an optional override for the bottom lines.
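A sketch of how the override variant could resolve the 'after' budget, assuming the existing setting maps to a key like twinny.contextLength and using a hypothetical override key:

```typescript
import * as vscode from "vscode";

// Hypothetical sketch: resolve the number of lines sent below the cursor,
// falling back to the shared context length when no override is configured.
// Setting names are illustrative, not Twinny's actual configuration keys.
function resolveLinesAfter(): number {
  const config = vscode.workspace.getConfiguration("twinny");
  const contextLength = config.get<number>("contextLength", 100);
  const override = config.get<number>("contextLengthAfterOverride", -1);
  return override >= 0 ? override : contextLength;
}
```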
Additional context
For context:
AFAIK (this is mostly based on my assumptions), llama.cpp doesn't have to reevaluate the prefix part of the prompt that hasn't changed since the last generation. But the moment it encounters a change, it reevaluates everything after that change.
So when we have 2 requests in a row with prompts:
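(The original example prompts aren't reproduced here; the pair below is a purely illustrative guess, assuming deepseek-style FIM tokens and made-up file contents.)

First request, with the cursor right after import numpy:

```
<|fim▁begin|>import numpy
<|fim▁hole|>
...hundreds of lines of the rest of the file...
<|fim▁end|>
```

Second request, after the user types np at the cursor:

```
<|fim▁begin|>import numpy
np<|fim▁hole|>
...the same hundreds of lines of the rest of the file...
<|fim▁end|>
```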
It won't have to spend time evaluating import numpy. However, it will still have to reevaluate everything after <|fim▁hole|> (because it only checks for a matching prefix of the prompt).
(Example of llama.cpp output, not for this exact case: Llama.generate: 2978 prefix-match hit, remaining 8 prompt tokens to eval)