redhat-developer / vscode-paver

Use IBM Granite LLM as your Code Assistant in Visual Studio Code
Apache License 2.0

Picking different model between tab complete and chat incurs loading model to memory penalty lag #137

Open jamescho72 opened 1 week ago

jamescho72 commented 1 week ago

Ollama does not keep more than one model loaded in memory at a time by default. Configuring two different models in Continue's config.json (one for chat, one for tab completion) causes a ~45-60 second lag every time a request switches between them, because the incoming model has to be reloaded into memory. The result is a poor experience: tab completion appears broken each time it needs to reload.
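For context, a Continue config.json that triggers this looks roughly like the following (the model names are illustrative; the relevant part is that `tabAutocompleteModel` differs from the chat model, so Ollama has to swap models on every switch):

```json
{
  "models": [
    {
      "title": "Granite Chat",
      "provider": "ollama",
      "model": "granite-code:8b"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Granite Autocomplete",
    "provider": "ollama",
    "model": "granite-code:3b"
  }
}
```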

We may want to enforce a single model for both chat and tab completion until this can be resolved.
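As a possible complement (outside the extension's control, and depending on the Ollama version), recent Ollama releases can keep more than one model resident in memory via the `OLLAMA_MAX_LOADED_MODELS` environment variable. A configuration sketch, assuming enough RAM/VRAM for both models:

```shell
# Allow Ollama to keep two models loaded at once (requires Ollama >= 0.1.33)
OLLAMA_MAX_LOADED_MODELS=2 ollama serve

# In another terminal, verify which models are currently loaded
ollama ps
```

This would avoid the reload penalty without forcing a single model, at the cost of higher memory use.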

fbricon commented 5 days ago

The first immediate step would be to:

That should work.

Then, in another step/PR:

fbricon commented 5 days ago

@SachinS10-lab part 1 is done in #139, can you please add the advanced section, where the tab completion can be selected?

SachinS10-lab commented 1 day ago

Added toggle button for advanced section.

Image


For Simple Configuration

Image


For Advanced Configuration

Image


I also added an informational message below the box for a better user experience. Is this okay?

Image

fbricon commented 1 day ago

For the advanced section, you forgot:

> on click, it would reveal the tab completion combo

Please open a PR, it'll be easier to review than commenting on screenshots.
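The requested behavior could be sketched as follows (names here are hypothetical, not the extension's actual code): clicking the advanced toggle reveals the tab-completion combo, and collapsing it falls back to the simple configuration where chat and tab completion share one model.

```typescript
// Hypothetical wizard state; null means "reuse the chat model for tab completion".
interface WizardState {
  advancedOpen: boolean;
  tabCompletionModel: string | null;
}

// Opening the advanced section reveals the tab-completion combo;
// closing it reverts to the simple (single-model) configuration.
function toggleAdvanced(state: WizardState): WizardState {
  const advancedOpen = !state.advancedOpen;
  return {
    advancedOpen,
    tabCompletionModel: advancedOpen ? state.tabCompletionModel : null,
  };
}
```

In the webview, the combo's visibility would then simply track `advancedOpen`.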