redhat-developer / vscode-paver

Use IBM Granite LLM as your Code Assistant in Visual Studio Code
Apache License 2.0

Picking different model between tab complete and chat incurs loading model to memory penalty lag #137

Open jamescho72 opened 1 week ago

jamescho72 commented 1 week ago

Ollama does not keep more than one model loaded in memory at a time by default. Configuring two different models in Continue's config.json (one for chat, one for tab completion) causes a ~45-60 second lag every time a request switches between them, because the incoming model has to be reloaded into memory. The result is a poor experience: tab completion appears broken each time it needs to reload.
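For context, a Continue config.json that triggers this looks roughly like the following (the model names are illustrative; the relevant part is that `tabAutocompleteModel` differs from the chat model, so Ollama has to swap models on every switch):

```json
{
  "models": [
    {
      "title": "Granite Chat",
      "provider": "ollama",
      "model": "granite-code:8b"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Granite Autocomplete",
    "provider": "ollama",
    "model": "granite-code:3b"
  }
}
```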

We may want to enforce a single model for both chat and tab completion until this can be resolved.
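As a possible complement (outside the extension's control, and depending on the Ollama version), recent Ollama releases can keep more than one model resident in memory via the `OLLAMA_MAX_LOADED_MODELS` environment variable. A configuration sketch, assuming enough RAM/VRAM for both models:

```shell
# Allow Ollama to keep two models loaded at once (requires Ollama >= 0.1.33)
OLLAMA_MAX_LOADED_MODELS=2 ollama serve

# In another terminal, verify which models are currently loaded
ollama ps
```

This would avoid the reload penalty without forcing a single model, at the cost of higher memory use.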

fbricon commented 5 days ago

The first immediate step would be to:

That should work.

Then, in another step/PR:

fbricon commented 5 days ago

@SachinS10-lab part 1 is done in #139, can you please add the advanced section, where the tab completion can be selected?

SachinS10-lab commented 1 day ago

Added toggle button for advanced section.

Image


For Simple Configuration

Image


For Advanced Configuration

Image


I also added an informational message below the box for a better user experience. Is this okay?

Image

fbricon commented 1 day ago

For the advanced section, you forgot:

> on click, it would reveal the tab completion combo

Please open a PR, it'll be easier to review than commenting on screenshots.
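The requested behavior could be sketched as follows (names here are hypothetical, not the extension's actual code): clicking the advanced toggle reveals the tab-completion combo, and collapsing it falls back to the simple configuration where chat and tab completion share one model.

```typescript
// Hypothetical wizard state; null means "reuse the chat model for tab completion".
interface WizardState {
  advancedOpen: boolean;
  tabCompletionModel: string | null;
}

// Opening the advanced section reveals the tab-completion combo;
// closing it reverts to the simple (single-model) configuration.
function toggleAdvanced(state: WizardState): WizardState {
  const advancedOpen = !state.advancedOpen;
  return {
    advancedOpen,
    tabCompletionModel: advancedOpen ? state.tabCompletionModel : null,
  };
}
```

In the webview, the combo's visibility would then simply track `advancedOpen`.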