twinnydotdev / twinny

The most no-nonsense, locally or API-hosted AI code completion plugin for Visual Studio Code - like GitHub Copilot but completely free and 100% private.
https://twinny.dev
MIT License
2.3k stars · 126 forks

Code snippets in the chat window lose syntax highlighting occasionally #244

Open nicikiefer opened 2 months ago

nicikiefer commented 2 months ago

Describe the bug The code snippets shown in the chat window occasionally lose syntax highlighting and appear in plain white. Both the input and the output code are affected; sometimes both the highlighted code and the generated code appear without syntax highlighting. One thing I could observe is that sometimes the syntax highlighting is correctly applied until the very end of the code generation. Once generation finishes, the highlighting is lost again.

I am not sure what is causing this. I don't use any special themes, and since the code snippets are correctly formatted as such, I assume only the color coding of the syntax highlighting is missing in the last step.

To Reproduce

  1. Use Ollama as your provider
  2. Use llama3 as your chat model
  3. Mark code in your editor and have it fixed, explained, or refactored by twinny
  4. Either the selected code or the generated code is missing syntax highlighting and appears in plain white (see screenshot below)

Expected behavior Syntax highlighting is correctly applied to both the code used as input and the generated code.

Screenshots

[Screenshot: Bildschirmfoto vom 2024-05-10 16-00-01]

[Screenshot: grafik]

Edit: the first screenshot shows a different model than llama3 because I switched over to codeqwen after encountering the syntax highlighting issue. But to be clear, I did use llama3 when the issue occurred.

Logging Enable logging in the extension settings if not already enabled (you may need to restart VSCode if you don't see logs). Provide the log with the report.

API Provider Ollama

Chat or Auto Complete? chat

Model Name llama3:latest

Additional context It does happen for other models as well (like codeqwen), but it most often appears with llama3:latest (pulled directly using Ollama).

Please let me know if you need more input or if there are other ways to successfully use llama3 and I am just doing it wrong. Also thanks for this amazing extension :heart:

rjmacarthy commented 1 month ago

Hello, thanks for the report. Basically, when an LLM replies with code it needs to wrap it in backticks "```" to indicate that it is a code block. By editing/improving your prompt you might get better results; if the model decides not to add backticks, I don't know it's code, and I am too lazy to implement some other code detection method.
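For reference, the fence-based detection described above can be sketched as a small parser (a hypothetical helper for illustration, not twinny's actual code) that splits an LLM reply into plain-text and code segments based on ``` fences:

```typescript
// Hypothetical sketch: split a markdown reply into text/code segments.
// If the model never emits backtick fences, everything stays "text",
// which is exactly why the renderer can't highlight it.
type Segment = { type: "text" | "code"; lang: string | null; content: string };

function splitFences(markdown: string): Segment[] {
  const segments: Segment[] = [];
  // Match ```lang\n ... ``` blocks; anything between matches is plain text.
  const fence = /```([\w-]*)\n([\s\S]*?)```/g;
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = fence.exec(markdown)) !== null) {
    if (m.index > last) {
      segments.push({ type: "text", lang: null, content: markdown.slice(last, m.index) });
    }
    segments.push({ type: "code", lang: m[1] || null, content: m[2] });
    last = fence.lastIndex;
  }
  if (last < markdown.length) {
    segments.push({ type: "text", lang: null, content: markdown.slice(last) });
  }
  return segments;
}
```

A renderer can then feed each `code` segment (with its `lang` tag, when present) to a markdown syntax highlighter.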

nicikiefer commented 1 month ago

Fair enough. Since in this specific case I tried to fix code using twinny, do you think it is safe to apply syntax highlighting even if the model does not provide it itself? If I understand correctly, the missing syntax highlighting in the screenshot concerns not the model output but the pasted code. That said, I am not sure whether the code I highlighted is just pasted into the Fix code editor or whether a model already does some preprocessing there.

Just trying to go for the low-hanging fruit in case such a simple heuristic might already improve this issue, but I also get that it is not high priority and maybe not worth adding workarounds for misbehaving or wrongly used models.

Hope that helps, lmk if you need anything else and no hard feelings if you focus on something else instead

rjmacarthy commented 1 month ago

I see, sorry, I didn't realise you were referring to pasted code as well. Somehow you'd have to provide the language to the markdown React syntax highlighter, like ```python. I could read the current active editor, but if the user changes it and then pastes the code, it might be wrong. Edit: there is no pre-processing, but it might be possible to classify the code type and apply the correct annotation.
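A minimal sketch of the active-editor idea: assuming the language id is read via `vscode.window.activeTextEditor?.document.languageId` at the moment the code is captured, a helper could tag the pasted snippet so the markdown highlighter knows what it is. The helper below takes the id as a parameter so it stays self-contained; the mapping to fence tags is an assumption (VS Code language ids mostly match common fence tags already):

```typescript
// Hypothetical helper: wrap pasted code in a fenced block tagged with the
// editor's language id. In an extension, languageId would come from
// vscode.window.activeTextEditor?.document.languageId, which may be
// undefined (no editor focused) or stale if the user switched editors
// before pasting — the caveat raised above.
function fenceWithLanguage(code: string, languageId: string | undefined): string {
  // Fall back to an untagged fence when no editor language is known.
  const tag = languageId ?? "";
  // Ensure the code ends with a newline so the closing fence sits on its own line.
  const body = code.endsWith("\n") ? code : code + "\n";
  return "```" + tag + "\n" + body + "```";
}
```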

nicikiefer commented 1 month ago

I see, thanks for the explanation! All in all it sounds like it might be too much work for a corner case like this. I guess the only way to detect it is via the active editor, as you pointed out: get the file type if possible and build the markdown syntax highlighter tag from it. But as you also pointed out, if a user can switch editors before the code is pasted, that behavior is not consistent either.

rjmacarthy commented 1 month ago

It might also be possible to get the language at the point of copy, not just paste. However, that doesn't cover the fact that code can be copied from outside the editor too. Code classification would be the best way; it might be a waste of tokens for someone using an external API like OpenAI for chat, but for a local instance it would be fine. I'm not sure whether it would need a new model type or could classify with the instruction model.
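As a token-free middle ground between "ask the model" and "do nothing", a naive keyword-scoring heuristic could guess the language locally. This is purely illustrative (the signature lists are arbitrary guesses, and real classifiers do far better), but it shows the shape of the idea:

```typescript
// Illustrative only: score each candidate language by how many of its
// signature patterns appear in the snippet; highest score wins, null on
// no match. Not part of twinny — a sketch of local classification.
const SIGNATURES: Record<string, RegExp[]> = {
  python: [/\bdef \w+\(/, /\bimport \w+/, /:\s*$/m, /\bself\b/],
  typescript: [/\bconst |\blet /, /=>/, /\binterface \w+/, /: (string|number|boolean)\b/],
  rust: [/\bfn \w+\(/, /\blet mut\b/, /::/, /\bimpl\b/],
};

function guessLanguage(code: string): string | null {
  let best: string | null = null;
  let bestScore = 0;
  for (const [lang, patterns] of Object.entries(SIGNATURES)) {
    const score = patterns.filter((p) => p.test(code)).length;
    if (score > bestScore) {
      best = lang;
      bestScore = score;
    }
  }
  return best;
}
```

The guessed tag would then be used as the fence annotation (e.g. ```python) only when no tag was provided by the model or the editor.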

nicikiefer commented 1 month ago

Code classification would be the best way

I agree. Not sure, but could there be a way for VSCode to handle that for you? I assume they themselves have some classification to autodetect which syntax highlighting to use.

Anyway, thanks for bouncing some ideas around! I guess Copilot gets away with it by just mentioning the context it takes into account, and afaik it only allows code in the context of VSCode.