Open jukofyork opened 11 months ago
Hi @jukofyork. Thank you for your input. I've merged your changes.
Hi @gradusnikov ,
No problem and glad to be of help!
I've got the Ollama port running really well now. I still need to tidy it up, and I've stripped out a lot of stuff that didn't really work well yet with locally run LLMs: function calling via prompts barely worked and required streaming to be turned off, the local LLMs couldn't really create a working diff file, a lot of the stuff specific to JavaDoc and the Eclipse Java AST, etc.
One thing you might want to add to your code is to make the right-click menu context-sensitive:
package eclipse.plugin.assistai.handlers;

import org.eclipse.e4.core.di.annotations.Evaluate;
import org.eclipse.ui.PlatformUI;
import org.eclipse.ui.texteditor.ITextEditor;

/**
 * Controls the plug-in's contributions to the Eclipse menu based on the active editor.
 */
public class MenuContributionsHandler {

    @Evaluate
    public boolean evaluate() {
        // Guard against there being no active window/page (e.g. during startup/shutdown)
        var window = PlatformUI.getWorkbench().getActiveWorkbenchWindow();
        if (window == null || window.getActivePage() == null) {
            return false;
        }

        // Only show the menu contributions when a text editor is active
        var activeEditor = window.getActivePage().getActiveEditor();
        return activeEditor instanceof ITextEditor;
    }
}
Then add this to 'fragment.e4xmi':
`
Sounds great, looking forward to trying this!
Hi @jukofyork
I find function calling very useful, especially after adding web search and web read. I think I will add more, as this is a simple and quite powerful way to make the LLM answer more accurately. I have not tried function calling with other LLMs, but maybe the approach from about six months ago would work, where people were defining function definitions as part of the system message, along with the function call format? Or I can simply disable function calling in Settings?
Hi again,
I've got the communication with the Ollama server working fairly robustly at last: their server is a Go wrapper around llama.cpp and it's very prone to crashing from OOM errors, but exponential backoff seems to give it time to restart itself, and all the random HTTP disconnects aren't a problem now.
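In case it's useful to anyone, the retry logic is conceptually just this (a rough sketch with illustrative names, not the fork's actual client code):

import java.io.IOException;
import java.util.concurrent.Callable;

public final class ExponentialBackoff {

    // Retry a request with exponentially growing pauses so a crashed
    // Ollama server has time to restart before the next attempt.
    public static <T> T withRetries(Callable<T> request, int maxAttempts) throws Exception {
        long delayMs = 1_000; // first pause: 1 second
        for (int attempt = 1; ; attempt++) {
            try {
                return request.call();
            } catch (IOException e) { // connection refused/reset, etc.
                if (attempt >= maxAttempts) {
                    throw e; // give up after the last attempt
                }
                Thread.sleep(delayMs);
                delayMs = Math.min(delayMs * 2, 30_000); // double the pause, cap at 30 s
            }
        }
    }
}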
I've struggled a lot with the dependency injection: either things not getting injected, causing baffling null pointer exceptions, or other weird things like the ILog (which I've now added a listener to so it displays in blue chat bubbles) seeming to have multiple copies instead of being a singleton, etc. I'm still not 100% sure why, but I think it's Eclipse's own dependency injection somehow interfering. Anyway, I had to strip a lot of it away to make sure everything works.
I've iterated over a few different methods of using the interface and finally settled on the right-click context menu and a toggle button to decide if the full file should be sent as extra context or not. This, along with grabbing and appending anything in the edit box to the end of the prompt message, seems to be the most usable. I think the Continue input method (https://continue.dev/) with their slash commands might be worth looking at too, but this is working so well now I don't really have the motivation to try it.
I did consider seeing if I could get a "tree" of responses like a lot of the LLM web apps implement (with undo, sideways edits, etc.), and possibly even see if I can journal the stuff getting sent to the Browser widget so it could be serialised and restored, but I don't think it will really be that useful, as long conversations soon exhaust the context windows of all the available locally runnable LLMs...
I've added lots of other little fixes like rate-limiting and buffering the streaming to 5 events per second, as I found it could start to lag the main Eclipse UI thread out badly with some of the smaller/faster models that can send 20-50 events per second.
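The buffering is conceptually along these lines (a simplified sketch with hypothetical names, not the fork's actual class): tokens arrive on the network thread and a 200 ms timer flushes them to the SWT UI thread in one batch.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

import org.eclipse.swt.widgets.Display;

public class StreamingUiThrottle {

    private final StringBuilder pending = new StringBuilder();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public StreamingUiThrottle(Display display, Consumer<String> appendToChatView) {
        // Flush at most 5 times per second, regardless of how fast tokens arrive.
        scheduler.scheduleAtFixedRate(() -> {
            String chunk;
            synchronized (pending) {
                if (pending.length() == 0) {
                    return;
                }
                chunk = pending.toString();
                pending.setLength(0);
            }
            // Hand the batched text to the UI thread in a single update.
            display.asyncExec(() -> appendToChatView.accept(chunk));
        }, 200, 200, TimeUnit.MILLISECONDS);
    }

    // Called from the streaming/network thread for every received token.
    public void onToken(String token) {
        synchronized (pending) {
            pending.append(token);
        }
    }

    public void dispose() {
        scheduler.shutdownNow();
    }
}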
Anyway, it's mainly just a case of tidying up the View code and I will share it back via Github and hopefully some of the stuff will be useful.
> Hi @jukofyork
> I find function calling very useful, especially after adding web search and web read. I think I will add more, as this is a simple and quite powerful way to make the LLM answer more accurately. I have not tried function calling with other LLMs, but maybe the approach from about six months ago would work, where people were defining function definitions as part of the system message, along with the function call format? Or I can simply disable function calling in Settings?
Yeah, there is quite an interesting discussion on this here:
https://github.com/jmorganca/ollama/issues/1729
They are defining the functions in the system message and then doing 4-5 shot teaching by making the first few messages examples of calling the functions.
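As a rough illustration of that pattern (the function schema and message record here are made up for the example, not their actual prompts), the message list ends up looking something like this:

import java.util.List;

public class FewShotFunctionPrompt {

    record Message(String role, String content) {}

    static List<Message> build(String userQuestion) {
        String system = """
            You can call the following function by replying with a single JSON object:
            {"function": "web_search", "arguments": {"query": "<search terms>"}}
            Only call it when you need information you do not already know.
            """;
        return List.of(
            new Message("system", system),
            // A few hand-written turns teaching the expected call format.
            new Message("user", "What is the latest Eclipse release?"),
            new Message("assistant", "{\"function\": \"web_search\", \"arguments\": {\"query\": \"latest Eclipse IDE release\"}}"),
            new Message("user", "What is 2 + 2?"),
            new Message("assistant", "4"),
            // The real question goes last.
            new Message("user", userQuestion));
    }
}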
@gradusnikov I have added lots of things you might find useful, e.g.:
The final thing I want to do is allow multiple copies of the view to be opened and then I'll upload to Github later this week.
I'm happy for others to try it out and use it, but Ollama is very buggy and I don't want to be spending lots of time helping people get Ollama working, or step on @gradusnikov's toes since it's his project after all and I've stripped out as much as I have added... I'll try and create a plug-in installer and add some instructions too, but it's going to be left more as a foundation for others to build on rather than an active fork I want to maintain.
@gradusnikov
I've done my best to commit the code to GitHub (no idea why it's ended up in a subfolder like that though :confused:):
https://github.com/jukofyork/aiassistant
The bits that are probably most useful to you:

- ReviewChangesBrowserFunction: Implements a CompareEditor, which was particularly hard due to the Eclipse docs being out of date (see the URLs for links that explain the new/correct way to implement it). Make sure to set "Ignore white space" in the Compare/Patch settings in Eclipse, or else CompareEditors don't work well in general.
- MenuVisibilityHandler: Used to stop the right-click context menu appearing all over Eclipse (there is also a similar visibility handler for the 'Fix Errors' and 'Fix Warnings' options).
- URLFieldEditor: Strictly enforces valid URLs with port numbers only (I had Eclipse break really badly when I input a bad URL and had to manually find and edit the preference store to fix it!).
- IndentationFormatter: Useful for removing and reapplying indentation (mainly for the different BrowserFunction classes).
- LanguageFileExtensions: Loads a JSON file with all the {language, extensions} tuples used by highlight.js.
- BrowserScriptGenerator: Has code in it to do things like undo, scroll to top/bottom, scroll to previous/next message, detect if the scrollbar is at the bottom, etc.

There are also lots of small changes to do with the main view you may or may not want to use:
The prompts are the best I can come up with after a couple of months of trying. In general I've found the fewer newlines the better, and starting your tasks with a '#' symbol seems to help (possibly they think it's a markup header, or maybe they have been overtrained on Python comments). I've made it so the prompts use the StringTemplate library now:
https://github.com/antlr/stringtemplate4/blob/master/doc/cheatsheet.md
and added several other possibly useful context variables and a special <<switch-roles>> tag that can be used for delaying responses, forcing responses, multi-shot learning, etc. (have a look in the prompts for examples of its use).
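For anyone who hasn't used StringTemplate before, rendering a prompt looks roughly like this (the attribute names here are just illustrative, not necessarily the ones used in the fork):

import org.stringtemplate.v4.ST;

public class PromptRendering {

    public static String render(String selectedCode, String fileName, String language) {
        // StringTemplate's default delimiters are '<' and '>'.
        ST prompt = new ST(
                "# Refactor the following <language> code from <fileName>:\n\n<selection>");
        prompt.add("language", language);
        prompt.add("fileName", fileName);
        prompt.add("selection", selectedCode);
        return prompt.render();
    }
}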
I've had to strip out all of the @Inject stuff. I tried different versions of Javax/Jakarta and just got really weird bugs where nearly identical instances would fail to inject and give null pointer exceptions... Then one day Eclipse did some kind of update and absolutely nothing worked (even after a full reinstall/revert!), so I just had to create a new project and move each class back in one at a time.
I also found the fragment.e4xmi stuff to be very buggy (e.g. the main view would be blank if you closed and reopened it, etc.) and converted the main view and context menu to use plugin.xml extensions instead.
I think, from reading the issues here and on the Eclipse marketplace, that the Javax/Jakarta stuff and the fragment.e4xmi view are the main cause of the problems being reported.
I also had to move all the dependencies into the main plug-in, as for some reason they caused me a lot of problems too (possibly because I was trying to edit the forked version, though).
I did have more advanced code for the networking (due to Ollama being so buggy and crashing often from OOM errors), but I had to remove it as I found that, because Eclipse only uses a single GUI thread, it caused more problems than it solved (i.e. the menus kept freezing, etc.).
One thing I didn't fix, but which probably needs looking at, is the O(n^2) complexity of the way the streamed tokens get added to the browser window: it gets slower and slower and starts to cause the main Eclipse GUI thread to stall. The best solution I could find without completely rewriting the code for this is to use estimateMaximumLag and concatenate to an internal buffer (see OllamaChatCompletionClient.processStreamingResponse). This is still O(n^2), but it does stop the whole Eclipse GUI from stalling as more and more gets added to the Subscription class's buffer.
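The idea is roughly this (a simplified sketch, not the actual processStreamingResponse code): while the subscriber is still behind, new tokens are concatenated into a local buffer instead of being submitted one by one.

import java.util.concurrent.SubmissionPublisher;

public class BufferedTokenPublisher {

    private final SubmissionPublisher<String> publisher = new SubmissionPublisher<>();
    private final StringBuilder pending = new StringBuilder();

    public SubmissionPublisher<String> publisher() {
        return publisher;
    }

    // Called for every token parsed from the streaming HTTP response.
    public synchronized void onToken(String token) {
        pending.append(token);
        // Only hand data over once the subscriber has drained its queue,
        // so the queue grows by whole chunks rather than one item per token.
        if (publisher.estimateMaximumLag() == 0) {
            publisher.submit(pending.toString());
            pending.setLength(0);
        }
    }

    public synchronized void onComplete() {
        if (pending.length() > 0) {
            publisher.submit(pending.toString());
        }
        publisher.close();
    }
}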
There are probably a lot of other changes that I've forgotten to mention here, but would just like to say thanks for creating the base project!
Just noticed some random 'ToDo' list with prompts has come up as the main readme - I'll see if I can tidy it up tomorrow (I don't really use Git and seem to always make a mess of it :frowning_face:).
I've also deliberately not added any actual binary release for the plugin as: firstly, I don't want to take away from this project, and secondly, I don't want to become an Ollama technical support person...
If anybody wants to use it, you just need to install the plugin development stuff in Eclipse, use 'Import Fragments' and 'Export plugin', and it should work.
Hi jukofyork!
Thank you very much for your edits. I will try to integrate your changes with the main branch.
Cheers!
/w.
No problem and I hope it is helpful :)
I've updated the README to hopefully better explain how to build/use the forked version, and have added a few pictures and notes that I might have forgotten to mention above.
I have some other work I need to do for the next few weeks, but the next things I want to look at are:
I'll be sure to share back anything I find and will have a look through your latest code when I get back to it - the fork is based on a pull I made sometime last December and I see you have made quite a lot of changes since then.
There are also quite a few changes to the Ollama API underway: OpenAI compatibility, function calling, etc are on their ToDo list, so it's probably a good time to leave it and see what they do next too.
I'm finding Ollama to be too buggy to use now - it seems for each bug they fix they create two more, and their Go wrapper of llama.cpp's server is getting more and more impenetrable, making it hard to fix anything... It's some strange mix of llama.cpp's server.cpp code from a few months ago, which imports and uses much newer llama.h and llama.cpp files... I can only see things getting worse in the future :frowning_face:
So it looks like I'm going to have to start using the llama.cpp server directly, but I'm not sure if I should leave or just remove the Ollama server code now:

- The <<switch-roles>> stuff doesn't work any more and only the last "user" message sent gets seen.
- The chat/completions endpoint now has the exact opposite bug with the system message (i.e. it used to get ignored if you didn't have a default one defined in your modelfile, but now it does the opposite).
- The {{.First}} variable for the go/text/template engine.

...and so on. It's really so buggy now that I don't actually trust that what is getting sent to the server is what you expect (the system message getting ignored bug went unnoticed for months!).
The problem is that if I leave the Ollama code in then options like "Add Context" won't actually work (nor will any future multi-shot prompts), but at the same time I'm reluctant to remove it as sometime in the future they may actually start to fix some of these bugs. Things like being able to list available models, load new models, allow the GPU VRAM to be unloaded after 5 minutes if you don't send keep-alive messages, and so on were all much nicer than what is going to be possible with the stock llama.cpp server :confused:
@gradusnikov
On another note I have been researching how the completion engine works in Eclipse:
and specifically the CDT code that is used for C++ completion:
It looks horrifically complex, but like anything in Eclipse it is probably not that bad if you can get a minimal working example running... I doubt I'll have time to look at this properly for a few weeks, but I think it would be worth looking into.
@gradusnikov Not sure if you are still working on this, but I got Eclipse's internal spell-checker working now:
It should be pretty much a drop-in replacement for the Text widget, so it's very easy to add if desired. It also has the hover "quick fix" menu working, but it isn't 100% like the Eclipse Editor menu, as that seems to use the xtext extension instead of just text.ui; still, it works and is quite useful for adding words to the dictionary, applying suggestions, etc.
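If it helps, the core of it is just wiring Eclipse's spelling service into a JFace SourceViewer via a reconciler; a minimal sketch (illustrative names, with the annotation painting and quick-fix hover left out) looks something like this:

import org.eclipse.jface.text.Document;
import org.eclipse.jface.text.reconciler.MonoReconciler;
import org.eclipse.jface.text.source.AnnotationModel;
import org.eclipse.jface.text.source.SourceViewer;
import org.eclipse.swt.SWT;
import org.eclipse.swt.widgets.Composite;
import org.eclipse.ui.editors.text.EditorsUI;
import org.eclipse.ui.texteditor.spelling.SpellingReconcileStrategy;

public class SpellCheckedTextArea {

    public static SourceViewer create(Composite parent) {
        var viewer = new SourceViewer(parent, null, SWT.MULTI | SWT.WRAP | SWT.V_SCROLL | SWT.BORDER);
        // The annotation model is where the spelling strategy puts its SpellingProblem annotations.
        viewer.setDocument(new Document(), new AnnotationModel());

        // Reuse the workbench spelling service (user dictionary, preferences, etc.).
        var strategy = new SpellingReconcileStrategy(viewer, EditorsUI.getSpellingService());
        var reconciler = new MonoReconciler(strategy, false);
        reconciler.setDelay(500); // re-check half a second after typing stops
        reconciler.install(viewer);
        return viewer;
    }
}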
I think spelling mistakes likely harm LLMs quite significantly due to having to tokenise in strange / out-of-distribution ways, so it's probably a good way to boost quality for free.
With some more reading it should be possible to make it ignore text inside code blocks too, but I haven't had a chance to look yet.
This might also be useful for your temperature field:
http://www.java2s.com/example/java-src/pkg/org/eclipse/wb/swt/doublefieldeditor-1e135.html
Not having a real-valued scalar field type was a serious oversight in SWT, I think.
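If you'd rather not pull that in wholesale, a minimal version is just a StringFieldEditor subclass that validates its text as a double (a sketch, not the linked implementation):

import org.eclipse.jface.preference.StringFieldEditor;
import org.eclipse.swt.widgets.Composite;

public class DoubleFieldEditor extends StringFieldEditor {

    private final double min;
    private final double max;

    public DoubleFieldEditor(String name, String label, Composite parent, double min, double max) {
        super(name, label, parent);
        this.min = min;
        this.max = max;
        setErrorMessage("Value must be a number between " + min + " and " + max);
    }

    // Accept the field only if it parses as a double within the allowed range.
    @Override
    protected boolean doCheckState() {
        try {
            double value = Double.parseDouble(getStringValue().trim());
            return value >= min && value <= max;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}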
There is also this that I bookmarked:
http://www.java2s.com/Code/Java/SWT-JFace-Eclipse/SWTCompletionEditor.htm
It's very out of date (from a 2004 book), but likely a good starting point to implement auto-complete using "fill in the middle" LLMs. If I get a chance I will look into this and report back.
EDIT: Here is the book the source came from too: https://livebook.manning.com/book/swt-jface-in-action/chapter-5/88
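For reference, the JFace hook for this is IContentAssistProcessor; a sketch of where a FIM model would slot in (LlmClient.fillInMiddle is a hypothetical, blocking call used purely for illustration):

import org.eclipse.jface.text.ITextViewer;
import org.eclipse.jface.text.contentassist.CompletionProposal;
import org.eclipse.jface.text.contentassist.ICompletionProposal;
import org.eclipse.jface.text.contentassist.IContentAssistProcessor;
import org.eclipse.jface.text.contentassist.IContextInformation;
import org.eclipse.jface.text.contentassist.IContextInformationValidator;

public class LlmCompletionProcessor implements IContentAssistProcessor {

    @Override
    public ICompletionProposal[] computeCompletionProposals(ITextViewer viewer, int offset) {
        String text = viewer.getDocument().get();
        String prefix = text.substring(0, offset); // everything before the caret
        String suffix = text.substring(offset);    // everything after the caret
        // Hypothetical call to a local FIM-capable model.
        String completion = LlmClient.fillInMiddle(prefix, suffix);
        return new ICompletionProposal[] {
            new CompletionProposal(completion, offset, 0, completion.length())
        };
    }

    @Override public IContextInformation[] computeContextInformation(ITextViewer viewer, int offset) { return null; }
    @Override public char[] getCompletionProposalAutoActivationCharacters() { return null; }
    @Override public char[] getContextInformationAutoActivationCharacters() { return null; }
    @Override public String getErrorMessage() { return null; }
    @Override public IContextInformationValidator getContextInformationValidator() { return null; }
}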
@gradusnikov
Not sure if you are still working on this, but I've got LaTeX rendering working via MathJax v3.2.2:
I had endless problems with this when I tried it in the past, due to the formatting getting all mangled... but I found a trick to avoid it by encoding the LaTeX as Base64 in Java:
private static String convertInLineLatexToHtml(String line) {
    String inlineLatexPatterns =
            "\\$(.*?)\\$|" +        // Single $ pairs
            "\\\\\\((.*?)\\\\\\)";  // \( \) pairs
    Pattern inlineLatexPattern = Pattern.compile(inlineLatexPatterns);
    return inlineLatexPattern.matcher(line).replaceAll(match -> {
        // Check each capture group since we don't know which pattern matched
        for (int i = 1; i <= match.groupCount(); i++) {
            String content = match.group(i);
            if (content != null) {
                String base64Content = Base64.getEncoder().encodeToString(content.getBytes());
                return "<span class=\"inline-latex\">" + base64Content + "</span>";
            }
        }
        return match.group(); // fallback, shouldn't happen
    });
}
and for blocks:
private static void flushLatexBlockBuffer(StringBuilder latexBlockBuffer, StringBuilder htmlOutput) {
    if (latexBlockBuffer.length() > 0) {
        htmlOutput.append("<span class=\"block-latex\">");
        htmlOutput.append(Base64.getEncoder().encodeToString(latexBlockBuffer.toString().getBytes()));
        htmlOutput.append("</span>\n");
        latexBlockBuffer.setLength(0); // Clear the buffer after processing to avoid duplicate content.
    }
}
Then decoding it from the HTML tags in JS:
function renderLatex() {
    // Convert block latex tags
    document.querySelectorAll('.block-latex').forEach(elem => {
        let decodedLatex = atob(elem.innerHTML);
        elem.outerHTML = '\\\[' + decodedLatex + '\\\]';
    });
    // Convert inline latex tags
    document.querySelectorAll('.inline-latex').forEach(elem => {
        let decodedLatex = atob(elem.innerHTML);
        elem.outerHTML = '\\\(' + decodedLatex + '\\\)';
    });
    MathJax.typeset();
}
I only tried this again after finding that o1-mini and o1-preview have no system message ability, and they just love to write huge walls of LaTeX that were nearly impossible to read without rendering... :frowning:
It should work on all valid LaTeX, block and inline, using both the dollar and bracket delimiters. The only thing it won't do is allow multiline $ or \( versions (in the same way the inline single backtick for code won't), as there is probably too much chance of scrambling other text, since these are common in ordinary prose and/or code...
It also needs to be rendered only once (hence the flushLatexBlockBuffer() function), as the trick used with highlight.js of "auto-closing" the Markdown code block whilst the response is streaming really badly messes up the expressions and/or causes warnings to flash up mid-expression.
Not an issue but I can't see any discussion board for this project...
I've now got this working directly with the Ollama chat completion API endpoint so it's now possible to use it with Local LLM instances:
https://github.com/jmorganca/ollama/blob/main/docs/api.md#generate-a-chat-completion
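For anyone wondering what "directly" means: the endpoint is plain HTTP with one JSON object per line when streaming, so a minimal standalone sketch (hand-rolled JSON and an arbitrary model name, just for illustration) is:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OllamaChatExample {

    public static void main(String[] args) throws Exception {
        String body = """
            {"model": "codellama:13b",
             "messages": [{"role": "user", "content": "Explain what a BrowserFunction is in SWT."}],
             "stream": true}
            """;
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:11434/api/chat"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        // Each streamed line is a JSON object containing the next fragment of the reply.
        HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofLines())
                .body()
                .forEach(System.out::println);
    }
}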
I originally tried to use LiteLLM to emulate the OpenAI API and then have it communicate with Ollama but it didn't really work.
So instead I've just made a hacked version of the plug-in and got it to communicate directly, and after finally getting to the bottom of why it was hanging after a couple of pages of text (see my other issue post for the solution) it seems to be working pretty well. The main changes needed were:
AFAIK none of the open source LLMs can handle the function format OpenAI's models use so that isn't active yet, but I'm pretty sure I can get it to work using prompts at least to some extent. LiteLLM seems to have the ability to do this using the "--add_function_to_prompt" command line option:
https://litellm.vercel.app/docs/proxy/cli
I can probably tidy up the code in a couple of days if anyone is interested?