guinmoon / LLMFarm

Run llama and other large language models offline on iOS and macOS using the GGML library.
https://llmfarm.site
MIT License

Please add function calling! #72

Open MiniPhantom opened 2 weeks ago

MiniPhantom commented 2 weeks ago

As the title states, please add function calling for inference! Maybe this could work by having a folder of “.function” files (which are just normal text files) that contain JavaScript or something along those lines. The generated text that triggers the function could just be the file’s name plus its arguments (e.g. “searchgoogle(input1, input2)”), and generation would pause until the function returns a result. I don’t really know how the app works internally, though, so it may simply not be possible.
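A minimal sketch of what that trigger detection might look like in Swift. Everything here is hypothetical: `FunctionCall`, `detectFunctionCall`, and the `searchgoogle` example are illustration only and do not exist in LLMFarm; the idea is just to scan the streamed output for a trailing `name(args)` pattern, pause generation, run the matching “.function” file, and feed the result back:

```swift
import Foundation

// Hypothetical types for the proposal; nothing here exists in LLMFarm.
struct FunctionCall {
    let name: String
    let arguments: [String]
}

// Scan the tail of the streamed output for a `name(arg1, arg2)` pattern.
func detectFunctionCall(in output: String) -> FunctionCall? {
    let pattern = #"(\w+)\(([^)]*)\)\s*$"#
    guard let match = output.range(of: pattern, options: .regularExpression) else {
        return nil
    }
    let call = String(output[match])
    guard let open = call.firstIndex(of: "("),
          let close = call.lastIndex(of: ")") else { return nil }
    let name = String(call[..<open])
    let arguments = call[call.index(after: open)..<close]
        .split(separator: ",")
        .map { $0.trimmingCharacters(in: .whitespaces) }
    return FunctionCall(name: name, arguments: arguments)
}

// Example: a stream ending in a call would pause here, run a file
// named "searchgoogle.function", then resume with its output.
if let call = detectFunctionCall(in: "Let me check. searchgoogle(input1, input2)") {
    print(call.name, call.arguments)  // searchgoogle ["input1", "input2"]
}
```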

Also a few bugs:

If I add a clip model to a chat’s settings, it cannot be removed from inside the settings; instead I have to open the file and delete the clip model value by hand. Maybe the clip toggle could automatically remove the value from the JSON file when toggled off?
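A rough sketch of that fix, assuming the chat settings are a flat JSON dictionary and that the clip model path lives under a hypothetical “clip_model” key (the real key name and file layout in LLMFarm may differ):

```swift
import Foundation

// Sketch only: drop the stored clip model path when the toggle goes off,
// so it no longer has to be deleted from the file by hand.
func setClipEnabled(_ enabled: Bool, settingsURL: URL) throws {
    let data = try Data(contentsOf: settingsURL)
    guard var settings = try JSONSerialization.jsonObject(with: data) as? [String: Any] else {
        return
    }
    if !enabled {
        settings.removeValue(forKey: "clip_model")  // hypothetical key name
    }
    let out = try JSONSerialization.data(withJSONObject: settings, options: .prettyPrinted)
    try out.write(to: settingsURL)
}
```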

The first message sent by the user after a chat’s context is loaded back often results in the response showing only an endless “loading circle” rather than the generated text. The text actually still finishes generating: clicking on another chat and back fixes the circle issue and shows the model’s response as usual. Maybe this could be fixed by checking every second, while the response is being generated, whether the loading circle should still be there, until either the response is finished or the circle has been removed.
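One way that once-per-second check could look, assuming hypothetical `isGenerating`/`hideSpinner` hooks (LLMFarm’s actual view model state is different; this only illustrates re-syncing the spinner with the generation state):

```swift
import Foundation

// Sketch only: poll once per second until generation finishes,
// clearing a spinner that outlived its response.
final class SpinnerWatchdog {
    private var timer: Timer?

    func start(isGenerating: @escaping () -> Bool,
               hideSpinner: @escaping () -> Void) {
        timer = Timer.scheduledTimer(withTimeInterval: 1.0, repeats: true) { [weak self] _ in
            if !isGenerating() {
                hideSpinner()
                self?.timer?.invalidate()
                self?.timer = nil
            }
        }
    }
}
```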

guinmoon commented 2 weeks ago

Thank you for your feedback. I will try to fix the bugs in new versions of the application. LLMFarm currently supports Shortcuts; you can use their features instead of calling JS functions. Believe me, there is much more potential there.

MiniPhantom commented 2 weeks ago

Also, what is Flash Attention in the JSON files?

MiniPhantom commented 2 weeks ago

And by calling functions I also mean that the model could retrieve information mid-generation. For example, if it were solving (19187 × 18210)², it could call the calculator for the inner product and then calculate the square. I believe GPT4 coder can do this?
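For illustration, the handoff for that example could be as small as evaluating the inner product with Foundation’s NSExpression and then squaring it; the tool wiring around it is hypothetical:

```swift
import Foundation

// The arithmetic the model would hand off instead of guessing:
// (19187 × 18210)² = 349395270² = 122077054698372900
let inner = NSExpression(format: "19187 * 18210")
if let product = (inner.expressionValue(with: nil, context: nil) as? NSNumber)?.intValue {
    let squared = product * product
    print(product)  // 349395270
    print(squared)  // 122077054698372900
}
```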