swerner opened 4 months ago
A few more notes on this for posterity: the tutorial demo of the historical event finder has the following issues with the models below:
- gpt-4: 50/50
- gpt-4o: 👎
When the function in the API calls is renamed to "formatter" or "format_response", 4o behaves very well, BUT Blueprints starts to behave badly.
When the function is renamed to something generic like "response" or "function", Blueprints continues to work well, but the historical event finder gets even worse.
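For reference, here is a rough sketch of where that function name lives in an OpenAI-style request, so the renames above can be swapped in one place. It assumes the Chat Completions `tools` / `tool_choice` payload shape; `build_tool` and `build_request` are hypothetical helpers, not the project's actual adapter code.

```ruby
require "json"

# Hypothetical helper: builds an OpenAI-style "tools" entry whose function
# name can be swapped ("formatter", "format_response", "response", ...).
def build_tool(function_name:, property_name:, description:)
  {
    type: "function",
    function: {
      name: function_name,
      description: description,
      parameters: {
        type: "object",
        properties: {
          property_name => { type: "string", description: description }
        },
        required: [property_name]
      }
    }
  }
end

# Hypothetical helper: wraps the tool in a request body and forces the model
# to call it, which is how function calling yields structured output.
def build_request(model:, prompt:, tool:)
  {
    model: model,
    messages: [{ role: "user", content: prompt }],
    tools: [tool],
    tool_choice: { type: "function", function: { name: tool[:function][:name] } }
  }
end

# Print the same output spec under each of the names tried in this thread.
%w[formatter format_response response function].each do |name|
  tool = build_tool(function_name: name,
                    property_name: "historical_event",
                    description: "A notable historical event that happened on the given date")
  puts JSON.generate(build_request(model: "gpt-4o", prompt: "What happened on this date?", tool: tool))
end
```

Keeping the name as a single argument makes it easy to A/B the same prompt across models without touching the schema.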
Local: llama3:8b does not play well with the event finder either (this was done through XML), while llama3.1:8b DOES play well with the event finder.
^Both work with Blueprints.
Hmm, if you have the llama models set up, want to see how it plays with the list_of_strings output adapter? I'm curious whether the single string output adapter is just too simple, and whether a more complex data type would perform better...
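To make the "more complex data type" comparison concrete, here is a minimal sketch of the JSON Schema parameters a single string adapter and a list_of_strings adapter might send, plus parsing of the returned arguments. The helper names are hypothetical and this assumes the arguments come back as a JSON string, as in OpenAI-style tool calls.

```ruby
require "json"

# Hypothetical: parameters for a single string output.
def single_string_parameters(name, description)
  {
    type: "object",
    properties: { name => { type: "string", description: description } },
    required: [name]
  }
end

# Hypothetical: parameters for a list of strings output.
def list_of_strings_parameters(name, description)
  {
    type: "object",
    properties: {
      name => { type: "array", description: description, items: { type: "string" } }
    },
    required: [name]
  }
end

# Parse the tool-call arguments and insist on an array of strings.
def extract_list(arguments_json, name)
  value = JSON.parse(arguments_json).fetch(name)
  unless value.is_a?(Array) && value.all? { |v| v.is_a?(String) }
    raise ArgumentError, "expected an array of strings for #{name}"
  end
  value
end

puts JSON.pretty_generate(single_string_parameters("event", "A historical event"))
puts JSON.pretty_generate(list_of_strings_parameters("events", "Historical events"))

arguments = JSON.generate({ "events" => ["Independence Day of Chile", "Death of Joseph Stalin"] })
p extract_list(arguments, "events")
```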
Sorry, it took a bit to rebase; it had some conflicts.
For the historical event finder with llama3:8b: {"error"=>"llama3 does not support tools"}, bah humbug.
For the historical event finder with llama3.1:8b: ["First Landing by Vikings in North America", "Independence Day of Chile", "Death of Joseph Stalin"] (I just changed the historical event finder to return a list of 3).
I could revert to the XML approach to test this out if we'd like to see how it would affect it!
Nah, it seems like everyone is converging on this JSON spec version of tool calling, so I think it's fine. I was more curious to see whether that theory about more complex data types could be a fruitful path...
Once we have the universal JSON spec formatter, we could change single string into something like the parameter itself plus an additional, ignored parameter such as "explanation" or "notes"... which may also end up improving the quality of the output anyway.
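A minimal sketch of that idea, assuming OpenAI-style tool parameters: the schema asks for an "explanation" alongside the real value, and the caller keeps only the value. The helper names are hypothetical. Listing "explanation" first is an untested assumption that the model might reason before answering; JSON Schema itself does not guarantee property order matters.

```ruby
require "json"

# Hypothetical: the real parameter plus an extra field the caller ignores.
def parameters_with_explanation(name, description)
  {
    type: "object",
    properties: {
      "explanation" => {
        type: "string",
        description: "Brief reasoning behind the answer (ignored by the caller)"
      },
      name => { type: "string", description: description }
    },
    # Require both so the model cannot skip the explanation.
    required: ["explanation", name]
  }
end

# Keep only the value we asked for and drop the explanation.
def extract_value(arguments_json, name)
  JSON.parse(arguments_json).fetch(name)
end

puts JSON.pretty_generate(parameters_with_explanation("historical_event", "A historical event"))
args = '{"explanation":"Well-documented date","historical_event":"Death of Joseph Stalin"}'
puts extract_value(args, "historical_event") # prints "Death of Joseph Stalin"
```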
It looks like, because of the way we're using function calling to get structured outputs back, some models occasionally return empty strings for the parameters or just echo back the description/prompt they were given.
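One way to catch both failure modes is a post-check with a retry. This is a hypothetical sketch: it only flags exact (case- and whitespace-insensitive) matches against the description or prompt, and the block stands in for whatever performs one LLM request and extracts the string.

```ruby
# Flags an empty value, or one that just echoes the description/prompt.
def suspicious_value?(value, description, prompt = nil)
  normalized = value.to_s.strip.downcase
  return true if normalized.empty?
  [description, prompt].compact.any? { |text| normalized == text.strip.downcase }
end

# Calls the block until it returns a non-suspicious value.
def generate_with_retries(description:, prompt: nil, max_attempts: 3)
  max_attempts.times do
    value = yield
    return value unless suspicious_value?(value, description, prompt)
  end
  raise "No usable value after #{max_attempts} attempts"
end

# Canned responses standing in for real model calls.
canned = ["", "A historical event", "Death of Joseph Stalin"]
result = generate_with_retries(description: "A historical event") { canned.shift }
puts result # prints "Death of Joseph Stalin"
```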
A couple of initial thoughts: