[Closed] nitanmarcel closed 2 weeks ago
@dnakov :)
Omg this PR couldn't be bigger
hihi. I'm not done
Can someone tell me what's a token limit? Because I don't get any =)))
Oh, this one -_-. I was close enough tho
```
openai.BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens. However, you requested 8226 tokens (3983 in the messages, 147 in the functions, and 4096 in the completion). Please reduce the length of the messages, functions, or completion.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
```
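For reference, that 400 is just arithmetic: prompt tokens + function schemas + the requested completion have to fit inside the model's window (3983 + 147 + 4096 = 8226 > 8192 here). A minimal sketch of checking the budget up front, assuming tiktoken (the helper and the 8192 constant are illustrative, not r2ai's actual code):

```python
# Sketch: verify the token budget before sending a request.
# The window size and helper are illustrative only.
import tiktoken

CONTEXT_WINDOW = 8192

def message_tokens(messages, model="gpt-4"):
    enc = tiktoken.encoding_for_model(model)
    # Rough count: just the text content of each message.
    return sum(len(enc.encode(m.get("content") or "")) for m in messages)

def fits(messages, function_tokens, max_completion, model="gpt-4"):
    used = message_tokens(messages, model) + function_tokens + max_completion
    return used <= CONTEXT_WINDOW  # 3983 + 147 + 4096 = 8226 fails this check
```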
@nitanmarcel before you get too far.. Re-Implement OpenAI / Re-implement Anthropic / Re-Implement Llama / Re-Implement Bedrock / Re-Implement Groq / Re-Implement Google / Re-Implement NousResearch:
All of these can just be served via a single litellm call like in ui/chat.py
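For context, the point is that one call signature covers all of those providers. A minimal sketch of what that single litellm call could look like (model strings and messages here are illustrative, not what ui/chat.py actually does):

```python
# Sketch: one litellm.completion() call instead of per-provider clients.
# litellm normalizes every provider's response to the OpenAI format.
import litellm

def ask(model, messages):
    resp = litellm.completion(model=model, messages=messages)
    return resp.choices[0].message.content

# The same code path works for any supported provider, e.g.:
#   ask("openai/gpt-4", msgs)
#   ask("anthropic/claude-3-5-sonnet-20240620", msgs)
#   ask("groq/llama3-70b-8192", msgs)
```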
I think @trufae said something about not wanting litellm implemented. Which is extra fine by me, since I can turn the process function into a decorator and have it take an argument to pre-process the response
Would be something like:
```
@process_response(processor=function_to_convert_response)
def unsupported_model_call(...):
```
Though it implies that the model supports the same tool format. I can add a pre_processor argument for that too
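A rough sketch of how that decorator could look, using the names from the snippet above (the body is illustrative, not the PR's actual code):

```python
# Sketch of the decorator idea: `processor` converts the raw provider response
# into the common format, and an optional `pre_processor` adapts the tool
# definitions before the call. Only the names come from the snippet above.
import functools

def process_response(processor, pre_processor=None):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, tools=None, **kwargs):
            if pre_processor is not None and tools is not None:
                tools = pre_processor(tools)          # rewrite tool schema for this backend
            raw = func(*args, tools=tools, **kwargs)  # provider-specific call
            return processor(raw)                     # normalize the response
        return wrapper
    return decorator
```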
Anyway, I still have the token limit to figure out. The chunking of big results works pretty well, almost
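A minimal sketch of that chunking, assuming tiktoken (the chunk size and helper name are hypothetical):

```python
# Sketch: split an oversized tool result into token-bounded chunks so each
# piece fits comfortably inside the context window.
import tiktoken

def chunk_result(text, max_tokens=2048, model="gpt-4"):
    enc = tiktoken.encoding_for_model(model)
    tokens = enc.encode(text)
    return [enc.decode(tokens[i:i + max_tokens])
            for i in range(0, len(tokens), max_tokens)]
```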
@trufae We should resolve this as we're basically going to be replicating litellm. They've done all the model request/response parsing in there and consolidated all to the openai spec.
Took a deeper look at what instructor (a new library we use here) does under the hood to support other LLMs' tools, and it has everything I need to easily implement the tools. Parsing the raw response needs to be done from scratch, but I have the old code for that
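For anyone following along, instructor's trick is that you describe the expected output as a pydantic model and it handles the tool-call plumbing and validation. A minimal sketch (the model name and schema are just examples):

```python
# Sketch: instructor wraps the OpenAI client so a pydantic model defines the
# expected structured output; tool-call handling happens under the hood.
import instructor
from openai import OpenAI
from pydantic import BaseModel

class FunctionPick(BaseModel):
    name: str
    reason: str

client = instructor.from_openai(OpenAI())

pick = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=FunctionPick,
    messages=[{"role": "user", "content": "Which function decrypts the strings?"}],
)
print(pick.name, pick.reason)
```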
Yeah, I've used instructor. But with litellm, you don't need to parse anything raw. I have no interest in maintaining transformations for so many models when it already exists, this isn't the primary point of this library.
From what I've seen so far you still have to parse things raw, but the response is the same for all LLMs
Yes, so you're only parsing 1 thing vs 1 for each API/model
We'll have to wait for an answer from @trufae.
Got the green. The other green, not the high green.
What does that mean?
I can use litellm :)
doesn't work
```
AttributeError: 'Delta' object has no attribute 'role'
```
ah the version was too old, but still:
```
ModelResponse(
│ id='chatcmpl-A730rEKe2r4dS5mn2BcCWYT9ILvQw',
│ choices=[
│ │ StreamingChoices(
│ │ │ finish_reason=None,
│ │ │ index=0,
│ │ │ delta=Delta(
│ │ │ │ refusal=None,
│ │ │ │ content=None,
│ │ │ │ role='assistant',
│ │ │ │ function_call=None,
│ │ │ │ tool_calls=[
│ │ │ │ │ ChatCompletionDeltaToolCall(
│ │ │ │ │ │ id=None,
│ │ │ │ │ │ function=Function(arguments='{\n', name=None),
│ │ │ │ │ │ type='function',
│ │ │ │ │ │ index=0
│ │ │ │ │ )
│ │ │ │ ]
│ │ │ ),
│ │ │ logprobs=None
│ │ )
│ ],
│ created=1726243242,
│ model='gpt-4',
│ object='chat.completion.chunk',
│ system_fingerprint=None
)
```
I do parse the functions myself now, so maybe this is my issue
Yep, I forgot how generators work 😅
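For the record, the gotcha above: in a stream only the first chunk carries role, and tool-call arguments arrive as JSON fragments that have to be concatenated while iterating the generator. A rough sketch of accumulating them (field names match the dump above; the rest is illustrative):

```python
# Sketch: accumulate streamed tool calls from OpenAI-format chunks.
# Argument JSON arrives in fragments, keyed by the tool call's index.
import litellm

def collect_tool_calls(model, messages, tools):
    calls = {}  # index -> {"name": ..., "arguments": ...}
    for chunk in litellm.completion(model=model, messages=messages,
                                    tools=tools, stream=True):
        delta = chunk.choices[0].delta
        for tc in (delta.tool_calls or []):
            slot = calls.setdefault(tc.index, {"name": "", "arguments": ""})
            if tc.function.name:
                slot["name"] = tc.function.name
            if tc.function.arguments:
                slot["arguments"] += tc.function.arguments
    return calls
```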
my comments on litellm:
- it feels so commercial by checking the website
- I'm afraid the amount of deps will make it too huge when packaged
- we lose control on how we build the prompts
- doesn't support llama afaik

so imho I would like to keep control on the llama side with chromadb and the prompt structure thing, at least as a separate codebase; even if it's ugly, I think this code gives us more control. Let me know if I misunderstood anything from litellm.
How are you planning to support litellm? If we will now support more models and reduce the logic handling all those models, will the interface for the user be the same?
It is commercial, I think they got some VC funding, although it's at least MIT licensed. It's not a crazy amount of deps; we'd want most of them anyway, like the anthropic, openai and google libraries, pydantic, etc. It doesn't actually do anything to the prompts, you still have the same amount of control. Yes, it doesn't do anything with running models locally, so it doesn't help there. But if we expose an openai-compatible endpoint, we can still use the same completion code.
I'm thinking we can just use their convention `<provider>/model_name` for any API models, so this way we don't have to constantly update model names. So `-m openai/o1-preview` would work even if it's not on the list.
1. As long as we don't use their UI or their moderation tools, we are covered by the MIT license.
2. Doesn't happen, since we use the conversation wrapper; it returns the same format as OpenAI for all endpoints.
And we can use llama separately. About the size, it uses extras, so in our case it only downloads the deps we need:
```
Downloading litellm-1.45.0-py3-none-any.whl.metadata (32 kB)
Collecting aiohttp (from litellm)
Downloading aiohttp-3.10.5.tar.gz (7.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.5/7.5 MB 4.3 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting click (from litellm)
Downloading click-8.1.7-py3-none-any.whl.metadata (3.0 kB)
Collecting importlib-metadata>=6.8.0 (from litellm)
Downloading importlib_metadata-8.5.0-py3-none-any.whl.metadata (4.8 kB)
Collecting jinja2<4.0.0,>=3.1.2 (from litellm)
Downloading jinja2-3.1.4-py3-none-any.whl.metadata (2.6 kB)
Collecting jsonschema<5.0.0,>=4.22.0 (from litellm)
Downloading jsonschema-4.23.0-py3-none-any.whl.metadata (7.9 kB)
Collecting openai>=1.45.0 (from litellm)
Downloading openai-1.45.0-py3-none-any.whl.metadata (22 kB)
Collecting pydantic<3.0.0,>=2.0.0 (from litellm)
Downloading pydantic-2.9.1-py3-none-any.whl.metadata (146 kB)
Collecting python-dotenv>=0.2.0 (from litellm)
Downloading python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)
Collecting requests<3.0.0,>=2.31.0 (from litellm)
Downloading requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting tiktoken>=0.7.0 (from litellm)
Downloading tiktoken-0.7.0.tar.gz (33 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting tokenizers (from litellm)
Downloading tokenizers-0.20.0.tar.gz (337 kB)
```
On the other side
I don't think we even need to constantly update our models; at least in auto, only the provider is set.
So, litellm or not, these can be done manually. Plus, we can freely use parts of litellm in our code due to the dual license they use.
@trufae @dnakov I've updated the task list with new tasks. Will go with dnakov's suggestion to keep litellm while dropping the size of r2ai to ~200-ish MB from the ~500-ish MB it is now.
I hope everyone is happy ^^
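If it helps to picture where the size drop comes from: the heavy, provider-specific dependencies can sit behind optional extras so the default install stays small. A hypothetical setup.py sketch (package and extra names are made up for illustration, not the PR's actual packaging):

```python
# Hypothetical packaging sketch: keep the base install small and move heavy
# local-model dependencies behind an optional extra. Names are illustrative.
from setuptools import setup, find_packages

setup(
    name="r2ai",
    packages=find_packages(),
    install_requires=[
        "litellm",   # single entry point for the hosted APIs
        "rich",
    ],
    extras_require={
        "local": ["llama-cpp-python", "chromadb"],  # only needed for local models / RAG
    },
)

# pip install r2ai            -> small default install
# pip install "r2ai[local]"   -> pulls in the local-model stack too
```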
@dnakov @trufae can any of you test this? I'm afraid my laptop isn't powerful enough, and the only local model I was able to run didn't support tools.
https://github.com/radareorg/r2ai/pull/48/commits/c1f0e2e75e114c7a7a58e4997e36864ee48eeb72
abandoned?
Nope, I'll come back to it soon. Just taking a break, since handling the functionary models drove me nuts xD
Checklist
- [x] Rewrite most of the code
- [x] Re-Implement OpenAI
- [ ] Re-implement Anthropic
- [ ] Re-Implement Llama
- [ ] Re-Implement Bedrock
- [ ] Re-Implement Groq
- [ ] Re-Implement Google
- [ ] Re-Implement NousResearch
- [ ] Implement chromadb (add radare documentation as a knowledge base, prompt the AI to return "memories" that can be saved and used later)