[Closed] nitanmarcel closed 2 weeks ago
@dnakov :)
Omg this PR couldn't be bigger
hihi. I'm not done
Can someone tell me what's a token limit? Because I don't get any =)))
Oh, this one -_-. I was close enough tho
```
openai.BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens. However, you requested 8226 tokens (3983 in the messages, 147 in the functions, and 4096 in the completion). Please reduce the length of the messages, functions, or completion.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
```
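For reference, that 400 is just arithmetic: prompt tokens + function schemas + the requested completion have to fit inside the model's window (3983 + 147 + 4096 = 8226 > 8192 here). A minimal sketch of checking the budget up front, assuming tiktoken (the helper and the 8192 constant are illustrative, not r2ai's actual code):

```python
# Sketch: verify the token budget before sending a request.
# The window size and helper are illustrative only.
import tiktoken

CONTEXT_WINDOW = 8192

def message_tokens(messages, model="gpt-4"):
    enc = tiktoken.encoding_for_model(model)
    # Rough count: just the text content of each message.
    return sum(len(enc.encode(m.get("content") or "")) for m in messages)

def fits(messages, function_tokens, max_completion, model="gpt-4"):
    used = message_tokens(messages, model) + function_tokens + max_completion
    return used <= CONTEXT_WINDOW  # 3983 + 147 + 4096 = 8226 fails this check
```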
@nitanmarcel before you get too far.. Re-Implement OpenAI / Re-implement Anthropic / Re-Implement Llama / Re-Implement Bedrock / Re-Implement Groq / Re-Implement Google / Re-Implement NousResearch:
All of these can just be served via a single litellm call like in ui/chat.py
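For context, the point is that one call signature covers all of those providers. A minimal sketch of what that single litellm call could look like (model strings and messages here are illustrative, not what ui/chat.py actually does):

```python
# Sketch: one litellm.completion() call instead of per-provider clients.
# litellm normalizes every provider's response to the OpenAI format.
import litellm

def ask(model, messages):
    resp = litellm.completion(model=model, messages=messages)
    return resp.choices[0].message.content

# The same code path works for any supported provider, e.g.:
#   ask("openai/gpt-4", msgs)
#   ask("anthropic/claude-3-5-sonnet-20240620", msgs)
#   ask("groq/llama3-70b-8192", msgs)
```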
I think @trufae said something about not wanting litellm implemented. Which is extra fine by me, since I can turn the process function into a decorator and have it take an argument to pre-process the response
Would be something like:
```
@process_response(processor=function_to_convert_response)
def unsupported_model_call(...):
```
Though it implies that the model supports the same tool format. I can add a pre_processor argument for that too
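A rough sketch of how that decorator could look, using the names from the snippet above (the body is illustrative, not the PR's actual code):

```python
# Sketch of the decorator idea: `processor` converts the raw provider response
# into the common format, and an optional `pre_processor` adapts the tool
# definitions before the call. Only the names come from the snippet above.
import functools

def process_response(processor, pre_processor=None):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, tools=None, **kwargs):
            if pre_processor is not None and tools is not None:
                tools = pre_processor(tools)          # rewrite tool schema for this backend
            raw = func(*args, tools=tools, **kwargs)  # provider-specific call
            return processor(raw)                     # normalize the response
        return wrapper
    return decorator
```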
Anyway, I still have the token limit to figure out. The chunking of big results works pretty well, almost
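A minimal sketch of that chunking, assuming tiktoken (the chunk size and helper name are hypothetical):

```python
# Sketch: split an oversized tool result into token-bounded chunks so each
# piece fits comfortably inside the context window.
import tiktoken

def chunk_result(text, max_tokens=2048, model="gpt-4"):
    enc = tiktoken.encoding_for_model(model)
    tokens = enc.encode(text)
    return [enc.decode(tokens[i:i + max_tokens])
            for i in range(0, len(tokens), max_tokens)]
```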
@trufae We should resolve this as we're basically going to be replicating litellm. They've done all the model request/response parsing in there and consolidated all to the openai spec.
Took a deeper look at what instructor (a new library we use here) does under the hood to support other LLMs' tools, and it has everything I need to easily implement the tools. Parsing the raw response needs to be done from scratch, but I have the old code for that
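For anyone following along, instructor's trick is that you describe the expected output as a pydantic model and it handles the tool-call plumbing and validation. A minimal sketch (the model name and schema are just examples):

```python
# Sketch: instructor wraps the OpenAI client so a pydantic model defines the
# expected structured output; tool-call handling happens under the hood.
import instructor
from openai import OpenAI
from pydantic import BaseModel

class FunctionPick(BaseModel):
    name: str
    reason: str

client = instructor.from_openai(OpenAI())

pick = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=FunctionPick,
    messages=[{"role": "user", "content": "Which function decrypts the strings?"}],
)
print(pick.name, pick.reason)
```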
Yeah, I've used instructor. But with litellm, you don't need to parse anything raw. I have no interest in maintaining transformations for so many models when it already exists, this isn't the primary point of this library.
From what I've seen so far you still have to parse things raw, but the response is the same for all LLMs
Yes, so you're only parsing 1 thing vs 1 for each API/model
We'll have to wait for an answer from @trufae.
Got the green. The other green, not the high green.
What does that mean?
I can use litellm :)
doesn't work
```
AttributeError: 'Delta' object has no attribute 'role'
```
ah the version was too old, but still:
```
ModelResponse(
│ id='chatcmpl-A730rEKe2r4dS5mn2BcCWYT9ILvQw',
│ choices=[
│ │ StreamingChoices(
│ │ │ finish_reason=None,
│ │ │ index=0,
│ │ │ delta=Delta(
│ │ │ │ refusal=None,
│ │ │ │ content=None,
│ │ │ │ role='assistant',
│ │ │ │ function_call=None,
│ │ │ │ tool_calls=[
│ │ │ │ │ ChatCompletionDeltaToolCall(
│ │ │ │ │ │ id=None,
│ │ │ │ │ │ function=Function(arguments='{\n', name=None),
│ │ │ │ │ │ type='function',
│ │ │ │ │ │ index=0
│ │ │ │ │ )
│ │ │ │ ]
│ │ │ ),
│ │ │ logprobs=None
│ │ )
│ ],
│ created=1726243242,
│ model='gpt-4',
│ object='chat.completion.chunk',
│ system_fingerprint=None
)
```
I do parse the functions myself now, so maybe this is my issue
Yep, I forgot how generators work 😅
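For the record, the gotcha above: in a stream only the first chunk carries role, and tool-call arguments arrive as JSON fragments that have to be concatenated while iterating the generator. A rough sketch of accumulating them (field names match the dump above; the rest is illustrative):

```python
# Sketch: accumulate streamed tool calls from OpenAI-format chunks.
# Argument JSON arrives in fragments, keyed by the tool call's index.
import litellm

def collect_tool_calls(model, messages, tools):
    calls = {}  # index -> {"name": ..., "arguments": ...}
    for chunk in litellm.completion(model=model, messages=messages,
                                    tools=tools, stream=True):
        delta = chunk.choices[0].delta
        for tc in (delta.tool_calls or []):
            slot = calls.setdefault(tc.index, {"name": "", "arguments": ""})
            if tc.function.name:
                slot["name"] = tc.function.name
            if tc.function.arguments:
                slot["arguments"] += tc.function.arguments
    return calls
```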
my comments on litellm:
- it feels so commercial by checking the website
- I'm afraid the amount of deps will make it too huge when packaged
- we lose control on how we build the prompts
- doesn't support llama afaik

so imho I would like to keep control on the llama side with chromadb and the prompt structure thing, at least as a separate codebase; even if it's ugly, I think this code gives us more control. Let me know if I misunderstood anything from litellm.
How are you planning to support litellm? If we will now support more models and reduce the logic handling all those models, will the interface for the user be the same?
It is commercial, I think they got some VC funding, although it's at least MIT licensed. It's not a crazy amount of deps; we'd want most of them anyway, like the anthropic, openai and google libraries, pydantic, etc. It doesn't actually do anything to the prompts, you still have the same amount of control. Yes, it doesn't do anything with running models locally, so it doesn't help there. But if we expose an openai-compatible endpoint, we can still use the same completion code.
I'm thinking we can just use their convention `<provider>/model_name` for any API models, so this way we don't have to constantly update model names. So `-m openai/o1-preview` would work even if it's not on the list.
1. As long as we don't use their UI or their moderation tools, we are covered by the MIT license.
2. Doesn't happen, since we use the conversation wrapper; it returns the same format as OpenAI for all endpoints.
And we can use llama separately. About the size, it uses extras, so in our case it only downloads the deps we need:
```
Downloading litellm-1.45.0-py3-none-any.whl.metadata (32 kB)
Collecting aiohttp (from litellm)
Downloading aiohttp-3.10.5.tar.gz (7.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.5/7.5 MB 4.3 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting click (from litellm)
Downloading click-8.1.7-py3-none-any.whl.metadata (3.0 kB)
Collecting importlib-metadata>=6.8.0 (from litellm)
Downloading importlib_metadata-8.5.0-py3-none-any.whl.metadata (4.8 kB)
Collecting jinja2<4.0.0,>=3.1.2 (from litellm)
Downloading jinja2-3.1.4-py3-none-any.whl.metadata (2.6 kB)
Collecting jsonschema<5.0.0,>=4.22.0 (from litellm)
Downloading jsonschema-4.23.0-py3-none-any.whl.metadata (7.9 kB)
Collecting openai>=1.45.0 (from litellm)
Downloading openai-1.45.0-py3-none-any.whl.metadata (22 kB)
Collecting pydantic<3.0.0,>=2.0.0 (from litellm)
Downloading pydantic-2.9.1-py3-none-any.whl.metadata (146 kB)
Collecting python-dotenv>=0.2.0 (from litellm)
Downloading python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)
Collecting requests<3.0.0,>=2.31.0 (from litellm)
Downloading requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting tiktoken>=0.7.0 (from litellm)
Downloading tiktoken-0.7.0.tar.gz (33 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting tokenizers (from litellm)
Downloading tokenizers-0.20.0.tar.gz (337 kB)
```
On the other side
I don't think we even need to constantly update our models; at least in auto, only the provider is set.
So, litellm or not, these can be done manually. Plus, we can freely use parts of litellm in our code due to the dual license they use.
@trufae @dnakov I've updated the task list with new tasks. Will go with dnakov's suggestion to keep litellm while dropping the size of r2ai to ~200-ish MB from the ~500-ish MB it is now.
I hope everyone is happy ^^
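If it helps to picture where the size drop comes from: the heavy, provider-specific dependencies can sit behind optional extras so the default install stays small. A hypothetical setup.py sketch (package and extra names are made up for illustration, not the PR's actual packaging):

```python
# Hypothetical packaging sketch: keep the base install small and move heavy
# local-model dependencies behind an optional extra. Names are illustrative.
from setuptools import setup, find_packages

setup(
    name="r2ai",
    packages=find_packages(),
    install_requires=[
        "litellm",   # single entry point for the hosted APIs
        "rich",
    ],
    extras_require={
        "local": ["llama-cpp-python", "chromadb"],  # only needed for local models / RAG
    },
)

# pip install r2ai            -> small default install
# pip install "r2ai[local]"   -> pulls in the local-model stack too
```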
@dnakov @trufae can any of you test this? I'm afraid my laptop isn't powerful enough, and the only local model I was able to run didn't support tools.
https://github.com/radareorg/r2ai/pull/48/commits/c1f0e2e75e114c7a7a58e4997e36864ee48eeb72
abandoned?
Nope, I'll come back to it soon. Just taking a break, since handling the functionary models drove me nuts xD
Checklist
- [x] Rewrite most of the code
- [x] Re-Implement OpenAI
- [ ] Re-implement Anthropic
- [ ] Re-Implement Llama
- [ ] Re-Implement Bedrock
- [ ] Re-Implement Groq
- [ ] Re-Implement Google
- [ ] Re-Implement NousResearch
- [ ] Implement chromadb (add radare documentation as a knowledge base, prompt the AI to return "memories" that can be saved and used later)