Rho-9-Official opened this issue 3 months ago
👋 Hi @Rho-9-Official, Issues are only for reporting bugs and feature requests. Please read the documentation before raising an issue: https://rengine.wiki

For very limited support, questions, and discussions, please join the reNgine Discord channel: https://discord.gg/azv6fzhNCE

Please include all the requested and relevant information when opening a bug report. Improper reports will be closed without any response.
Hi @Rho-9-Official
reNgine 2.1.0 has just been released with Ollama support. You can now run LLMs locally.
Ok, words aren't my strength, and I know you have better things to do than try to understand my gibberish 😂... I'mma use llama2 to help me out a bit.

I was just thinking, supporting drop-in API replacements would be a massive strength. If I wanted to run reNgine on a small, lightweight laptop, I'd be waiting a month for an AI-generated report! Likewise, sometimes we can't afford to use the OpenAI API (I was under the impression that costs money). That's why I also run LM Studio, which is amazing software. It allows for a bit more flexibility in models, including the option to use LLaMA2 uncensored models, which won't argue when asked to generate vulnerability reports. Plus, it lets me run those models on a much more powerful machine and access them remotely through OpenAI's Python client, just by changing the base URL.
Here's an example of how you can use the API:
```python
# Chat with an intelligent assistant in your terminal
from openai import OpenAI

# Point to the local server
client = OpenAI(base_url="http://localhost:8081/v1", api_key="lm-studio")

history = [
    {"role": "system", "content": "You are an intelligent assistant. You always provide well-reasoned answers that are both correct and helpful."},
    {"role": "user", "content": "Hello, introduce yourself to someone opening this program for the first time. Be concise."},
]

while True:
    completion = client.chat.completions.create(
        model="customized-models/zephyr-7B-beta-GGUF",
        messages=history,
        temperature=0.7,
        stream=True,
    )

    # Stream the assistant's reply token by token, accumulating it into history
    new_message = {"role": "assistant", "content": ""}
    for chunk in completion:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
            new_message["content"] += chunk.choices[0].delta.content
    history.append(new_message)

    # Uncomment to see chat history
    # import json
    # gray_color = "\033[90m"
    # reset_color = "\033[0m"
    # print(f"{gray_color}\n{'-'*20} History dump {'-'*20}\n")
    # print(json.dumps(history, indent=2))
    # print(f"\n{'-'*55}\n{reset_color}")

    print()
    history.append({"role": "user", "content": input("> ")})
```
This setup makes it super flexible and powerful for various use cases!
You'd obviously need to add the functionality to switch between the OpenAI API and a drop-in API. I'd change it myself, since I'd always use my drop-in, but I haven't found the relevant files yet. The sketch below is roughly what I have in mind.
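A minimal sketch of the switch, and to be clear, the env var names here are hypothetical for illustration, not actual reNgine settings:

```python
import os
from openai import OpenAI

# Hypothetical settings -- reNgine would read these from its own config;
# these env var names are just for illustration.
LLM_BASE_URL = os.environ.get("LLM_BASE_URL")  # e.g. "http://localhost:8081/v1" for LM Studio
LLM_API_KEY = os.environ.get("LLM_API_KEY", "lm-studio")  # most drop-in servers ignore the key

if LLM_BASE_URL:
    # Drop-in, OpenAI-compatible server (LM Studio, etc.)
    client = OpenAI(base_url=LLM_BASE_URL, api_key=LLM_API_KEY)
else:
    # Official OpenAI API; the client reads OPENAI_API_KEY from the environment
    client = OpenAI()
```

Everything downstream keeps calling `client.chat.completions.create(...)` exactly the same way, so no other code would have to change.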
I agree with you, this can be looked into. There is another feature request for Groq, so I think I can integrate both together.
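For reference, Groq also exposes an OpenAI-compatible endpoint, so the same base-URL switch should cover both requests. A quick sketch, assuming Groq's documented base URL and a `GROQ_API_KEY` env var (the model id is just an example and may have changed; check Groq's current model list):

```python
import os
from openai import OpenAI

# Groq's OpenAI-compatible endpoint
client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)

completion = client.chat.completions.create(
    model="llama2-70b-4096",  # example Groq model id
    messages=[{"role": "user", "content": "Summarize this scan finding in one line."}],
)
print(completion.choices[0].message.content)
```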
I'd be more than happy to do some testing to figure out which models work best, though I think I'll hedge my bets on TheBloke's llama2 7B uncensored model being the winner.

Oh, I should also probably test whether the smallest models can even understand the data, and I really want to check out some of those summarizer models, just to cover the obvious.
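To make trying models easier, LM Studio also implements the OpenAI-compatible `/v1/models` endpoint, so I can enumerate whatever the server has loaded and loop over the candidates (a sketch, using the same local server as my example above):

```python
from openai import OpenAI

# Same local LM Studio server as in the earlier example
client = OpenAI(base_url="http://localhost:8081/v1", api_key="lm-studio")

# Lists whichever models the LM Studio server currently exposes
for model in client.models.list():
    print(model.id)
```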
### Is there an existing feature or issue for this?

### Expected feature

LM Studio is a self-hosted API server option for LLMs, and it's actually built to act as a drop-in replacement for the OpenAI API.

### Alternative solutions

I would give other solutions, but honestly, LM Studio is the only one that comes to mind that would be perfect and 100% free.

### Anything else?

No response