Open 0x4007 opened 1 year ago
/start
Skipping /start since no time labels are set to calculate the timeline.
@pavlovcik I'll get this done this afternoon
We currently support Mistral 7B, Llama 13B, Code Llama 34B, Llama 70B, and Replit 3b. The API is conveniently OpenAI client-compatible for easy integration with existing applications.
These are the models available through the API, the replit one is the only one that isn't a chat model.
What model do you want to run? Looking at their UI you have Claude, GPT-4, and Perplexity to choose from, but it's not clear right off the bat which model "Perplexity" maps to among the options in the API reference docs. Perhaps the Perplexity model isn't available through the API? Or is it a white-labeled Llama 70B?
The API defaults to Mistral 7B, so I'm assuming it's that — should we just run with the default?
Should have looked into this first, but their context limit isn't capable of handling our needs, at least not until they increase the limits.
Where possible, we try to match the Hugging Face implementation. We are open to adjusting the API, so please reach out with feedback regarding these details.
[1] We drop any added system messages. For system prompting, per Mistral's recommendation, you can concatenate the system prompt with the first user message.
[2] We will be increasing the context length of codellama-34b-instruct to 16k tokens, and increasing the context length of mistral-7b-instruct to 32k tokens.
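Since footnote [1] says system messages are dropped, a caller has to fold the system prompt into the first user message itself. A minimal sketch of that folding — the message shape follows the OpenAI chat format, but the separator and function name are assumptions, not from Perplexity's docs:

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Fold any system messages into the first user message, per Mistral's
// recommendation, since Perplexity drops `system` messages for these models.
function foldSystemPrompt(messages: ChatMessage[]): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  if (system.length === 0) return rest;

  const preamble = system.map((m) => m.content).join("\n");
  const firstUser = rest.findIndex((m) => m.role === "user");
  // No user message at all: emit the system prompt as a user message.
  if (firstUser === -1) return [{ role: "user", content: preamble }, ...rest];

  return rest.map((m, i) =>
    i === firstUser ? { ...m, content: `${preamble}\n\n${m.content}` } : m
  );
}
```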
What if, on issue view, we make /ask use Perplexity? This could make sense because issues should be more research focused (refining the specification). Given the limited context window, we can pass in the specification and the sender comment only.
On the pull request review, we should use GPT-4 (with code interpreter?) so that we can pass in the diff, the conversation, and it can suggest direct code adjustments.
I'm using the free version of Perplexity, so I have only used the perplexity model. It seems to work quite well.
I hear what you are saying Pav and I think until they up the context limit our hands are tied.
Review currently doesn't consider the linked context, the conversation, etc.
Ask does consider all of the linked context as well as the current issue context, which in my demos with minuscule issues, conversations, and PRs was eating up 4k tokens like it was nothing. The original scope of ask was that it would take as much context as possible to provide better responses for research/issue brainstorming/planning.
What I had tried was replacing the askGPT core API call with Perplexity, and I also swapped out the gptContextCall for Perplexity, but I couldn't get any decent responses due to the context window and formatting.
I think GPT-3.5 with the additional context will perform better than Perplexity with the reduced context window but an improved model. As soon as that 16k window lands, I think switching it out would be the best idea, although pricing adds a matter of perplexity to the AI feature suite. We'd need to allow for a switch of sorts so that if no Perplexity API key is provided but an OpenAI one is, we use the right model.
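That key-based switch could look roughly like this — a sketch only, with illustrative names and model ids that are not from the codebase:

```typescript
interface ModelChoice {
  provider: "perplexity" | "openai";
  model: string;
}

// Prefer Perplexity when its key is configured, otherwise fall back to
// OpenAI; return null when neither key is present so /ask can report
// that the feature is disabled.
function chooseAskModel(env: {
  PERPLEXITY_API_KEY?: string;
  OPENAI_API_KEY?: string;
}): ModelChoice | null {
  if (env.PERPLEXITY_API_KEY) {
    return { provider: "perplexity", model: "mistral-7b-instruct" };
  }
  if (env.OPENAI_API_KEY) {
    return { provider: "openai", model: "gpt-3.5-turbo-16k" };
  }
  return null;
}
```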
I took the 7-day free trial for the annual plan and did a bit of playing around myself. Pretty good, I must say.
GPT-4 (with code interpreter?)
Isn't code interpreter just a Python plugin with a custom text splitter/parser?
I'm using the free version of perplexity so I only have used the perplexity model.
From what I gathered, I think they are using Mistral as their main model
I hear what you are saying Pav and I think until they up the context limit our hands are tied.
Add the spec, and if the token count is too high, then perhaps just the sender comment. I'll only know for sure how valuable the feature is when testing with real issues. But intuitively, the more context we provide, the more relevant the results I would expect.
I'll give it a try and open the draft
What I'm troubled with is:
I'll add the spec, count the links in the body, and determine the token count of the spec and question. If it's 1/3 of the window or more, that'll do? If it's less than 1/3, grab whatever the body of the linked context is and fire?
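The budgeting heuristic above could be sketched roughly like this. Everything here is illustrative, not project code: the 4k window, the 1/3 threshold, and the crude chars/4 token estimate are stand-ins (a real implementation would use the model's actual tokenizer):

```typescript
const CONTEXT_WINDOW = 4096; // assumed Perplexity window, per the thread

// Crude placeholder: roughly 4 characters per token. Swap in a real
// tokenizer (tiktoken / SentencePiece) for accurate counts.
function countTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Always include the spec and the question; only pull in linked-context
// bodies when spec + question use less than a third of the window, and
// stop once the remaining budget runs out.
function buildAskContext(
  spec: string,
  question: string,
  linkedBodies: string[]
): string[] {
  const parts = [spec, question];
  const used = countTokens(spec) + countTokens(question);
  if (used >= CONTEXT_WINDOW / 3) return parts; // spec + sender comment only

  let budget = CONTEXT_WINDOW - used;
  for (const body of linkedBodies) {
    const cost = countTokens(body);
    if (cost > budget) break;
    parts.push(body);
    budget -= cost;
  }
  return parts;
}
```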
telegrammed you my api key for perp
Hi @pavlovcik @Keyrxng - I believe we can make this easier. I'm the maintainer of LiteLLM - we let you deploy an LLM proxy to call 100+ LLMs in one format - Perplexity, Bedrock, OpenAI, Anthropic, etc.: https://github.com/BerriAI/litellm/tree/main/openai-proxy
If this looks useful (we're used in production), please let me know how we can help.
Perplexity request
curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "perplexity/mistral-7b-instruct",
    "messages": [{"role": "user", "content": "Say this is a test!"}],
    "temperature": 0.7
  }'
gpt-3.5-turbo request
curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Say this is a test!"}],
    "temperature": 0.7
  }'
claude-2 request
curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-2",
    "messages": [{"role": "user", "content": "Say this is a test!"}],
    "temperature": 0.7
  }'
- /start: Assign the origin sender to the issue automatically.
- /stop: Unassign the origin sender from the issue automatically.
- /help: List all available commands.
- /autopay: Toggle automatic payment for the completion of the current issue.
- /query: Comments the user's multiplier and address.
- /multiplier: Set the bounty payout multiplier for a specific contributor, and provide the reason why.
example usage: "/multiplier @user 0.5 'Multiplier reason'"
- /allow: Set access control. (Admin Only)
- /wallet <WALLET_ADDRESS | ENS_NAME>: Register the hunter's wallet address.
ex1: /wallet 0x0000000000000000000000000000000000000000
ex2: /wallet vitalik.eth
@ishaan-jaff
I believe we can make this easier
I appreciate you taking the time, Ishaan, and while I'm but a lowly grunt, I think it's more than what we need at the moment, although if there's a need for more than a couple of models then it may be considered at that point.
For me personally, I'll likely make use of it in personal projects so again, appreciate the shout.
I'm working on improving tokenization before the call on our end, as Mistral's scheme is unique and not provided by TikToken by default. It was overestimating by nearly double in most cases, or underestimating by half in the others.
Agreed with @Keyrxng but thanks for letting us know about your product. I'm also curious to know how you found this issue @ishaan-jaff
I'm working on improving tokenization before the call on our end
Not sure if you're using the code I shared in the other thread under github-agents, but that is specifically for GPT tokenization. Different models, I guess, have different encoders.
I am yes or at least drew from that initially
I'm hoping I can just string the entire convo together, as exampled by the Perplexity and Mistral docs, using the special characters it's been trained with; then it's just a case of either
I presume that all these commercial models have solutions for token counting, like OpenAI's tiktoken.
Well, yeah, it tends to vary from model to model depending on how that model was trained, what special characters were used, etc.
For instance in the context of Mistral instruct:
Chat template
The template used to build a prompt for the Instruct model is defined as follows:
<s>[INST] Instruction [/INST] Model answer</s>[INST] Follow-up instruction [/INST]
Note that <s> and </s> are special tokens for beginning of string (BOS) and end of string (EOS) while [INST] and [/INST] are regular strings.
NOTE
This format must be strictly respected, otherwise the model will generate sub-optimal outputs.
As reference, here is the format used to tokenize instructions during fine-tuning:
[START_SYMBOL_ID] +
tok("[INST]") + tok(USER_MESSAGE_1) + tok("[/INST]") +
tok(BOT_MESSAGE_1) + [END_SYMBOL_ID] +
…
tok("[INST]") + tok(USER_MESSAGE_N) + tok("[/INST]") +
tok(BOT_MESSAGE_N) + [END_SYMBOL_ID]
NOTE
The function tok should never generate the EOS token, however FastChat (used in vLLM) sends the full prompt as a string which might lead to incorrect tokenization of the EOS token and prompt injection. Users are encouraged to send tokens instead as described above.
The above was taken from the Mistral docs, whereas the example below is from Perplexity, and there are clear differences between the two. I'm inclined to believe the Mistral docs over the Perplexity docs, but it still leaves me wondering slightly.
The system message is prepended to the first user message:
<bos>[INST] <<SYS>>
System prompt
<</SYS>>
Instruction [/INST]
mistral-7b-instruct
Example chat:
[
  {
    "role": "user",
    "content": "Instruction"
  },
  {
    "role": "assistant",
    "content": "Model answer"
  },
  {
    "role": "user",
    "content": "Follow-up instruction"
  }
]
The tokenized chat:
<bos>[INST] Instruction [/INST]Model answer<eos> [INST] Follow-up instruction [/INST]
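Putting the Mistral template above into code, here's a sketch that concatenates a chat array into the documented string form. Note the Mistral docs themselves warn that sending the prompt as a plain string can mis-tokenize the EOS token, so real clients should send token ids; this is only to show the shape, and the function name is made up:

```typescript
type Turn = { role: "user" | "assistant"; content: string };

// Build the Mistral Instruct prompt string:
// <s>[INST] user [/INST] assistant</s>[INST] user [/INST] ...
// <s>/</s> are the BOS/EOS special tokens; [INST]/[/INST] are plain text.
function mistralInstructPrompt(turns: Turn[]): string {
  let out = "<s>";
  for (const t of turns) {
    if (t.role === "user") {
      out += `[INST] ${t.content} [/INST]`;
    } else {
      out += ` ${t.content}</s>`;
    }
  }
  return out;
}
```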
Reading your comment again, I may have misunderstood at first.
Perplexity uses the same API structure as OpenAI, so it returns the tokens used for input, output, and both, but that's after the fact, obviously.
some shit QA:
So the underlying tokenizer isn't TikToken, it's Google's SentencePieceProcessor. I tried to get something close with TikToken but no joy. I've had to get the JS wrapper for SPP, but the prompt tokenization is just about spot on.
some more shit QA:
https://blog.perplexity.ai/blog/introducing-pplx-api
Perplexity is optimized for Q&A and live web research, so perhaps it's a better backend for the ask command.
I use their consumer facing product and it's very effective.