Open 0x4007 opened 1 year ago
/start
Skipping /start since no time labels are set to calculate the timeline.
@pavlovcik I'll get this done this afternoon
We currently support Mistral 7B, Llama 13B, Code Llama 34B, Llama 70B, and Replit 3b. The API is conveniently OpenAI client-compatible for easy integration with existing applications.
These are the models available through the API, the replit one is the only one that isn't a chat model.
What model do you want to run? Looking at their UI you have Claude, GPT-4, and Perplexity to choose from, but it's not clear right off the bat which model "Perplexity" maps to among the options in the API reference docs. Perhaps the Perplexity model isn't available through the API? Or is it a white-labeled Llama 70B?
The API defaults to Mistral 7B, so I'm assuming it's that — should we just run with the default?
Should have looked into this first, but their context limit isn't capable of handling our needs, at least not until they increase the limits.
Where possible, we try to match the Hugging Face implementation. We are open to adjusting the API, so please reach out with feedback regarding these details.
[1] We drop any added system messages. For system prompting, per Mistral's recommendation, you can concatenate the system prompt with the first user message.
[2] We will be increasing the context length of codellama-34b-instruct to 16k tokens, and increasing the context length of mistral-7b-instruct to 32k tokens.
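Since footnote [1] says system messages are dropped, a caller has to fold the system prompt into the first user message itself. A minimal sketch of that folding — the message shape follows the OpenAI chat format, but the separator and function name are assumptions, not from Perplexity's docs:

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Fold any system messages into the first user message, per Mistral's
// recommendation, since Perplexity drops `system` messages for these models.
function foldSystemPrompt(messages: ChatMessage[]): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  if (system.length === 0) return rest;

  const preamble = system.map((m) => m.content).join("\n");
  const firstUser = rest.findIndex((m) => m.role === "user");
  // No user message at all: emit the system prompt as a user message.
  if (firstUser === -1) return [{ role: "user", content: preamble }, ...rest];

  return rest.map((m, i) =>
    i === firstUser ? { ...m, content: `${preamble}\n\n${m.content}` } : m
  );
}
```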
What if, on issue view, we make /ask use Perplexity? This could make sense because issues should be more research focused (refining the specification). Given the limited context window, we can pass in the specification and the sender comment only.
On the pull request review, we should use GPT-4 (with code interpreter?) so that we can pass in the diff, the conversation, and it can suggest direct code adjustments.
I'm using the free version of Perplexity, so I have only used the perplexity model. It seems to work quite well.
I hear what you are saying Pav and I think until they up the context limit our hands are tied.
Review currently doesn't consider the linked context, the conversation, etc.
Ask does consider all of the linked context as well as the current issue context, which in my demos with minuscule issues, conversations, and PRs was eating up 4k tokens like it was nothing. The original scope of ask was that it would take as much context as possible to provide better responses for research/issue brainstorming/planning.
What I had tried was replacing the askGPT core API call with Perplexity, and I also swapped out the gptContextCall for Perplexity, but I couldn't get any decent responses due to the context window and formatting.
I think GPT-3.5 with the additional context will perform better than Perplexity with the reduced context window but an improved model. As soon as that 16k window lands, I think switching it out would be the best idea, although pricing adds a matter of perplexity to the AI feature suite. We'd need to allow for a switch of sorts so that if no Perplexity API key is provided but an OpenAI one is, we use the right model.
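That key-based switch could look roughly like this — a sketch only, with illustrative names and model ids that are not from the codebase:

```typescript
interface ModelChoice {
  provider: "perplexity" | "openai";
  model: string;
}

// Prefer Perplexity when its key is configured, otherwise fall back to
// OpenAI; return null when neither key is present so /ask can report
// that the feature is disabled.
function chooseAskModel(env: {
  PERPLEXITY_API_KEY?: string;
  OPENAI_API_KEY?: string;
}): ModelChoice | null {
  if (env.PERPLEXITY_API_KEY) {
    return { provider: "perplexity", model: "mistral-7b-instruct" };
  }
  if (env.OPENAI_API_KEY) {
    return { provider: "openai", model: "gpt-3.5-turbo-16k" };
  }
  return null;
}
```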
I took the 7-day free trial for the annual plan and did a bit of playing around myself. Pretty good, I must say.
GPT-4 (with code interpreter?)
Isn't code interpreter just a Python plugin with a custom text splitter/parser?
I'm using the free version of perplexity so I only have used the perplexity model.
From what I gathered, I think they are using Mistral as their main model
I hear what you are saying Pav and I think until they up the context limit our hands are tied.
Add the spec, and if the token count is too high, then perhaps just the sender comment. I'll only know for sure how valuable the feature is when testing with real issues. But intuitively, the more context we provide, the more relevant the results I would expect.
I'll give it a try and open the draft
What I'm troubled with is:
I'll add the spec, count the links in the body, and determine the token count of the spec and question. If it's 1/3 of the window or more, that'll do? If it's less than 1/3, grab whatever the body of the linked context is and fire?
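The budgeting heuristic above could be sketched roughly like this. Everything here is illustrative, not project code: the 4k window, the 1/3 threshold, and the crude chars/4 token estimate are stand-ins (a real implementation would use the model's actual tokenizer):

```typescript
const CONTEXT_WINDOW = 4096; // assumed Perplexity window, per the thread

// Crude placeholder: roughly 4 characters per token. Swap in a real
// tokenizer (tiktoken / SentencePiece) for accurate counts.
function countTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Always include the spec and the question; only pull in linked-context
// bodies when spec + question use less than a third of the window, and
// stop once the remaining budget runs out.
function buildAskContext(
  spec: string,
  question: string,
  linkedBodies: string[]
): string[] {
  const parts = [spec, question];
  const used = countTokens(spec) + countTokens(question);
  if (used >= CONTEXT_WINDOW / 3) return parts; // spec + sender comment only

  let budget = CONTEXT_WINDOW - used;
  for (const body of linkedBodies) {
    const cost = countTokens(body);
    if (cost > budget) break;
    parts.push(body);
    budget -= cost;
  }
  return parts;
}
```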
telegrammed you my api key for perp
Hi @pavlovcik @Keyrxng - I believe we can make this easier. I'm the maintainer of LiteLLM - we let you deploy an LLM proxy to call 100+ LLMs in one format - Perplexity, Bedrock, OpenAI, Anthropic, etc.: https://github.com/BerriAI/litellm/tree/main/openai-proxy
If this looks useful (we're used in production), please let me know how we can help.
Perplexity request
curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "perplexity/mistral-7b-instruct",
    "messages": [{"role": "user", "content": "Say this is a test!"}],
    "temperature": 0.7
  }'
gpt-3.5-turbo request
curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Say this is a test!"}],
    "temperature": 0.7
  }'
claude-2 request
curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-2",
    "messages": [{"role": "user", "content": "Say this is a test!"}],
    "temperature": 0.7
  }'
- /start: Assign the origin sender to the issue automatically.
- /stop: Unassign the origin sender from the issue automatically.
- /help: List all available commands.
- /autopay: Toggle automatic payment for the completion of the current issue.
- /query: Comments the user's multiplier and address.
- /multiplier: Set the bounty payout multiplier for a specific contributor, and provide the reason why.
example usage: "/multiplier @user 0.5 'Multiplier reason'"
- /allow: Set access control. (Admin Only)
- /wallet <WALLET_ADDRESS | ENS_NAME>: Register the hunter's wallet address.
ex1: /wallet 0x0000000000000000000000000000000000000000
ex2: /wallet vitalik.eth
@ishaan-jaff
I believe we can make this easier
I appreciate you taking the time, Ishaan, and while I'm but a lowly grunt, I think it's more than what we need at the moment, although if there's a need for more than a couple of models then it may be considered at that point.
For me personally, I'll likely make use of it in personal projects so again, appreciate the shout.
I'm working on improving tokenization before the call on our end, as Mistral's scheme is unique and not provided by TikToken by default. It was overestimating by nearly double in most cases, or underestimating by half in the others.
Agreed with @Keyrxng but thanks for letting us know about your product. I'm also curious to know how you found this issue @ishaan-jaff
I'm working on improving tokenization before the call on our end
Not sure if you're using the code I shared in the other thread under github-agents, but that is specifically for GPT tokenization. Different models, I guess, have different encoders.
I am yes or at least drew from that initially
I'm hoping I can just string the entire convo together, as exampled by the Perplexity and Mistral docs, using the special characters it's been trained with; then it's just a case of either
I presume that all these commercial models have solutions for token counting, like OpenAI's tiktoken.
Well, yeah, it tends to vary from model to model depending on how that model was trained, what special characters were used, etc.
For instance in the context of Mistral instruct:
Chat template
The template used to build a prompt for the Instruct model is defined as follows:
<s>[INST] Instruction [/INST] Model answer</s>[INST] Follow-up instruction [/INST]
Note that <s> and </s> are special tokens for beginning of string (BOS) and end of string (EOS) while [INST] and [/INST] are regular strings.
NOTE
This format must be strictly respected, otherwise the model will generate sub-optimal outputs.
As reference, here is the format used to tokenize instructions during fine-tuning:
[START_SYMBOL_ID] +
tok("[INST]") + tok(USER_MESSAGE_1) + tok("[/INST]") +
tok(BOT_MESSAGE_1) + [END_SYMBOL_ID] +
…
tok("[INST]") + tok(USER_MESSAGE_N) + tok("[/INST]") +
tok(BOT_MESSAGE_N) + [END_SYMBOL_ID]
NOTE
The function tok should never generate the EOS token, however FastChat (used in vLLM) sends the full prompt as a string which might lead to incorrect tokenization of the EOS token and prompt injection. Users are encouraged to send tokens instead as described above.
The above was taken from the Mistral docs, whereas the example below is from Perplexity, and there are clear differences between the two. I'm inclined to believe the Mistral docs over the Perplexity docs, but it still leaves me wondering slightly.
The system message is prepended to the first user message:
<bos>[INST] <<SYS>>
System prompt
<</SYS>>
Instruction [/INST]
mistral-7b-instruct
Example chat:
[
  {
    "role": "user",
    "content": "Instruction"
  },
  {
    "role": "assistant",
    "content": "Model answer"
  },
  {
    "role": "user",
    "content": "Follow-up instruction"
  }
]
The tokenized chat:
<bos>[INST] Instruction [/INST]Model answer<eos> [INST] Follow-up instruction [/INST]
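Putting the Mistral template above into code, here's a sketch that concatenates a chat array into the documented string form. Note the Mistral docs themselves warn that sending the prompt as a plain string can mis-tokenize the EOS token, so real clients should send token ids; this is only to show the shape, and the function name is made up:

```typescript
type Turn = { role: "user" | "assistant"; content: string };

// Build the Mistral Instruct prompt string:
// <s>[INST] user [/INST] assistant</s>[INST] user [/INST] ...
// <s>/</s> are the BOS/EOS special tokens; [INST]/[/INST] are plain text.
function mistralInstructPrompt(turns: Turn[]): string {
  let out = "<s>";
  for (const t of turns) {
    if (t.role === "user") {
      out += `[INST] ${t.content} [/INST]`;
    } else {
      out += ` ${t.content}</s>`;
    }
  }
  return out;
}
```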
Reading your comment again, I may have misunderstood at first.
Perplexity uses the same API structure as OpenAI, so it returns the tokens used for input, output, and both, but that's after the fact, obviously.
some shit QA:
So the underlying tokenizer isn't TikToken, it's Google's SentencePieceProcessor. I tried to get something close with TikToken but no joy. I've had to get the JS wrapper for SPP, but the prompt tokenization is just about spot on.
some more shit QA:
https://blog.perplexity.ai/blog/introducing-pplx-api
Perplexity is optimized for Q&A and live web research, so perhaps it's a better backend for the ask command.
I use their consumer facing product and it's very effective.