langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

Add Google PaLM API #4681

Closed: yil532 closed this issue 7 months ago

yil532 commented 1 year ago

Feature request

Hi team,

I am a developer relations engineer at Google working on the PaLM API. I would like to participate and contribute to adding Google PaLM support to LangChain. What is the current state of PaLM API integration?

Motivation

A better user experience with the PaLM API :)

Your contribution

Still needs discussion; this might take the form of PRs, design discussions, or something else.

UmerHA commented 1 year ago

Hi @yil532!

There is a PaLM integration: https://github.com/hwchase17/langchain/blob/master/langchain/llms/google_palm.py which seems to be 2 weeks old. The model used is text-bison-001.

However, neither your embedding model textembedding-gecko nor your chat model chat-bison-001 are implemented yet.

I just finished implementing Reflexion (https://github.com/hwchase17/langchain/pull/4737), so I have a bit of time. I would love to implement the PaLM embedding & chat models if you can give me an API key :)

Jflick58 commented 1 year ago

@yil532 I raised an issue (#4532) about the Vertex AI version of PaLM not being supported. I'm working on a PR to implement that functionality.

daranable commented 1 year ago

@yil532 I got access to the PaLM API the other day and have been trying to use the implementation listed above, but I haven't been able to get it working correctly. The chat endpoint that was implemented doesn't work at all; I've had to modify my local install of langchain just to get anything working.

Here is a list of issues that I have had varying levels of success in fixing locally:

I am still learning langchain myself, so for all I know these issues could be due to me doing something wrong. But from what I've been able to find searching, it seems there aren't many people using the PaLM API in langchain yet.

cipher982 commented 1 year ago

I am having the same issues as @daranable. The chat model was working for me last week but seems to be broken after updating all my libraries; specifically, I get an empty response (no candidates). After fixing that, it still appears to be mostly useless as an agent LLM, ignoring the instructions in the context and the preceding user message.

After pasting all my formatted messages into the MakerSuite web interface, I still get the same problems (it just responds in natural language, with no JSON actions/answers), so I think it's a limitation of the chat alignment (text-bison works fine). It could also be because all the prompt engineering up to this point was designed around the OpenAI models. I have been making iterative changes to the context and user messages, and sometimes I find a specific phrasing that appears to lock chat-bison into my required JSON formatting structure, but it is very flaky, whereas the original langchain/modified prompts we use have near-perfect formatting accuracy for GPT-3.5/4.

The best method I have found so far is making the input messages as short and focused as possible; a sketch of what I mean follows below. At my company we have built up some fairly lengthy prompts with various tooling and context information, and including all of that leads bison to lose focus on the formatting instructions.
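For illustration, this is the kind of compact, formatting-first prompt I mean; the tool name and JSON schema here are made up for the example:

# A short prompt that states the formatting contract up front
# (hypothetical tool and schema, purely for illustration)
prompt = (
    "Respond ONLY with a single JSON object, with no prose before or after.\n"
    'Format: {"action": "<tool name>", "action_input": "<input string>"}\n'
    "Available tool: search\n"
    "Question: what is the weather in Paris today?"
)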

Here is an example of what I get returned over half the time. The model just responding to my question while ignoring all instructions:

[screenshot: chat-bison answers the question in plain natural language, ignoring the JSON formatting instructions]

sjyangkevin commented 1 year ago

I have encountered the same quota-limit issue as @daranable when working with embeddings. It happens because the prediction endpoint only accepts 20 requests per minute (in my case). I hit it while running the indexing step of the Code Understanding use case. I am using the Chroma vector store as shown below.

db = Chroma.from_documents(docs, embeddings)

My embeddings object is an instance of VertexAIEmbeddings. If you trace the call stack, you will find that it eventually calls the embed_documents method defined in VertexAIEmbeddings, as shown below.

embeddings = self._embedding_function.embed_documents(list(texts))

So, one workaround is to add a rate limit on calls to the endpoint. I created a wrapper class on top of VertexAIEmbeddings and overrode the embed_documents method. The drawback is that the process slows down a lot, since we only send a limited number of requests to the endpoint.

import time
from typing import List

from langchain.embeddings import VertexAIEmbeddings


# Vertex AI prediction quota is 20 requests / minute
def rate_limit(max_per_minute: int = 20):
    """Generator that sleeps as needed so successive next() calls
    are spaced at least `period` seconds apart."""
    period = 60 / max_per_minute
    print('Waiting')
    while True:
        before = time.time()
        yield
        after = time.time()
        elapsed = after - before
        sleep_time = max(0, period - elapsed)
        if sleep_time > 0:
            print('.', end='')
            time.sleep(sleep_time)


class RateLimitedVertexAIEmbeddings(VertexAIEmbeddings):

    def embed_documents(self, texts: List[str], batch_size: int = 5) -> List[List[float]]:
        """Embed a list of strings. Vertex AI currently caps batches at
        5 strings per request, so we batch accordingly and rate-limit
        the requests to avoid exceeding the quota on large documents.

        Args:
            texts: The list of strings to embed.
            batch_size: The number of strings to send per request.

        Returns:
            List of embeddings, one for each text.
        """
        limiter = rate_limit()

        embeddings = []
        for batch in range(0, len(texts), batch_size):
            text_batch = texts[batch : batch + batch_size]
            embeddings_batch = self.client.get_embeddings(text_batch)
            embeddings.extend([el.values for el in embeddings_batch])
            next(limiter)  # pause here if we are ahead of the quota

        return embeddings


embeddings = RateLimitedVertexAIEmbeddings(disallowed_special=())

I didn't get the quota limit issue after this change.


Hope this resolves the issue. I am also quite new to LangChain and still learning.

pmcray commented 1 year ago

Where is the VertexAIEmbeddings class defined or imported from? Or am I missing something? @sjyangkevin

sjyangkevin commented 1 year ago

It is imported from langchain.embeddings.vertexai, @pmcray. You can refer to the documentation.
My LangChain version is 0.0.187.
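For reference, this is all it takes on my version (assuming the google-cloud-aiplatform package is installed and your gcloud credentials are configured):

# Works on langchain 0.0.187; the class is re-exported from langchain.embeddings.vertexai
from langchain.embeddings import VertexAIEmbeddings

embeddings = VertexAIEmbeddings()  # picks up application-default credentials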

pmcray commented 1 year ago

Thanks, @sjyangkevin. I had just spotted there might be some helpful stuff in the LangChain docs. I was using LangChain and LlamaIndex earlier when I was trying to do things with the GPT API, but we are a Google shop, so we have to focus on PaLM!

sjyangkevin commented 1 year ago

@pmcray Hope it helps. I am in the same situation as well, and I am still experimenting and learning the capabilities and limitations of PaLM.

pmcray commented 1 year ago

@sjyangkevin I am trying to pass OpenAPI specs in as context (and a YAML file if I am doing few-shot prompting), but things get too big for the context very quickly. It seems the token limit is the equivalent of about 70 kB, and as Bill Gates didn't say, 70 kB is not enough for anyone. I am hoping that embeddings might be a way forward, but I am unsure. It's not that I want to compare documents; I want to generate new ones based on examples. I can fine-tune, of course, on as much data as I like, but that doesn't help if I can't say "Generate a YAML file in this format based on this OpenAPI spec"!

sjyangkevin commented 1 year ago

@pmcray Yeah, the token limit is another issue that restricts the way we can do prompting. From my understanding, your case is more like content generation; embeddings are mainly used for search or comparison, though you could use them to retrieve only the most relevant examples and keep the prompt small, as in the sketch below. Fine-tuning might be the way to go, but that probably requires ~100 examples to build a proof of concept. Since the PaLM API is currently pre-GA, hopefully we will get higher quotas and more capabilities when it reaches GA.
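A rough sketch of what I mean by using embeddings for search: index your example files, then retrieve just the closest matches instead of pasting everything into the prompt (the file path and query here are made up):

from langchain.document_loaders import TextLoader
from langchain.embeddings import VertexAIEmbeddings
from langchain.vectorstores import Chroma

# Index the few-shot example files (path is illustrative)
docs = TextLoader("examples/petstore.yaml").load()
db = Chroma.from_documents(docs, VertexAIEmbeddings())

# Pull back only the 2 most relevant examples to stay under the token limit
relevant_examples = db.similarity_search("summary of the new OpenAPI spec", k=2)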

magnussentio commented 1 year ago

Hi all, apologies for the noob question. I also have access to the PaLM API. Is there a simple example of getting a response, like the one in the LangChain OpenAI getting-started tutorial?

sjyangkevin commented 1 year ago

@magnussentio you're probably looking for this? Google Cloud Platform Vertex AI PaLM
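If it helps, a minimal sketch using the Vertex AI integration (assuming google-cloud-aiplatform is installed and your credentials are set up):

from langchain.llms import VertexAI

llm = VertexAI(model_name="text-bison")  # the PaLM 2 text model on Vertex AI
print(llm("What are some good names for a pet llama?"))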

bgiesbrecht commented 1 year ago

@sjyangkevin @Jflick58 and others, you folks are awesome. Thank you for your help and contributions. It is because of this that I have been able to take cutting edge tech and make it work for my situation. Thank you for having shoulders I can stand on.

Vansh1190 commented 1 year ago

I'm getting this error on Render when hosting my PaLM project. I built a simple backend that takes a message in the body of the request, but when I make a request to the hosted backend, Render logs the following for the call at Jul 8 10:33:28 PM:

{
  code: 9,
  details: 'User location is not supported for the API use.',
  metadata: Metadata { internalRepr: Map(0) {}, options: {} }
}

Please help me.

sjyangkevin commented 1 year ago

@Vansh1190 Is your region supported by Vertex AI in the available regions?

Vansh1190 commented 1 year ago

> @Vansh1190 Is your region supported by Vertex AI in the available regions?

Yes sir. Even so, it runs very well on my local machine, but when I host it on Render it gives me this error: https://github.com/hwchase17/langchain/issues/4681#issuecomment-1627429876

Vansh1190 commented 1 year ago

I think I found the explanation:

Note: The PaLM API is currently in public preview. Production applications are not supported at this stage. https://developers.generativeai.google/guide

haseeb-heaven commented 11 months ago

We need to add PaLM 2 support via the standalone Generative AI API (https://developers.generativeai.google/tutorials/text_quickstart), not just Vertex AI support; Vertex AI is already added, but the PaLM 2 API is different.
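For clarity, the standalone API from that quickstart goes through the google.generativeai package rather than Vertex AI; a minimal sketch (you need a MakerSuite API key, not GCP credentials):

import google.generativeai as palm

palm.configure(api_key="YOUR_API_KEY")  # MakerSuite key, not a GCP service account

completion = palm.generate_text(
    model="models/text-bison-001",
    prompt="Explain the difference between a list and a tuple in Python.",
)
print(completion.result)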

dosubot[bot] commented 8 months ago

Hi, @yil532,

I'm helping the LangChain team manage their backlog and am marking this issue as stale. From what I understand, this issue is a feature request to add the Google PaLM API to LangChain, motivated by the desire to improve user experience. There have been discussions about the limitations of the PaLM API, including issues with the chat model, text model, and embedding rate limits, as well as suggestions to add support for PaLM 2.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!