Happy to work on this within the next couple of weeks.
One interesting challenge is that Gemini models are multimodal - they accept images as well as text as input.
Yes, I have the start of image support in https://github.com/tmc/langchaingo/pull/361 that I'd like to extend to cover ollama's multi-modal support (in addition to Gemini).
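For reference, a minimal sketch of what a multimodal call looks like when going straight through the Google AI Go SDK (github.com/google/generative-ai-go/genai), independent of langchaingo; the model name, image file, and environment variable are placeholders:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"github.com/google/generative-ai-go/genai"
	"google.golang.org/api/option"
)

func main() {
	ctx := context.Background()
	// API-key auth via the Google AI SDK; Vertex AI uses GCP credentials instead.
	client, err := genai.NewClient(ctx, option.WithAPIKey(os.Getenv("GOOGLE_API_KEY")))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	img, err := os.ReadFile("cat.png") // placeholder image file
	if err != nil {
		log.Fatal(err)
	}

	// A single request can mix text and image parts.
	model := client.GenerativeModel("gemini-pro-vision")
	resp, err := model.GenerateContent(ctx,
		genai.Text("Describe this image:"),
		genai.ImageData("png", img),
	)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(resp.Candidates[0].Content.Parts[0])
}
```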
A key issue is that the llms.LLM interface has only []string for its prompts, and a lot of code depends on it. We need to either break it and rewrite existing code, or add another interface.
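For illustration only, here is one possible shape for a richer prompt type; the names below (ContentPart, Message, Model) are hypothetical and not the design the project settled on, they just show the direction:

```go
package llms

import "context"

// ContentPart is one piece of a prompt: text, an image, etc.
type ContentPart interface {
	isPart()
}

// TextPart carries plain text.
type TextPart struct{ Text string }

// ImagePart carries raw image bytes plus their MIME type.
type ImagePart struct {
	MIMEType string
	Data     []byte
}

func (TextPart) isPart()  {}
func (ImagePart) isPart() {}

// Message is a single prompt message made of one or more parts.
type Message struct {
	Role  string
	Parts []ContentPart
}

// Model is a multimodal-capable alternative to the text-only, []string-based interface.
type Model interface {
	GenerateContent(ctx context.Context, msgs []Message) (string, error)
}
```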
The chat interfaces supply a slightly richer surface area that we can extend, but yes, I agree we need a more flexible interface here. As we're pre-1.0, I'm open to considering a breaking change; it's pretty clear that text-only prompting isn't sufficient.
I still want to spend some time studying the chat interfaces; they could be ripe for a significant refactoring. Chats shouldn't have different interfaces from LLMs - they should just build on top: a chat should simply wrap an LLM with some history/context. I haven't found time for this yet, though - hopefully soon.
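A rough sketch of that idea, with hypothetical LLM/Chat types - a chat is just a model plus accumulated history, not a separate kind of interface:

```go
package chat

import (
	"context"
	"strings"
)

// LLM is a minimal single-turn text interface (hypothetical, for illustration).
type LLM interface {
	Call(ctx context.Context, prompt string) (string, error)
}

// Chat wraps an LLM with conversation history.
type Chat struct {
	llm     LLM
	history []string
}

func (c *Chat) Send(ctx context.Context, userMsg string) (string, error) {
	c.history = append(c.history, "user: "+userMsg)
	// Feed the accumulated history back as the prompt for the next turn.
	reply, err := c.llm.Call(ctx, strings.Join(c.history, "\n"))
	if err != nil {
		return "", err
	}
	c.history = append(c.history, "assistant: "+reply)
	return reply, nil
}
```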
Certainly open to collaborating on arriving at a good design!
Happy to help here! I am a Google Developer Expert (GDE) in ML.
Just a note that #465 was opened to specifically discuss the new interfaces needed to support multi-modal input. We'll design something that works for both OpenAI and Gemini and hopefully other models as well.
Once #465 is done, adding a Google Gemini backend should be straightforward. I think the following plan makes sense:

1. Land the new multi-modal interface from #465.
2. Add a new Gemini provider that supports both the Google AI SDK and the Vertex AI SDK.
3. Don't build this on the existing vertexai backend, since it's tuned for the old text-only PaLM model.

@eliben I would suggest using the Vertex AI SDK, since the Google AI SDK is not available in all countries yet.
@xavidop step (2) mentions that both can be supported via this interface. The user should be able to choose which to use when a client is created.
Would you like to help with this once the initial new interface from (1) is in place?
The hard part about this is that the two interfaces are quite different. You have to run them in parallel. I hacked something together with this approach and it looks like this: https://github.com/mrothroc/langchaingo/blob/3ecb9c417aa8777496fc378be35fa2271ea3f68b/llms/vertexai/internal/common/vertex_client.go#L27
(The code is a little disorganized since it's just a hack, but you can see that, in general, you need both clients. The unit tests work, though, so you can step through it to see it working.)
If anyone can suggest a better way to go about this, I'm open to it. I solved this problem in another private project by just using straight REST calls instead of the library, but I think that will lead to issues down the road.
> The hard part about this is that the two interfaces are quite different. You have to run them in parallel. I hacked something together with this approach and it looks like this: https://github.com/mrothroc/langchaingo/blob/3ecb9c417aa8777496fc378be35fa2271ea3f68b/llms/vertexai/internal/common/vertex_client.go#L27
The legacy client for PaLM isn't needed anymore, since there's a new SDK for Vertex to interface with the Gemini models: https://pkg.go.dev/cloud.google.com/go/vertexai and it has a compatible interface to the Google generative AI SDK. But yes, having pointers to two potential clients and initializing just one of them based on passed options/parameters sounds like a reasonable approach overall.
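A rough sketch of that approach (not langchaingo's actual code; the package, option, and field names are illustrative): the wrapper holds one pointer per SDK and the constructor initializes exactly one of them based on the options it's given.

```go
package googleclient

import (
	"context"
	"errors"

	vertexgenai "cloud.google.com/go/vertexai/genai"
	googlegenai "github.com/google/generative-ai-go/genai"
	"google.golang.org/api/option"
)

// Options selects which backend to talk to; exactly one auth style should be set.
type Options struct {
	APIKey    string // Google AI SDK (API-key auth)
	ProjectID string // Vertex AI SDK (GCP auth)
	Location  string // e.g. "us-central1"; only used with Vertex AI
}

// Client holds a pointer per SDK; only one is ever initialized.
type Client struct {
	googleAI *googlegenai.Client
	vertex   *vertexgenai.Client
}

// New picks the backend from the supplied options.
func New(ctx context.Context, opts Options) (*Client, error) {
	switch {
	case opts.APIKey != "":
		c, err := googlegenai.NewClient(ctx, option.WithAPIKey(opts.APIKey))
		if err != nil {
			return nil, err
		}
		return &Client{googleAI: c}, nil
	case opts.ProjectID != "":
		c, err := vertexgenai.NewClient(ctx, opts.ProjectID, opts.Location)
		if err != nil {
			return nil, err
		}
		return &Client{vertex: c}, nil
	default:
		return nil, errors.New("either APIKey or ProjectID must be set")
	}
}
```

Since both SDKs expose a very similar generation API, the methods on Client can dispatch to whichever pointer is non-nil.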
> The legacy client for PaLM isn't needed anymore, since there's a new SDK for Vertex to interface with the Gemini models: https://pkg.go.dev/cloud.google.com/go/vertexai and it has a compatible interface to the Google generative AI SDK.
I tried just using that to call text-bison but it doesn't appear to be supported via this client. So, if the expectation is that text-bison will still work, you have to run the two in parallel.
Also, it looks like the new API isn't complete. For example, unless I'm missing something, it doesn't appear to do embeddings.
> I tried just using that to call text-bison but it doesn't appear to be supported via this client. So, if the expectation is that text-bison will still work, you have to run the two in parallel.
text-bison is the old PaLM model, and there's no real reason to invoke it now that Gemini is out. gemini-pro should be good now.
> Also, it looks like the new API isn't complete. For example, unless I'm missing something, it doesn't appear to do embeddings.
Indeed, this discrepancy is unfortunate and hopefully temporary. In the meantime, it's OK to use a pointer to the PaLM client in the client type to answer embedding queries.
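A sketch of how that could look; the palmEmbedder interface and the field names are hypothetical, the point is only that embedding calls are routed to the legacy client behind the same client type:

```go
package googleai

import (
	"context"

	"github.com/google/generative-ai-go/genai"
)

// palmEmbedder stands in for the legacy PaLM embeddings client (hypothetical interface).
type palmEmbedder interface {
	CreateEmbedding(ctx context.Context, texts []string) ([][]float32, error)
}

// Client routes generation to Gemini and embeddings to the legacy PaLM client.
type Client struct {
	gemini *genai.Client // generation goes through Gemini
	palm   palmEmbedder  // embeddings fall back to the legacy PaLM client
}

// CreateEmbedding answers embedding queries via the PaLM client; from the
// caller's point of view it's just another method on the same client.
func (c *Client) CreateEmbedding(ctx context.Context, texts []string) ([][]float32, error) {
	return c.palm.CreateEmbedding(ctx, texts)
}
```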
> text-bison is the old PaLM model, and there's no real reason to invoke it now that Gemini is out. gemini-pro should be good now.
Switching models seems like a pretty big change. We tried just sending some of our production prompts used with text-bison to gemini-pro and the results were different. Also, text-bison is now officially supported, while gemini-pro is still in preview.
This is just my use case. It would definitely not work in our company. Maybe everyone else can live with that kind of change?
I guess this library is young enough that breaking changes are OK? If so, I'd be inclined to change the factory function that gets the LLM.
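A sketch of that factory-function idea with a hypothetical WithModel option (names are illustrative, not langchaingo's actual API); keeping the old default means nobody is silently switched to a different model:

```go
package vertexai

// options holds construction-time settings; only the model name is shown here.
type options struct {
	model string
}

// Option configures the LLM returned by New.
type Option func(*options)

// WithModel selects the backing model, e.g. "text-bison" (legacy PaLM) or "gemini-pro".
func WithModel(name string) Option {
	return func(o *options) { o.model = name }
}

// LLM is a stand-in for the provider's client type.
type LLM struct {
	model string
}

// New is the factory function; it keeps the old default so existing callers
// aren't silently moved to a different model.
func New(opts ...Option) (*LLM, error) {
	o := options{model: "text-bison"}
	for _, opt := range opts {
		opt(&o)
	}
	return &LLM{model: o.model}, nil
}
```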
Status update here:

- We now have an implementation of the new Model interface for GoogleAI (thanks @mrothroc!!).
- The next step should be adding a parallel implementation in the same provider using the https://pkg.go.dev/cloud.google.com/go/vertexai/genai SDK - it provides largely the same functionality (there are some small differences), but uses GCP authentication instead of API keys. Since this SDK doesn't support embeddings yet, it will use the legacy PaLM client for embeddings, but in a manner that should be transparent to users.
- With Gemini Pro now available, we should have an integration + an example.
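A rough sketch of what such an example could look like for the Vertex AI variant, using the cloud.google.com/go/vertexai/genai SDK directly with GCP authentication; the project ID, location, and prompt are placeholders:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"cloud.google.com/go/vertexai/genai"
)

func main() {
	ctx := context.Background()
	// Uses GCP Application Default Credentials; project and location are placeholders.
	client, err := genai.NewClient(ctx, "my-gcp-project", "us-central1")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	model := client.GenerativeModel("gemini-pro")
	resp, err := model.GenerateContent(ctx, genai.Text("Write a haiku about Go."))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(resp.Candidates[0].Content.Parts[0])
}
```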