pinecone-io / canopy

Retrieval Augmented Generation (RAG) framework and context engine powered by Pinecone
https://www.pinecone.io/
Apache License 2.0

[Feature] Support for new OpenAI multimodality #160

Open sivang opened 8 months ago

sivang commented 8 months ago

Is this your first time submitting a feature request?

Describe the feature

Enable Canopy to also allow uploading or indexing of images for the RAG operation. This also means that Canopy's inference API should be able to respond in a proper REST format (JSON plus base64 encoding for binary data?) to return said produced images.
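For illustration, a response carrying an image might look roughly like the sketch below; the payload shape and field names (e.g. "attachments", "data") are purely hypothetical, not an existing Canopy or OpenAI API:

import base64
import json

# Purely hypothetical response payload - field names are illustrative only.
image_bytes = b"\x89PNG..."  # placeholder for real image bytes
payload = {
    "id": "chat-123",
    "choices": [{
        "message": {
            "role": "assistant",
            "content": "Here is the generated image.",
            "attachments": [{
                "mime_type": "image/png",
                "data": base64.b64encode(image_bytes).decode("ascii"),
            }],
        },
    }],
}
print(json.dumps(payload, indent=2))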

Describe alternatives you've considered

I'm considering creating something like this myself.

Who will this benefit?

Multimodal RAG users who wish to use multimodality within OpenAI's ChatCompletions API, for example.

Are you interested in contributing this feature?

Yes, but I've just discovered Canopy 10 minutes ago :)

Anything else?

No response

miararoy commented 8 months ago

Hi @sivang !

Sounds great, and also something that we have on our roadmap; I hope we will publish it soon.

What would be the interface for loading images?

Looking at today's "Document" model:

from typing import Dict, List, Union

from pydantic import BaseModel, Field

# Canopy defines a Metadata type alias; approximated here so the snippet is self-contained.
Metadata = Dict[str, Union[str, int, float, List[str]]]

class Document(BaseModel):
    id: str = Field(description="The document id.")
    text: str = Field(description="The document text.")
    source: str = Field(default="", description="...")
    metadata: Metadata = Field(default_factory=dict, description="...")

What would you like to see? An extension of the current model? A new model?
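For example, the "extension" option might look roughly like this sketch (the image and mime_type fields are hypothetical, not an existing Canopy model):

from typing import Optional

from pydantic import BaseModel, Field

# Hypothetical sketch of extending Document with an image payload.
class MultimodalDocument(BaseModel):
    id: str = Field(description="The document id.")
    text: str = Field(default="", description="The document text, if any.")
    image: Optional[bytes] = Field(default=None, description="Raw image bytes, if any.")
    mime_type: str = Field(default="text/plain", description="MIME type of the payload.")
    source: str = Field(default="", description="...")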

Thanks! Roy from Pinecone

igiloh-pinecone commented 8 months ago

Thanks @sivang for sharing this idea!
As @miararoy has mentioned, this is definitely on the roadmap.

We're currently deliberating between two approaches:

  1. Create distinct KnowledgeBase classes, each designed for a specific data type. For instance, DocumentsKnowledgeBase for documents, ImagesKnowledgeBase for images, etc. To add a new type, you'd need to define a new data model and a corresponding KnowledgeBase subclass for managing it.
  2. Make the KnowledgeBase fully robust - you define an arbitrary data schema on init, and the KnowledgeBase handles each of the data fields (texts, images, etc) based on that schema.

Both approaches have their own pros and cons in terms of user experience and ease of use. We're keen to hear the community's feedback on which API they find more practical and user-friendly.
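To make the trade-off concrete, here is a rough sketch of what each option might look like; none of these classes or method signatures exist in Canopy today:

from typing import List

# Option 1: one KnowledgeBase subclass per data type.
class ImagesKnowledgeBase:
    def upsert(self, images: List[bytes]) -> None: ...
    def query(self, query_image: bytes) -> List[bytes]: ...

# Option 2: a single KnowledgeBase configured with an arbitrary schema;
# each field is embedded and stored according to its declared kind.
class SchemaKnowledgeBase:
    def __init__(self, schema: dict):
        # e.g. {"text": "text", "thumbnail": "image"}
        self.schema = schema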

sivang commented 8 months ago

Thank you @miararoy and @igiloh-pinecone for being so responsive!

What I envisaged is a common data structure (so I guess a new model!) that would give me a mental model similar to the one I use now with the OpenAI ChatGPT WebUI.

So, for example, I could have a Trail or Conversation (or whatever we choose to name it) object, which would contain the specialized objects for each data item regardless of its modality. In a sense, I guess I'm thinking along the lines of https://en.wikipedia.org/wiki/MIME , but obviously I'm expecting the user-friendliness of Python objects and collections.
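Something roughly like the sketch below, where Trail and Part are placeholder names for illustration, not a proposed API:

from typing import List

from pydantic import BaseModel, Field

# Sketch of the MIME-like idea: a container of modality-tagged parts.
class Part(BaseModel):
    mime_type: str = Field(description='e.g. "text/plain" or "image/png"')
    text: str = Field(default="", description="Text content, for textual parts.")
    data: bytes = Field(default=b"", description="Raw bytes, for binary parts.")

class Trail(BaseModel):
    id: str
    parts: List[Part] = Field(default_factory=list)

trail = Trail(id="conv-1", parts=[
    Part(mime_type="text/plain", text="Describe this image:"),
    Part(mime_type="image/png", data=b"\x89PNG..."),
])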

How does that sound? Does it make sense in view of the roadmap and general architecture? (And by the way, I'm happy to help contribute the feature.)

Come to think of it, we might be able to achieve this (if not already implemented) using LangChain's https://js.langchain.com/docs/api/chains/classes/ConversationChain , or model Canopy's equivalent on it?

sivang commented 8 months ago

> Thanks @sivang for sharing this idea! As @miararoy has mentioned, this is definitely on the roadmap.
>
> We're currently deliberating between two approaches:
>
>   1. Create distinct KnowledgeBase classes, each designed for a specific data type. For instance, DocumentsKnowledgeBase for documents, ImagesKnowledgeBase for images, etc. To add a new type, you'd need to define a new data model and a corresponding KnowledgeBase subclass for managing it.
>   2. Make the KnowledgeBase fully robust - you define an arbitrary data schema on init, and the KnowledgeBase handles each of the data fields (texts, images, etc) based on that schema.
>
> Both approaches have their own pros and cons in terms of user experience and ease of use. We're keen to hear the community's feedback on which API they find more practical and user-friendly.

I think the underlying implementation is important, but so is the ability to create, or use, said KnowledgeBases as one when providing context for the LLM. I admit that I haven't dived that far into the docs, but when asking for a retrieval, do I need to specify which KnowledgeBases to use, and can I combine a few to allow OpenAI to kick off DALL-E generation of its own accord (as is done through the ChatGPT WebUI)?
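In other words, something roughly like this entirely hypothetical sketch of querying several KnowledgeBases as one:

from typing import Any, List

# Entirely hypothetical - not an existing Canopy class.
class CombinedKnowledgeBase:
    def __init__(self, kbs: List[Any]):
        self.kbs = kbs

    def query(self, query: str, top_k: int = 5) -> List[Any]:
        # Naively pool results from every underlying knowledge base;
        # a real implementation would re-rank across modalities.
        results = []
        for kb in self.kbs:
            results.extend(kb.query(query, top_k=top_k))
        return results[:top_k]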