caldeirav opened 1 year ago
The gpt4all-datalake provides an API for contributing data:
https://api.gpt4all.io/v1/ingest/chat
{ "source": "gpt4all-chat", "submitter_id": "EliteHacker#42", "agent_id": "gpt4all-j-v1.2-jazzy", "ingest_id": "string", "conversation": [ { "content": "Hello, how can I assist you today?", "role": "assistant", "rating": "negative", "edited_content": "Hello, how may I assist you today?" }, { "content": "Write me python code to contribute data to the GPT4All Datalake!", "role": "user" } ], "prompt_template": "string" }
I compared several vector databases: Weaviate, Pinecone, and Chroma. Weaviate has a native REST API for creating objects in batches, which is very convenient and worth trying: https://weaviate.io/developers/weaviate/api/rest/batch
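As a rough sketch of that batch endpoint (POST /v1/batch/objects), the following assumes a Weaviate instance reachable at localhost and a hypothetical "ChatContribution" class in the schema; adjust both to the actual deployment:

```python
import requests

# Sketch of Weaviate's REST batch endpoint (POST /v1/batch/objects).
# The URL and the "ChatContribution" class are assumptions for illustration.
WEAVIATE_URL = "http://localhost:8080"

batch = {
    "objects": [
        {
            "class": "ChatContribution",
            "properties": {
                "content": "Hello, how may I assist you today?",
                "role": "assistant",
                "source": "gpt4all-chat",
            },
        }
    ]
}

resp = requests.post(f"{WEAVIATE_URL}/v1/batch/objects", json=batch, timeout=30)
resp.raise_for_status()
print(resp.json())
```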
For search, Weaviate's GraphQL API is very useful for integration: https://weaviate.io/developers/weaviate/api/graphql
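A minimal sketch of querying that GraphQL endpoint (POST /v1/graphql) is below; it reuses the same hypothetical "ChatContribution" class and only fetches objects with a limit, since vector operators such as nearText would additionally require a vectorizer module to be enabled:

```python
import requests

# Sketch of a search against Weaviate's GraphQL endpoint (POST /v1/graphql).
# "ChatContribution" is the same hypothetical class as in the batch example.
WEAVIATE_URL = "http://localhost:8080"

query = """
{
  Get {
    ChatContribution(limit: 5) {
      content
      role
      source
    }
  }
}
"""

resp = requests.post(f"{WEAVIATE_URL}/v1/graphql", json={"query": query}, timeout=30)
resp.raise_for_status()
print(resp.json()["data"]["Get"]["ChatContribution"])
```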
Data product owners can easily submit their data to the Weaviate vector database.
Hi @caldeirav,
I'd like to install the Weaviate vector database on Red Hat AI and show examples of how to send data to Weaviate. What do you reckon?
Many thanks, Neo
@neoxu999 Weaviate looks like a good candidate. I think the key is to ensure we can integrate the vector database with our MLOps automation first and foremost; once this is successful, we can start looking at data contributions and data tracing / lineage requirements in detail.
@neoxu999 Do you think it is possible to introduce Weaviate into the Data Mesh pattern deployment now? As we are installing a new instance, we can then start running simple examples such as the ones in the OpenAI Cookbook, before we introduce our own training pipeline.
Reference: https://github.com/openai/openai-cookbook/tree/main/examples/vector_databases
@caldeirav Good to know we have a new instance. Yes, I can try the OpenAI Cookbook examples before installing Weaviate on the Data Mesh Pattern.
The Data Mesh pattern should provide a way for data product owners to contribute curated data for LLM training. A good approach and reference is the datalake approach used by gpt4all:
https://github.com/nomic-ai/gpt4all-datalake