Sarvadnya (सर्वज्ञ), an All-Knowing Chatbot!!

Chatbots can be a real WoW!! The recent evidence is ChatGPT. With the latest LLMs (Large Language Models), chatbots have become far more human-like. But these LLMs are pretrained on their own (HUGE) data, and mere mortals don't have the means ($$, time, expertise) to train their own LLMs. RAG and/or fine-tuning is the way out for domain adaptation, i.e. getting LLMs to answer from your corpus. This repo is a collection of various PoCs (Proofs of Concept) that interface custom data with LLMs.

A few other topics are (or can be) part of this repo:

Indic-language models, some notes here
3D World Simulations, Agents, some notes here
Knowledge Graphs Generation, some notes here
Agents, some notes here
Drones, UAV Image Processing, Shynakshi here
Floor Plan Segmentation here

What?

PoCs Projects

Prepare chatbots for various modalities, use cases, domains, and datasets
Prepare videos for a YouTube channel, and write Medium posts (GDE/TH) and LinkedIn posts

Modes

Retrieval Augmented Generation (RAG) on own data
Fine-tuning LLMs with own data using LoRA etc
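
To make the LoRA idea concrete, here is a minimal sketch in plain Python (no ML framework). It assumes the standard LoRA formulation: the pretrained weight W stays frozen and only a low-rank update B·A, scaled by alpha / r, is trained. The function names and the toy matrices are illustrative assumptions, not code from this repo.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank.

    w: d x k frozen pretrained weight
    b: d x r trainable matrix, a: r x k trainable matrix
    """
    r = len(a)  # rank = number of rows in A
    scale = alpha / r
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def trainable_params(d, k, r):
    """Parameter counts as (full fine-tuning, LoRA adapter)."""
    return d * k, r * (d + k)


# Toy 2 x 3 weight with a rank-1 adapter (hypothetical values)
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b = [[1.0], [2.0]]       # d x r
a = [[0.5, 0.0, 1.0]]    # r x k

print(lora_effective_weight(w, a, b, alpha=2.0))
# [[2.0, 0.0, 2.0], [2.0, 1.0, 4.0]]
print(trainable_params(4096, 4096, 8))
# (16777216, 65536)
```

The second print shows why this suits "mere mortals": for a 4096 x 4096 layer at rank 8, the adapter trains roughly 0.4% of the parameters that full fine-tuning would.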

RAG

When? The data is {small, streaming, private} and {compute, money, expertise} is limited
What?
- RAG on knowledge graphs, for more grounding
- tabular financial data: representation and similarity
- MidcurveNN: geometric serialization and retrieval
- active-loop idea of fine-tuning on your data
- LangChain and LlamaIndex with any new LLM
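
As a rough illustration of the retrieve-then-prompt flow behind RAG, here is a toy sketch using only the Python standard library. It swaps in bag-of-words cosine similarity for a real embedding model and vector store, and it does not use LangChain or LlamaIndex; the corpus, function names, and prompt wording are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


def build_prompt(query, chunks, k=2):
    """Assemble a grounded prompt: retrieved context first, then the question."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\nAnswer:"
    )


# Hypothetical corpus standing in for "your own data"
corpus = [
    "The refund policy allows returns within 30 days of purchase.",
    "Our office is closed on public holidays.",
    "Shipping takes five business days within the country.",
]
prompt = build_prompt("How many days do I have for a refund?", corpus, k=1)
print(prompt)  # this string would then be sent to whichever LLM is in use
```

Only the retrieval index needs updating when the data changes, which is why this mode fits small, streaming, or private data with little compute.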

Fine-Tuning

When? Sufficient curated data is available (though not a whole lot), as a static batch (not a running stream)
What? Instead of unstructured text (input prompt) to unstructured text (output response), there is more value in going from prompt to structured output, such as:
- text2json: needed by many enterprises, such as financial companies.
- text2cypher: for graph databases such as Neo4j, like the LangChain implementation by Tomaz Bratanic
- text2SQL: the classical case; many professional solutions are available, so study and follow them, also for other query languages
- text2Manim: maths animation; a dataset is available; see if the generated video can be shown on the same Streamlit page
- text23DJS: good for 3D+LLM+Agents like Metamorph from Nvidia; representing geometry or shape as text is the key
- textGraph2textGraph: MidcurveNN, if we get the graph representation as text right.
Here, the key is robust post-processing and evaluation, as the response needs to be near perfect, with no room for relaxation even in syntax or format.
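
One possible shape for that post-processing step, sketched with the Python standard library only: pull a JSON object out of a chatty model reply and check its keys (text2json), and check that a generated query compiles against a schema in an in-memory SQLite database (text2SQL). The function names, the `trades` schema, and the sample reply are assumptions made for illustration; SQLite stands in for whatever database is actually targeted.

```python
import json
import re
import sqlite3


def extract_json(response, required_keys):
    """Parse the span from the first '{' to the last '}' and validate its keys.

    Returns the parsed dict, or raises ValueError with a reason.
    """
    match = re.search(r"\{.*\}", response, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in response")
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc}") from exc
    missing = [k for k in required_keys if k not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


def sql_is_valid(query, schema_ddl):
    """Check that a single query compiles against the schema (no real data touched)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_ddl)
        conn.execute(f"EXPLAIN {query}")  # compiles the statement, reports syntax/name errors
        return True
    except (sqlite3.Error, sqlite3.Warning):
        return False
    finally:
        conn.close()


# Hypothetical model reply and schema
reply = 'Sure! Here is the result:\n{"ticker": "ABC", "amount": 120.5}\nLet me know.'
print(extract_json(reply, ["ticker", "amount"]))

schema = "CREATE TABLE trades (id INTEGER, symbol TEXT, qty INTEGER);"
print(sql_is_valid("SELECT symbol, SUM(qty) FROM trades GROUP BY symbol", schema))
print(sql_is_valid("SELECT price FROM trades", schema))  # unknown column
```

A rejected response can be retried or routed to a fallback; the same accept/reject signal can also serve as a simple evaluation metric over a test set.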

Tech Stacks

Enterprise: Google Doc AI, Vertex AI, Microsoft Azure Language AI Services
Open Source: Langchain (Serve/Smith/Graph), HuggingFace, Streamlit for UI

Bottom-line

Not looking for Success, but Wonder!!
तमसो मा ज्योतिर्गमय : From Dark (hidden in text data) to Light (insights)

Folks to Follow

Abhinav Kimothi, RAG Expert: LinkedIn, Projects Portfolio, Website, Medium, LinkedIn Articles, LinkedIn Posts, Company
Pradip Nichite, Freelancing Expert: LinkedIn, Projects Portfolio, Blog, YouTube, LinkedIn Posts, Company
Sahar Mor: LinkedIn, Blogs

Publications so far

References

Disclaimer:

The author (yogeshkulkarni@yahoo.com) gives no guarantee of the results of the program. It is just a fun script. A lot of improvements are still to be made, so don’t depend on it at all.
