Open TobeTek opened 2 months ago
Hi @Zulkazeem, I'm currently working on this. Would you like to collaborate with me? I have a couple of variations that I created, but they still need improvement. I would love your help on this.
@Tacoman99 it's always good to assign a task to yourself if you are working on it.
Anyone else working on this task?
Potential data format that gives the courses to take and the reasoning behind them. Will be replaced with the LPI Dataset.
input: I want to learn about llm and how to finetune them. I'm intermediate and i want to build a rag pipeline

output: Given your intermediate level and interest in building a RAG pipeline using LLMs, I recommend the following courses and modules:

Course/Module | Source | Level | Duration (Estimate) | Keywords | Reason |
---|---|---|---|---|---|
Introduction to Large Language Models (LLMs) | Various online courses (e.g., Coursera, edX) | Beginner/Intermediate | Varies (4-8 weeks) | LLMs, transformers, attention mechanisms, language modeling, tokenization | Provides foundational knowledge of LLMs, their architecture, and capabilities. Essential before tackling fine-tuning or RAG. |
Natural Language Processing (NLP) Fundamentals | Various online courses (e.g., Stanford NLP) | Intermediate | Varies (6-10 weeks) | NLP, text preprocessing, word embeddings, sentiment analysis, named entity recognition | Necessary for understanding how LLMs process and understand text. Many RAG techniques rely on NLP for data preprocessing and query understanding. |
Fine-tuning LLMs | Hugging Face Course, Papers with Code | Intermediate/Advanced | Varies (2-4 weeks) | Fine-tuning, transfer learning, hyperparameter tuning, model evaluation | Teaches you how to adapt pre-trained LLMs to specific tasks, crucial for building a high-performing RAG system. Learn techniques like prompt engineering. |
Retrieval Augmented Generation (RAG) Techniques | Research Papers, Blogs, Tutorials | Advanced | Varies (Ongoing Study) | RAG, vector databases, embedding generation, knowledge retrieval, question answering | Focuses on the architecture and implementation of RAG pipelines. You'll learn to select and integrate components like vector databases (e.g., Pinecone, Weaviate) and retrieval methods. |
Python for Data Science (if needed) | DataCamp, Codecademy, Fast.ai | Intermediate | Varies (2-4 weeks) | Python, pandas, numpy, scikit-learn | Reinforce your Python skills for data manipulation and model building within your RAG pipeline. |
Vector Databases (if needed) | Pinecone, Weaviate documentation | Intermediate | Varies (1-2 weeks) | Vector databases, similarity search, indexing, scalability | Understanding vector databases is crucial for efficient knowledge retrieval in a RAG pipeline. Learn about different databases and their strengths. |
File containing a dataset of 68 sample conversational Q-A entries, generated using Vertex AI Gemini pro 1.5 flash
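
For anyone who wants to experiment with the format above, here is a rough sketch of how one sample could be stored as a JSON record and how the table in the `output` text could be turned back into structured rows. The `input`/`output` keys mirror the labels in the sample; everything else (the helper name `parse_markdown_table`, the idea that each record is a flat JSON object, the abbreviated output with a single table row) is an assumption for illustration, not the actual layout of the dataset file.

```python
import json


def parse_markdown_table(text):
    """Parse pipe-delimited table lines into a list of dicts keyed by header.

    Hypothetical helper: assumes the first line containing '|' is the header,
    the second is the ---|--- separator, and the rest are data rows.
    """
    rows = [line.strip() for line in text.splitlines() if "|" in line]
    if len(rows) < 2:
        return []

    def split(line):
        # Drop leading/trailing pipes, then split into trimmed cells.
        return [cell.strip() for cell in line.strip().strip("|").split("|")]

    header = split(rows[0])
    # rows[1] is the separator line, so data starts at rows[2].
    return [dict(zip(header, split(line))) for line in rows[2:]]


# One record in the assumed input/output shape (output abbreviated to one row).
sample = {
    "input": (
        "I want to learn about llm and how to finetune them. "
        "I'm intermediate and i want to build a rag pipeline"
    ),
    "output": (
        "Given your intermediate level and interest in building a RAG pipeline "
        "using LLMs, I recommend the following courses and modules:\n"
        "Course/Module | Source | Level | Duration (Estimate) | Keywords | Reason |\n"
        "---|---|---|---|---|---|\n"
        "Vector Databases (if needed) | Pinecone, Weaviate documentation | "
        "Intermediate | Varies (1-2 weeks) | Vector databases, similarity search, "
        "indexing, scalability | Understanding vector databases is crucial for "
        "efficient knowledge retrieval in a RAG pipeline. Learn about different "
        "databases and their strengths. |\n"
    ),
}

# Serialize the record the way it might sit in a JSON file, then read it back.
serialized = json.dumps(sample, indent=2)
restored = json.loads(serialized)

courses = parse_markdown_table(restored["output"])
for course in courses:
    print(course["Course/Module"], "->", course["Level"])
```

Storing the recommendations as plain text keeps the samples easy to generate, but parsing them like this makes it simpler to check that every row has all six columns before the data is swapped for the LPI Dataset.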
Next Steps:
Great work @Tacoman99! Can you make a Pull Request to the repo with the QA_LPI.json file?
Hello @TobeTek, can I work on this?