Open manisnesan opened 2 weeks ago
We employ a few-shot approach to create synthetic questions, directing the model to generate five distinct and perplexing questions by utilizing the product description. The model is instructed to incorporate the exact product name from the description into each question
Select the embedding model
Initially, we will conduct experiments to determine the optimal encoder. Keeping the sentence tokenizer, LLM (GPT-3.5-turbo), and k (20) constant, we assess four different encoders:
https://www.rungalileo.io/blog/tags/rag
Example Q&A system that generate questions using chunk based approach
https://www.kaggle.com/datasets/surajjha101/bigbasket-entire-product-list-28k-datapoints