jxnl / blog

https://jxnl.co
MIT License

draft of synthetic question generation #48

Closed ivanleomk closed 3 weeks ago

ivanleomk commented 5 months ago

WIP, but I added a small section on NDCG and MRR and how to calculate them, plus a lancedb code walkthrough on how to retrieve chunks from the vector DB.

The next steps are to write up the code and walkthrough for a script that generates ~30 questions, runs retrieval with semantic search, and then runs evaluations on the results.
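
For readers unfamiliar with the two metrics mentioned above, here is a minimal sketch of how MRR and NDCG are commonly computed for a ranked list of retrieved chunk IDs. It is not the code from the post: the function names, the chunk IDs, and the use of graded relevance scores in a dict are all assumptions made for illustration.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple


def reciprocal_rank(retrieved_ids: Sequence[str], relevant_ids: Iterable[str]) -> float:
    """Return 1/rank of the first relevant chunk, or 0.0 if none was retrieved."""
    relevant = set(relevant_ids)
    for rank, chunk_id in enumerate(retrieved_ids, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: List[Tuple[Sequence[str], Iterable[str]]]) -> float:
    """Average the reciprocal rank over (retrieved_ids, relevant_ids) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(retrieved, relevant) for retrieved, relevant in runs) / len(runs)


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain: each gain is divided by log2(position + 1), positions from 1."""
    return sum(gain / math.log2(index + 2) for index, gain in enumerate(gains))


def ndcg_at_k(retrieved_ids: Sequence[str], relevance: Dict[str, float], k: int) -> float:
    """DCG of the top-k results divided by the DCG of the ideal top-k ordering."""
    gains = [relevance.get(chunk_id, 0.0) for chunk_id in retrieved_ids[:k]]
    ideal_gains = sorted(relevance.values(), reverse=True)[:k]
    ideal = dcg(ideal_gains)
    return dcg(gains) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical chunk IDs; in the synthetic-question setup, the relevant
    # chunk would be the one the question was generated from.
    print(reciprocal_rank(["c2", "c7", "c1"], {"c7"}))
    print(ndcg_at_k(["c2", "c7", "c1"], {"c7": 1.0}, k=3))
```

With a single relevant chunk per synthetic question, both metrics reward placing that chunk near the top of the results; NDCG additionally supports graded relevance if more than one chunk is judged useful.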


Ellipsis :rocket: This PR description was created by Ellipsis for commit 7557e4ddf05baf48aa2c3144d07ca2be22d6e0a0.

Summary:

This PR adds a new blog post on evaluating the quality of Retrieval-Augmented Generation (RAG) applications, with practical code snippets and explanations on using Instructor for synthetic data generation, lancedb for chunking and embedding, and NDCG and MRR as performance metrics.

Key points:



cloudflare-workers-and-pages[bot] commented 5 months ago

Deploying blog with Cloudflare Pages

Latest commit: 7557e4d
Status: ✅  Deploy successful!
Preview URL: https://1f575d1a.blog-cbl.pages.dev
Branch Preview URL: https://synthethic-data.blog-cbl.pages.dev
