Closed vecorro closed 7 months ago
Hey @vecorro! Great to see you back here, diving into the depths of question generation 🤿
Based on the information you've provided and the context from the LlamaIndex repository, it seems that the RagDatasetGenerator and the deprecated DatasetGenerator use different methods to generate questions.
The RagDatasetGenerator uses a language model (LLM) to generate questions based on the given documents. It does this by creating a query_engine from the SummaryIndex of each document, and then using this query_engine to generate questions based on the question_gen_query provided. This question_gen_query is a prompt that instructs the LLM to generate questions. By default, it is set to:
"You are a Teacher/Professor. Your task is to setup {num_questions_per_chunk} questions for an upcoming quiz/examination. The questions should be diverse in nature across the document. Restrict the questions to the context information provided."
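To make the role of the {num_questions_per_chunk} placeholder concrete, here is a minimal sketch of how a template like the default prompt quoted above could be filled in before it is sent to the LLM. The constant and function names are hypothetical and are not taken from the LlamaIndex source; only the prompt text comes from this thread.

```python
# Default prompt text as quoted in this comment (wording kept verbatim).
DEFAULT_QUESTION_GEN_QUERY = (
    "You are a Teacher/Professor. Your task is to setup "
    "{num_questions_per_chunk} questions for an upcoming quiz/examination. "
    "The questions should be diverse in nature across the document. "
    "Restrict the questions to the context information provided."
)


def build_question_gen_query(
    num_questions_per_chunk: int,
    template: str = DEFAULT_QUESTION_GEN_QUERY,
) -> str:
    """Fill the placeholder in the prompt template (hypothetical helper)."""
    return template.format(num_questions_per_chunk=num_questions_per_chunk)


query = build_question_gen_query(3)
print(query)
```

If a custom question_gen_query is supplied instead, it is worth checking that it still explicitly asks for questions, since this prompt is what tells the LLM what kind of output to produce.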
This is different from the deprecated DatasetGenerator, which may have used a different method to generate questions. Without the code or description of the deprecated DatasetGenerator, it's hard to provide a more specific comparison. However, the key difference is likely in the method of question generation, with the RagDatasetGenerator using a language model and the DatasetGenerator using a different approach.
You can find more details about this in the llama-index-core/llama_index/core/llama_dataset/generator.py file.
As for why you're not getting questions with RagDatasetGenerator, it's hard to say without more information. It could be due to the specific documents you're using, the settings of the LLM, or other factors. If you could provide more details about the documents and the settings of the LLM, it might be easier to diagnose the issue.
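One way to narrow this down is to inspect the raw text the LLM returned and check how much of it actually looks like a question. The sketch below is a hypothetical diagnostic helper, not part of LlamaIndex: it strips common list markers from each line and keeps only lines ending in a question mark. The sample text is made up for illustration.

```python
import re
from typing import List


def extract_questions(response_text: str) -> List[str]:
    """Return the lines of an LLM response that look like questions."""
    questions = []
    for line in response_text.splitlines():
        # Drop leading list markers such as "1.", "2)", "-" or "*".
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        if cleaned.endswith("?"):
            questions.append(cleaned)
    return questions


# Made-up sample response, mixing questions with plain statements.
sample = """Here are some questions:
1. What did the author work on before college?
2) Why did the author leave the company?
The document also describes painting.
- How did the author fund the startup?"""

found = extract_questions(sample)
print(len(found), "question-like lines out of", len(sample.splitlines()))
```

If almost none of the generated entries pass a check like this, the prompt or the LLM settings are the more likely culprit than the documents themselves.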
What was the resolution? I can't find a reference to one @vecorro.
Bug Description
I'm trying to reproduce the code from the question generation example, which even in the LlamaIndex v0.10.7 documentation uses DatasetGenerator (deprecated) instead of RagDatasetGenerator.
The problem is that if I use DatasetGenerator, I get a set of questions very similar to the example's expected output. However, if I use RagDatasetGenerator, I just get a bunch of text but no questions at all.
Version
v0.10.7
Steps to Reproduce
Code based on DatasetGenerator, which shows the correct/expected behavior:
Output sample (353 questions):
However, if I run this with the new RagDatasetGenerator class, I don't get any questions:
Output:
Thanks in advance for your help!
Relevant Logs/Tracebacks