Open andhie opened 1 month ago
Hi @andhie 👋, I'm here to assist you with any issues, questions, or contributions you have. Let's squash those bugs and find solutions together!
The issue you're experiencing with the RagDatasetGenerator in version 0.11.18, where multiple choice questions are being treated as individual questions, might be related to how the generator processes and formats multiple choice questions. Unfortunately, I couldn't find specific details on how multiple choice questions are handled internally in the RagDatasetGenerator, or any known issues related to this, in the available documentation [1].
To address this, you might consider checking the configuration options and parameters that influence question generation. For instance, you can customize the text_question_template or text_qa_template to better handle multiple choice questions. Additionally, reviewing the logic in your implementation to ensure that multiple choice questions are correctly formatted before being processed by the generator might help mitigate this issue [1][2].
Bug Description
When using the RagDatasetGenerator to generate a dataset over a document, I noticed a few instances where GPT-4o generates a multiple choice question, but each choice is then treated as an individual question, which produces an improper dataset.
Version
0.11.18
Steps to Reproduce
Relevant Logs/Tracebacks
No response