In the current implementation of SDG, the knowledge YAML file that SDG uses to generate data must include context chunks, each paired with QnA (question-and-answer) samples.
Currently, each context chunk requires exactly 3 QnA samples, no more and no fewer. If you provide fewer, I believe it won't work; if you provide more, every sample beyond the first 3 is ignored.
This should be made configurable and more robust in a future release, if possible.
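The behavior described above can be modeled with a short Python sketch. It is not the actual SDG code: the `context`/`qna` field names and the `icl_samples_for_chunk` helper are hypothetical, chosen only to show how a hard-coded count of 3 rejects short lists and silently truncates long ones.

```python
# Hypothetical model of the current behavior; "context" and "qna" are
# assumed field names, not the real SDG knowledge YAML schema.
REQUIRED_QNA = 3  # the fixed per-chunk count described in this issue


def icl_samples_for_chunk(chunk: dict) -> list:
    """Reject chunks with fewer than 3 QnA samples; keep only the first 3."""
    qna = chunk.get("qna", [])
    if len(qna) < REQUIRED_QNA:
        raise ValueError(
            f"context chunk has {len(qna)} QnA samples; {REQUIRED_QNA} are required"
        )
    # Anything past index 2 is dropped without any warning.
    return qna[:REQUIRED_QNA]


chunk = {
    "context": "Example context text.",
    "qna": [{"question": f"Q{i}", "answer": f"A{i}"} for i in range(1, 6)],
}
used = icl_samples_for_chunk(chunk)
print(len(used))  # 3: the 4th and 5th samples are ignored
```

Turning `REQUIRED_QNA` into a caller-supplied parameter, and warning instead of silently truncating, would be one way to make the count configurable.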
From @mairin:
In the schema, we have `minItems=3`, and all of the knowledge prompts in the simple, full, and agentic pipelines only handle `icl_query_{1,2,3}`.
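Assuming the schema is a JSON Schema, `minItems` sets only a lower bound, so a fourth sample still validates; it is the prompts, with exactly three `icl_query_*` placeholders, that cause the extras to be dropped. The Python sketch below illustrates this and one possible configurable alternative. The template text and the `fill_fixed`, `build_variable_template`, and `fill_variable` helpers are hypothetical; only the `icl_query_{1,2,3}` naming comes from this issue.

```python
# Hypothetical illustration; the template text is not the real SDG prompt.
FIXED_TEMPLATE = (
    "Example 1: {icl_query_1}\n"
    "Example 2: {icl_query_2}\n"
    "Example 3: {icl_query_3}\n"
)


def fill_fixed(questions: list) -> str:
    # str.format ignores unused keyword arguments, so a 4th question is
    # silently dropped; a missing placeholder value raises KeyError.
    values = {f"icl_query_{i}": q for i, q in enumerate(questions, start=1)}
    return FIXED_TEMPLATE.format(**values)


def build_variable_template(n: int) -> str:
    # Configurable alternative: emit one placeholder per sample.
    return "".join(f"Example {i}: {{icl_query_{i}}}\n" for i in range(1, n + 1))


def fill_variable(questions: list) -> str:
    template = build_variable_template(len(questions))
    values = {f"icl_query_{i}": q for i, q in enumerate(questions, start=1)}
    return template.format(**values)


qs = ["Q1", "Q2", "Q3", "Q4"]
print(fill_fixed(qs))     # only Q1-Q3 appear
print(fill_variable(qs))  # all four questions appear
```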