Closed by aakankshaduggal 1 month ago
+1 to this: I noticed this as well
This patch improves things, but it would be nicer if the error gave some hint about possible causes, such as "please ensure the number of examples is enough" or "please make sure it follows the guidelines at HTTP", or something like that. You will know better.
What I know is that I just spent a day debugging this issue. I could only understand the reason after I found the issue that led to this MR, https://github.com/instructlab/sdg/issues/240
@marceloleitner Those are reasonable suggestions, but I'd ask that they go in a separate issue. This change is about handling an empty dataset without crashing. Your request is about better logging when generation fails, so that the user gets more indication of the potential causes of that type of failure.
When the qna.yaml is not appropriate or the wrong model is used, generation fails and throws an error:
instructlab.sdg.pipeline.EmptyDatasetError: Pipeline stopped: Empty dataset after running pipe
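For illustration only, here is a minimal sketch of how a caller might catch this error and print the kind of hints requested in the comments. It is not the project's actual fix. The `EmptyDatasetError` class below is a local stand-in so the snippet runs on its own, and `run_with_hints`, `HINTS`, and `failing_generate` are hypothetical names that do not come from instructlab.

```python
class EmptyDatasetError(Exception):
    """Local stand-in (assumption) for instructlab.sdg.pipeline.EmptyDatasetError."""


# Hypothetical hints, paraphrased from the suggestions in this thread.
HINTS = (
    "ensure the qna.yaml contains enough examples",
    "ensure the qna.yaml follows the guidelines",
    "ensure a suitable model is being used",
)


def run_with_hints(generate):
    """Call `generate` (a hypothetical zero-argument callable).

    Returns its result, or None if it raised EmptyDatasetError,
    in which case the possible causes are printed instead of crashing.
    """
    try:
        return generate()
    except EmptyDatasetError as exc:
        print(f"Generation produced no data: {exc}")
        for hint in HINTS:
            print(f"  - please {hint}")
        return None


def failing_generate():
    # Simulates the failure reported in this issue.
    raise EmptyDatasetError("Pipeline stopped: Empty dataset after running pipe")


if __name__ == "__main__":
    run_with_hints(failing_generate)
```

Returning `None` here is just one way to signal "no data" to the caller; re-raising with the hints appended to the message would work as well.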
Proposed solution: