Open anik120 opened 2 months ago
Capturing the same comment here too: https://github.com/instruct-lab/cli/pull/776#issuecomment-2040157061
PS: this issue is just a proposal, in the hopes that a discussion will ensue about the priority of this work. Totally reasonable to just ignore it if others don't see it as a high-priority issue, or if the change will take a lot of effort to get through before opening and the team does not have the cycles to implement it 😀
This issue should probably be in the https://github.com/instruct-lab/schema/ repo.
@anik120 I think we are past the point where this change could be made. Perhaps we can close this issue?
Capturing a discussion with @shivchander:
I was writing up a test case for the lmdk cli to test the knowledge workflow, and the way that I laid out my `qna.yaml` is as follows:
Essentially, the `seed_example` questions/answers I have there are from the overarching project websites https://operatorframework.io/, https://olm.operatorframework.io/ and https://sdk.operatorframework.io/, and the documents I have in https://github.com/anik120/knowledge-doc-test are README.md files from the components' GitHub repositories. In other words, the `seed_example` questions/answers do not actually come from the documents hosted in `document.repo`.

The way I laid things out, the `seed_examples` are a "product pitch/summary description" and `document.repo` contains all the docs I want the model to learn about.

Shiv tells me that that's the wrong way of thinking about it: the key `document` should really be `source`, and `seed_examples` are examples of questions/answers that the model can answer once it's been trained on the docs hosted in `document.repo`.

Eureka moment: Even after learning how the taxonomy interacts with the model (only a little while ago, i.e. fresh info still being processed by my brain), I was thinking about the structure of my `qna.yaml` the wrong way. It's likely that other users will also confuse the taxonomy/model interactions and lay out their `qna.yaml` files the wrong way, leading to PR submissions that likely won't improve model quality.

Proposed fix: Change `document` to `source`
cc: @xukai92 @abhi1092 @aldopareja