junnoslab / DQChat-LangGraph


Dataset format refactoring #4

Closed SwiftyJunnos closed 3 weeks ago

SwiftyJunnos commented 3 weeks ago

Refactor dataset for RAFT training.

| Before field | Description | After field | Description |
| --- | --- | --- | --- |
| `question: str` | Question asked by the user. | `id: str` | ID of this QA dataset item. |
| `prompt: str` | Prompt actually delivered to the model. | `dataset_id: str` | ID of the dataset. |
| `doc_ids: list[str]` | IDs of the documents retrieved from the vector store. | `question: str` | Question asked by the user. |
| `positive_doc_ids: list[str]` | IDs of documents whose distance is below the threshold. | `context: str` | Answers retrieved from the vector store. |
| `negative_doc_ids: list[str]` | IDs of documents whose distance is above the threshold. | `reason: str` | Chain-of-thought response from the model explaining the answer. |
| `reasons: str` | Dumped Python dict containing metadata of `doc_ids`. | `answer: str` | Answer response from the model. |
| `answer: str` | Answer response from the model. | `output: str` | Full output combining the reason and the answer. |

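
For reference, the fields listed under After could be modeled roughly as below. This is only a sketch: the class name `RaftItem`, the sample values, and the way `output` is assembled (reason and answer joined by a blank line) are assumptions for illustration, not something stated in this issue.

```python
from dataclasses import asdict, dataclass


@dataclass
class RaftItem:
    """One QA item in the refactored ("After") format. Hypothetical name."""

    id: str          # ID of this QA dataset item
    dataset_id: str  # ID of the dataset
    question: str    # question asked by the user
    context: str     # answers retrieved from the vector store
    reason: str      # chain-of-thought explanation from the model
    answer: str      # answer response from the model
    output: str = "" # full output of reason and answer

    def __post_init__(self) -> None:
        # Assumption: when no output is given, build it by joining the
        # reason and the answer with a blank line. The real format may differ.
        if not self.output:
            self.output = f"{self.reason}\n\n{self.answer}"


# Sample values are made up purely to show the shape of a record.
item = RaftItem(
    id="qa-0001",
    dataset_id="ds-0001",
    question="What is RAFT?",
    context="RAFT fine-tunes a model on questions paired with retrieved documents.",
    reason="The context describes RAFT as a fine-tuning recipe.",
    answer="A fine-tuning method that trains on retrieved context.",
)
record = asdict(item)  # plain dict, ready to serialize as one dataset row
```

Keeping `output` derivable from `reason` and `answer` avoids the three fields drifting apart, though storing it explicitly (as the table suggests) is equally valid.
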
linear[bot] commented 3 weeks ago
DQC-14 Dataset format refactor

Refactor dataset for RAFT training.