dpalmasan opened this issue 3 weeks ago
Hi @dpalmasan! Looks like you might be inheriting from the wrong base class for your template: you should try InstructTemplate, not InstructDataset, like so:
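from torchtune.data import InstructTemplate

class TruthfulQATemplate(InstructTemplate):
    template = "Instruction:\n{instruction}\n\nInput:\n{input}\n\nResponse: "
    @classmethod
    def format(cls, sample, column_map=None):
        # column_map maps template fields to dataset columns, e.g. instruction -> question
        cm = column_map or {}
        fields = {key: sample[cm.get(key, key)] for key in ("instruction", "input")}
        return cls.template.format(**fields)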
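To sanity-check a template like this outside of torchtune, here is a minimal standalone sketch. It does not import torchtune: the hypothetical SketchTemplate class only mirrors the format(sample, column_map) shape used above, and the sample row is made up for illustration rather than taken from TruthfulQA. It shows that column_map is what translates the template's field names (instruction, input) into the dataset's column names (question, type).

```python
class SketchTemplate:
    """Hypothetical stand-in mirroring the TruthfulQATemplate shape (no torchtune import)."""

    template = "Instruction:\n{instruction}\n\nInput:\n{input}\n\nResponse: "

    @classmethod
    def format(cls, sample, column_map=None):
        # column_map maps template field -> dataset column; unmapped fields keep their own name
        cm = column_map or {}
        fields = {key: sample[cm.get(key, key)] for key in ("instruction", "input")}
        return cls.template.format(**fields)


# Made-up row using TruthfulQA-style column names (illustrative values only)
row = {
    "question": "What color is the sky on a clear day?",
    "type": "Non-Adversarial",
    "best_answer": "Blue",
}
column_map = {"instruction": "question", "input": "type", "output": "best_answer"}

prompt = SketchTemplate.format(row, column_map)
print(prompt)

# Without a column_map the lookup fails, because the row has no "instruction" column
try:
    SketchTemplate.format(row)
except KeyError as missing:
    print("missing column:", missing)
```

If the template simply unpacked the raw sample with format(**sample), it would hit the same KeyError, since TruthfulQA rows have no instruction or input columns.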
Ohh, that was the issue. I still had to install the script as a package or add its directory to PYTHONPATH to make it work, but it worked. The docs are a little confusing; to set up the column mapping I also had to look at the code:
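The PYTHONPATH step is needed because a dotted string like truthful_qa.TruthfulQATemplate can only be resolved if the truthful_qa module is importable. The sketch below assumes the lookup behaves like a plain importlib import (torchtune's internal helper may differ in details); resolve_dotted_path is a hypothetical helper, and the throwaway module truthful_qa_sketch stands in for your truthful_qa.py.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def resolve_dotted_path(path):
    """Hypothetical helper: import 'module.attr' and return the attribute."""
    module_name, _, attr = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# Write a throwaway module standing in for truthful_qa.py
workdir = Path(tempfile.mkdtemp())
(workdir / "truthful_qa_sketch.py").write_text("class TruthfulQATemplate:\n    pass\n")

# Before its directory is on sys.path, the import fails
try:
    resolve_dotted_path("truthful_qa_sketch.TruthfulQATemplate")
    found_before = True
except ModuleNotFoundError:
    found_before = False

# PYTHONPATH entries end up on sys.path at startup; inserting here has roughly the same effect
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()
cls = resolve_dotted_path("truthful_qa_sketch.TruthfulQATemplate")
print(found_before, cls.__name__)
```

In practice, running from the directory that contains truthful_qa.py with that directory exported on PYTHONPATH, or installing it as a package, makes the same lookup succeed.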
dataset:
  _component_: torchtune.datasets.instruct_dataset
  template: truthful_qa.TruthfulQATemplate
  max_seq_len: 4096
  source: truthfulqa/truthful_qa
  split: validation
  data_dir: generation
  column_map:
    instruction: question
    input: type
    output: best_answer
seed: null
shuffle: True
batch_size: 2
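Since a wrong column_map entry only shows up once training starts, a quick pre-flight check can help. This is a minimal sketch: check_column_map is a hypothetical helper, and the column list is my understanding of the TruthfulQA generation config, so confirm it with datasets.load_dataset("truthfulqa/truthful_qa", "generation", split="validation").column_names before relying on it.

```python
# Assumed column names of the TruthfulQA "generation" config; verify against the
# column_names of the dataset loaded with the `datasets` library
TRUTHFULQA_COLUMNS = {
    "type", "category", "question", "best_answer",
    "correct_answers", "incorrect_answers", "source",
}

# Same mapping as the YAML column_map: template field -> dataset column
column_map = {"instruction": "question", "input": "type", "output": "best_answer"}


def check_column_map(column_map, columns, required=("instruction", "input", "output")):
    """Hypothetical check: every required field must map to an existing column."""
    problems = []
    for field in required:
        column = column_map.get(field, field)  # unmapped fields fall back to their own name
        if column not in columns:
            problems.append(f"{field!r} -> {column!r} not in dataset columns")
    return problems


print(check_column_map(column_map, TRUTHFULQA_COLUMNS))
print(check_column_map({"instruction": "question"}, TRUTHFULQA_COLUMNS))
```

The first call should report no problems; the second flags input and output, because without a mapping they fall back to column names that TruthfulQA does not have.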
Maybe some examples would help. Thanks for the quick answer!
Yep, we've definitely got to make our docs clearer; your input is much appreciated!
cc @RdoubleA
I am trying to fine-tune a model on a custom dataset, specifically: https://huggingface.co/datasets/truthfulqa/truthful_qa
I haven't found any clear documentation, only partial docs that explain some of the pieces: https://pytorch.org/torchtune/stable/tutorials/datasets.html
I am following the instructions, and this is my custom class:
Here is my config:
However, when I try to tune a model I am getting:
What are the steps to use a custom dataset from Hugging Face for an instruct task?