Open furkansherwani opened 8 months ago
The script requires an eval dataset to be passed as an argument. Please try to debug this yourself, though, as this repo is not actively maintained.
But there is no .csv file in the validation folder in Datasets. It is a .json file. Please help me understand this.
@furkansherwani Just split the test dataset (or split it however you want really):
# Split the test dataset in half
train_test_split = id_tokenized_ds['test'].train_test_split(test_size=0.5)
Then rename one portion to 'validation':
id_tokenized_ds['test'] = train_test_split['train']
id_tokenized_ds['validation'] = train_test_split['test'] # Use the other half of the split as the validation set
id_tokenized_ds
DatasetDict({
    train: Dataset({
        features: ['raw_text', 'aspectTerms', 'labels', 'text', '__index_level_0__', 'input_ids', 'attention_mask'],
        num_rows: 590
    })
    test: Dataset({
        features: ['raw_text', 'aspectTerms', 'labels', 'text', 'input_ids', 'attention_mask'],
        num_rows: 127
    })
    validation: Dataset({
        features: ['raw_text', 'aspectTerms', 'labels', 'text', 'input_ids', 'attention_mask'],
        num_rows: 127
    })
})
This solved the error in my case.
ValueError Traceback (most recent call last) in <cell line: 4>()
2 get_ipython().system(' pip install -U accelerate')
3 get_ipython().system(' pip install -U transformers')
----> 4 model_trainer = t5_exp.train(id_tokenized_ds, **training_args)
6 frames
/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in get_eval_dataloader(self, eval_dataset)
    886         """
    887         if eval_dataset is None and self.eval_dataset is None:
--> 888             raise ValueError("Trainer: evaluation requires an eval_dataset.")
    889         eval_dataset = eval_dataset if eval_dataset is not None else self.eval_dataset
    890         data_collator = self.data_collator
ValueError: Trainer: evaluation requires an eval_dataset.
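The traceback shows that the Trainer raises only once evaluation starts and no eval dataset is set. A small pre-flight check can surface the problem before training begins. The sketch below is plain Python and assumes the training code reads the eval data from a split named 'validation', which is what the fix in this thread suggests but is not confirmed from the repo's source; `require_eval_split` is a hypothetical helper name.

```python
def require_eval_split(splits, name="validation"):
    """Return splits[name], or fail early with a readable message.

    `splits` can be any dict-like object keyed by split name.
    Hypothetical helper; the split name 'validation' is an assumption.
    """
    if name not in splits:
        available = ", ".join(sorted(splits)) or "none"
        raise KeyError(
            f"No '{name}' split found (available: {available}). "
            "Create one before training with evaluation enabled."
        )
    return splits[name]


# Plain dicts stand in for the tokenized dataset here
ok = {"train": [1, 2, 3], "test": [4], "validation": [5]}
missing = {"train": [1, 2, 3], "test": [4, 5]}

print(require_eval_split(ok))
try:
    require_eval_split(missing)
except KeyError as err:
    print("Caught:", err)
```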