kevinscaria / InstructABSA

Instructional learning for Aspect Based Sentiment Analysis [NAACL-2024]
https://aclanthology.org/2024.naacl-short.63/
MIT License

I am getting this error: ValueError: Trainer: evaluation requires an eval_dataset. #22

Open furkansherwani opened 8 months ago

furkansherwani commented 8 months ago

```
ValueError                                Traceback (most recent call last)
in <cell line: 4>()
      2 get_ipython().system(' pip install -U accelerate')
      3 get_ipython().system(' pip install -U transformers')
----> 4 model_trainer = t5_exp.train(id_tokenized_ds, **training_args)

6 frames
/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in get_eval_dataloader(self, eval_dataset)
    886         """
    887         if eval_dataset is None and self.eval_dataset is None:
--> 888             raise ValueError("Trainer: evaluation requires an eval_dataset.")
    889         eval_dataset = eval_dataset if eval_dataset is not None else self.eval_dataset
    890         data_collator = self.data_collator

ValueError: Trainer: evaluation requires an eval_dataset.
```

kevinscaria commented 8 months ago

The script requires an eval dataset to be provided as an argument. However, please try to debug this yourself, as this repo is no longer actively maintained.
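
For reference, the error comes from the underlying Hugging Face `Trainer`, which raises it whenever an evaluation step runs without an eval dataset. A minimal sketch of that pattern, where `model` and `tokenized_ds` are placeholders rather than InstructABSA's actual objects:

```python
from transformers import Trainer, TrainingArguments

# Sketch only: `model` and `tokenized_ds` stand in for whatever the
# training script builds; the values below are illustrative.
args = TrainingArguments(
    output_dir="./checkpoints",
    evaluation_strategy="epoch",  # schedules evaluation, so an eval_dataset must exist
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["validation"],  # the missing piece behind the ValueError
)
trainer.train()
```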

furkansherwani commented 8 months ago

But there is no .csv file in the validation folder in Datasets; there is only a .json file. Please help me understand this.
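
(For anyone hitting the same wall: a .json file can be loaded directly with the `datasets` library, no .csv needed. A minimal sketch, where the path is an assumption about where the file actually sits:)

```python
from datasets import load_dataset

# Illustrative path: point this at the real validation .json file.
val_ds = load_dataset(
    "json", data_files={"validation": "Datasets/validation/val.json"}
)["validation"]
```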

cyborgrob commented 7 months ago

@furkansherwani Just split the test dataset (or split it however you want really):

```python
# Split the test dataset in half
train_test_split = id_tokenized_ds['test'].train_test_split(test_size=0.5)
```

Then rename one portion to 'validation':

```python
id_tokenized_ds['test'] = train_test_split['train']
id_tokenized_ds['validation'] = train_test_split['test']  # Use 'test' as the validation set
id_tokenized_ds
```

```
DatasetDict({
    train: Dataset({
        features: ['raw_text', 'aspectTerms', 'labels', 'text', '__index_level_0__', 'input_ids', 'attention_mask'],
        num_rows: 590
    })
    test: Dataset({
        features: ['raw_text', 'aspectTerms', 'labels', 'text', 'input_ids', 'attention_mask'],
        num_rows: 127
    })
    validation: Dataset({
        features: ['raw_text', 'aspectTerms', 'labels', 'text', 'input_ids', 'attention_mask'],
        num_rows: 127
    })
})
```

This solved the error in my case.
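
With the 'validation' split in place, the training call from the original traceback should then pick up its eval dataset:

```python
model_trainer = t5_exp.train(id_tokenized_ds, **training_args)
```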