locuslab / tofu

Landing Page for TOFU

Unable to load the dataset from HuggingFace hub, throws a ValueError #1

Closed: archit31uniyal closed this issue 9 months ago

archit31uniyal commented 10 months ago

I have been trying to fine-tune the llama2-7B model following the instructions provided in the repository, and it throws the following ValueError while loading the TOFU dataset.

Error executing job with overrides: ['split=full', 'batch_size=4', 'gradient_accumulation_steps=4', 'model_family=llama2-7b', 'lr=1e-5']
Traceback (most recent call last):
  File "/p/compressionleakage/llm_privacy/tofu/finetune.py", line 137, in <module>
    main()
  File "/p/compressionleakage/.conda/envs/tofu/lib/python3.10/site-packages/hydra/main.py", line 94, in decorated_main
    _run_hydra(
  File "/p/compressionleakage/.conda/envs/tofu/lib/python3.10/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "/p/compressionleakage/.conda/envs/tofu/lib/python3.10/site-packages/hydra/_internal/utils.py", line 457, in _run_app
    run_and_report(
  File "/p/compressionleakage/.conda/envs/tofu/lib/python3.10/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
    raise ex
  File "/p/compressionleakage/.conda/envs/tofu/lib/python3.10/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
    return func()
  File "/p/compressionleakage/.conda/envs/tofu/lib/python3.10/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
    lambda: hydra.run(
  File "/p/compressionleakage/.conda/envs/tofu/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/p/compressionleakage/.conda/envs/tofu/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/p/compressionleakage/.conda/envs/tofu/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "/p/compressionleakage/llm_privacy/tofu/finetune.py", line 62, in main
    torch_format_dataset = TextDatasetQA(cfg.data_path, tokenizer=tokenizer, model_family = cfg.model_family, max_length=max_length, split=cfg.split)
  File "/p/compressionleakage/llm_privacy/tofu/data_module.py", line 118, in __init__
    self.data = datasets.load_dataset(data_path, split)["train"]
  File "/u/deu9yh/.local/lib/python3.10/site-packages/datasets/load.py", line 1687, in load_dataset
    builder_instance.download_and_prepare(
  File "/u/deu9yh/.local/lib/python3.10/site-packages/datasets/builder.py", line 605, in download_and_prepare
    self._download_and_prepare(
  File "/u/deu9yh/.local/lib/python3.10/site-packages/datasets/builder.py", line 694, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/u/deu9yh/.local/lib/python3.10/site-packages/datasets/builder.py", line 1154, in _prepare_split
    writer.write_table(table)
  File "/u/deu9yh/.local/lib/python3.10/site-packages/datasets/arrow_writer.py", line 508, in write_table
    pa_table = table_cast(pa_table, self._schema)
  File "/u/deu9yh/.local/lib/python3.10/site-packages/datasets/table.py", line 1858, in table_cast
    return cast_table_to_schema(table, schema)
  File "/u/deu9yh/.local/lib/python3.10/site-packages/datasets/table.py", line 1840, in cast_table_to_schema
    raise ValueError(f"Couldn't cast\n{table.schema}\nto\n{features}\nbecause column names don't match")
ValueError: Couldn't cast
question: string
answer: string
paraphrased_answer: string
perturbed_answer: list<item: string>
  child 0, item: string
paraphrased_question: string
to
{'question': Value(dtype='string', id=None), 'answer': Value(dtype='string', id=None)}
because column names don't match
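
To isolate the failure, the call from data_module.py can be reproduced on its own (a minimal sketch; the dataset path "locuslab/TOFU" is my assumption for cfg.data_path, matching this repository):

import datasets

# Values as resolved per the traceback: cfg.split comes from the `split=full` override
data_path = "locuslab/TOFU"  # assumed value of cfg.data_path
split = "full"
data = datasets.load_dataset(data_path, split)["train"]  # raises the ValueError above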

This could be a bug in the code or a problem on the HuggingFace side. The issue needs to be investigated to ensure smooth execution of the codebase.

Thank you.

pratyushmaini commented 10 months ago

Hi Archit,

Thank you for your interest in our work, and apologies for the delay in getting back to you. It looks like you are passing "full" as the "split" argument. However, "full" is the name of the subset (configuration), not a split. Can you confirm that you are loading the dataset the way it is specified in the README?

from datasets import load_dataset

# "full" is the subset (configuration) name, not a split
dataset = load_dataset("locuslab/TOFU", "full")
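
If you also need a specific split after selecting the subset, you can pass it explicitly (a sketch; the TOFU subsets expose a "train" split, which is what data_module.py indexes in the traceback above):

from datasets import load_dataset

# "full" selects the subset; split="train" selects the split within it
dataset = load_dataset("locuslab/TOFU", "full", split="train")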