qwopqwop200 / GPTQ-for-LLaMa

4 bits quantization of LLaMA using GPTQ
Apache License 2.0

datasets.utils.info_utils.ExpectedMoreSplits: {'validation'} #286

Open SDcodehub opened 8 months ago

SDcodehub commented 8 months ago
$ python llama.py /datadrive/models/Llama-2-13b-chat-hf c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors /datadrive/models/Llama-2-13b-chat-hf-gptq/llama-2-13b-4bit-gs128.safetensors

Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00, 1.41it/s]
/home/FRACTAL/sagar.desai/miniconda3/envs/gptq/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:389: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
/home/FRACTAL/sagar.desai/miniconda3/envs/gptq/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:394: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
Downloading and preparing dataset None/en to file:///home/FRACTAL/sagar.desai/.cache/huggingface/datasets/allenai___json/en-ec45c889631c3c39/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4...
Downloading data files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 6413.31it/s]
Extracting data files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1855.89it/s]
Traceback (most recent call last):
File "/home/FRACTAL/sagar.desai/GPTQ-for-LLaMa/llama.py", line 488, in <module>
dataloader, testloader = get_loaders(args.dataset, nsamples=args.nsamples, seed=args.seed, model=args.model, seqlen=model.seqlen)
File "/home/FRACTAL/sagar.desai/GPTQ-for-LLaMa/utils/datautils.py", line 189, in get_loaders
return get_c4(nsamples, seed, seqlen, model)
File "/home/FRACTAL/sagar.desai/GPTQ-for-LLaMa/utils/datautils.py", line 64, in get_c4
traindata = load_dataset('allenai/c4', 'allenai--c4', data_files={'train': 'en/c4-train.00000-of-01024.json.gz'}, split='train', use_auth_token=False)
File "/home/FRACTAL/sagar.desai/miniconda3/envs/gptq/lib/python3.9/site-packages/datasets/load.py", line 1797, in load_dataset
builder_instance.download_and_prepare(
File "/home/FRACTAL/sagar.desai/miniconda3/envs/gptq/lib/python3.9/site-packages/datasets/builder.py", line 890, in download_and_prepare
self._download_and_prepare(
File "/home/FRACTAL/sagar.desai/miniconda3/envs/gptq/lib/python3.9/site-packages/datasets/builder.py", line 1003, in _download_and_prepare
verify_splits(self.info.splits, split_dict)
File "/home/FRACTAL/sagar.desai/miniconda3/envs/gptq/lib/python3.9/site-packages/datasets/utils/info_utils.py", line 91, in verify_splits
raise ExpectedMoreSplits(str(set(expected_splits) - set(recorded_splits)))
datasets.utils.info_utils.ExpectedMoreSplits: {'validation'}

Running on an A100. I tried several versions of datasets, from 2.10 to 2.12, and get the same error each time.

iibw commented 8 months ago

This error seems to have happened because c4 was updated with some datasets configuration options which aren't supported in older versions of datasets.

To fix it, upgrade datasets with `pip install -U datasets`, then remove the `, 'allenai--c4'` argument from all four c4 `load_dataset` lines in GPTQ-for-LLaMa/utils/datautils.py.
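If editing the four lines by hand is error-prone, the removal can be scripted. The following is only a sketch: `strip_c4_config_name` is a hypothetical helper, not part of GPTQ-for-LLaMa or datasets, and it assumes each affected `load_dataset` call sits on a single line, as the one in the traceback above does.

```python
import re

# Matches the obsolete config-name argument together with its leading comma,
# e.g. the ", 'allenai--c4'" in load_dataset('allenai/c4', 'allenai--c4', ...).
_OBSOLETE_CONFIG = re.compile(r""",\s*['"]allenai--c4['"]""")


def strip_c4_config_name(source: str) -> str:
    """Remove the 'allenai--c4' argument from every line that calls load_dataset.

    Hypothetical helper for illustration; all other lines are left untouched.
    """
    patched = []
    for line in source.splitlines(keepends=True):
        if "load_dataset(" in line:
            line = _OBSOLETE_CONFIG.sub("", line)
        patched.append(line)
    return "".join(patched)


# The call from the traceback, before and after patching.
before = (
    "traindata = load_dataset('allenai/c4', 'allenai--c4', "
    "data_files={'train': 'en/c4-train.00000-of-01024.json.gz'}, "
    "split='train', use_auth_token=False)\n"
)
after = strip_c4_config_name(before)
print(after)
```

To patch the real file, read utils/datautils.py into a string, pass it through the helper, and write the result back; keeping a backup copy first is a sensible precaution. The sketch leaves `use_auth_token` and every other argument as it was.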

Some additional info here https://huggingface.co/datasets/allenai/c4/discussions/7