  File "Dev/wanda/main.py", line 110, in <module>
    main()
  File "Dev/wanda/main.py", line 69, in main
    prune_wanda(args, model, tokenizer, device, prune_n=prune_n, prune_m=prune_m)
  File "Dev/wanda/lib/prune.py", line 132, in prune_wanda
    dataloader, _ = get_loaders("c4",nsamples=args.nsamples,seed=args.seed,seqlen=model.seqlen,tokenizer=tokenizer)
  File "Dev/wanda/lib/data.py", line 73, in get_loaders
    return get_c4(nsamples, seed, seqlen, tokenizer)
  File "Dev/wanda/lib/data.py", line 43, in get_c4
    traindata = load_dataset('allenai/c4', 'allenai--c4', data_files={'train': 'en/c4-train.00000-of-01024.json.gz'}, split='train')
  File ".conda/envs/prune_llm/lib/python3.9/site-packages/datasets/load.py", line 1791, in load_dataset
    builder_instance.download_and_prepare(
  File ".conda/envs/prune_llm/lib/python3.9/site-packages/datasets/builder.py", line 891, in download_and_prepare
    self._download_and_prepare(
  File ".conda/envs/prune_llm/lib/python3.9/site-packages/datasets/builder.py", line 1004, in _download_and_prepare
    verify_splits(self.info.splits, split_dict)
  File ".conda/envs/prune_llm/lib/python3.9/site-packages/datasets/utils/info_utils.py", line 91, in verify_splits
    raise ExpectedMoreSplits(str(set(expected_splits) - set(recorded_splits)))
datasets.utils.info_utils.ExpectedMoreSplits: {'validation'}
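As far as I can tell, the error comes from the split verification step: only a 'train' file is passed, but the dataset info also expects a 'validation' split. A workaround that may be worth trying is to skip that check. The sketch below is untested against this setup and makes assumptions: the helper names `c4_load_kwargs` and `load_c4_train` are hypothetical (not part of wanda), and it assumes a `datasets` release that accepts `verification_mode` (older releases used `ignore_verifications=True` instead).

```python
# Shard path copied from the failing call in lib/data.py.
C4_TRAIN_SHARD = "en/c4-train.00000-of-01024.json.gz"


def c4_load_kwargs(skip_verification=True):
    """Build keyword arguments for datasets.load_dataset (hypothetical helper)."""
    kwargs = {
        "path": "allenai/c4",
        "name": "allenai--c4",  # same config name as in the traceback
        "data_files": {"train": C4_TRAIN_SHARD},
        "split": "train",
    }
    if skip_verification:
        # Assumption: this datasets version supports verification_mode.
        # Older releases used ignore_verifications=True instead.
        kwargs["verification_mode"] = "no_checks"
    return kwargs


def load_c4_train():
    """Load the single C4 train shard without the split check (needs network)."""
    from datasets import load_dataset  # imported lazily so the helper above works offline

    return load_dataset(**c4_load_kwargs())
```

If this works, the same change would presumably be needed for the validation shard that `get_c4` loads next.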
In case it matters, the Llama-2-7b-chat-hf folder:
Llama-2-7b-chat-hf/
total 39509433
-rw-r--r-- 1 usr grp 21 Jun 17 14:06 added_tokens.json
-rw-r--r-- 1 usr grp 583 Jun 17 14:06 config.json
-rw-r--r-- 1 usr grp 200 Jun 17 14:06 generation_config.json
-rw-r--r-- 1 usr grp 7020 Jun 17 14:06 LICENSE.txt
-rw-r--r-- 1 usr grp 9976576152 Jun 17 14:15 model-00001-of-00002.safetensors
-rw-r--r-- 1 usr grp 3500296424 Jun 17 14:09 model-00002-of-00002.safetensors
-rw-r--r-- 1 usr grp 26788 Jun 17 14:06 model.safetensors.index.json
-rw-r--r-- 1 usr grp 9877989586 Jun 17 14:13 pytorch_model-00001-of-00003.bin
-rw-r--r-- 1 usr grp 9894801014 Jun 17 14:14 pytorch_model-00002-of-00003.bin
-rw-r--r-- 1 usr grp 7180990649 Jun 17 14:12 pytorch_model-00003-of-00003.bin
-rw-r--r-- 1 usr grp 26788 Jun 17 14:06 pytorch_model.bin.index.json
-rw-r--r-- 1 usr grp 10148 Jun 17 14:06 README.md
-rw-r--r-- 1 usr grp 435 Jun 17 14:06 special_tokens_map.json
-rw-r--r-- 1 usr grp 746 Jun 17 14:06 tokenizer_config.json
-rw-r--r-- 1 usr grp 1842764 Jun 17 14:06 tokenizer.json
-rw-r--r-- 1 usr grp 499723 Jun 17 14:06 tokenizer.model
-rw-r--r-- 1 usr grp 4766 Jun 17 14:06 USE_POLICY.md