Closed Tonanguyxiro closed 2 days ago
Hi! Hugging Face datasets changed some dataset paths, which breaks the code. You can try using "nyu-mll/glue" instead of "glue" as the benchmark name. Remember to also update finetune.py#18 for the argument verification.
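As a rough illustration of the suggestion above, the rename could be handled in one place so that both the old and the new benchmark name pass argument verification. This is only a sketch: `LEGACY_TO_HUB`, `resolve_benchmark`, `build_parser`, and `load_benchmark` are hypothetical names that do not come from finetune.py, and the actual check around finetune.py#18 may look different.

```python
import argparse

# Hypothetical mapping from legacy short names to namespaced Hub repo IDs.
# Only the "glue" entry comes from this thread.
LEGACY_TO_HUB = {"glue": "nyu-mll/glue"}


def resolve_benchmark(name: str) -> str:
    """Return the namespaced repo ID for a legacy name, else the name unchanged."""
    return LEGACY_TO_HUB.get(name, name)


def build_parser() -> argparse.ArgumentParser:
    """Argument verification that accepts both the old and the new name."""
    parser = argparse.ArgumentParser()
    allowed = sorted(set(LEGACY_TO_HUB) | set(LEGACY_TO_HUB.values()))
    parser.add_argument("--benchmark", default="nyu-mll/glue", choices=allowed)
    parser.add_argument("--task", default="sst2")
    return parser


def load_benchmark(benchmark: str, task: str, cache_dir: str):
    """Load the dataset under its resolved name (needs `datasets` and network access)."""
    from datasets import load_dataset  # imported lazily so the helpers above work without it

    return load_dataset(resolve_benchmark(benchmark), task, cache_dir=cache_dir)
```

With this in place, `--benchmark glue` and `--benchmark nyu-mll/glue` both resolve to "nyu-mll/glue" before `load_dataset` is called.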
Hi! I finally figured out the reason: some incorrect parameters may have been left in the previous cache, so the error disappeared after cleaning the tmp folder.
Thanks for your reply and for making this code available for us to learn from.
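For anyone hitting the same error, the cleanup can be scripted. The following is a minimal sketch, assuming the stale files live in the directory passed as `cache_dir` to `load_dataset`; `clear_dataset_cache` is a hypothetical helper and not part of the repository.

```python
import shutil
from pathlib import Path


def clear_dataset_cache(cache_dir: str) -> bool:
    """Delete everything under cache_dir and recreate it empty.

    Returns True if a directory was cleared, False if there was nothing to clear.
    """
    path = Path(cache_dir)
    if not path.is_dir():
        return False
    shutil.rmtree(path)       # removes stale dataset files and metadata
    path.mkdir(parents=True)  # leave an empty directory for the next run
    return True
```

Note that the log in the original report also mentions /home/xxx/.cache/huggingface/datasets, so stale entries may sit there as well. Passing `download_mode="force_redownload"` to `load_dataset` is another option that may avoid reusing a bad cache entry.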
We tried to run the script finetune.py with the commands
export HF_ENDPOINT=https://hf-mirror.com
(for the proxy) and
python finetune.py --task sst2 --model switch-base-8 --benchmark glue --batch_size 64
and found that the dataset cannot be loaded at the line
dataset = load_dataset(args.benchmark, args.task, cache_dir=f"{config.BASEDIR}/tmp/")
The error is shown below. I am interested in how you prepare the data before running: do you pre-download the dataset first?
Save model to /home/xxx/project-MoE/test-SiDA-MoE/data/sst2/switch-base-8/finetuned
glue sst2 /home/xxx/project-MoE/test-SiDA-MoE/tmp/
Benchmark: glue (type: <class 'str'>)
Task: sst2 (type: <class 'str'>)
Cache directory: /home/xxx/project-MoE/test-SiDA-MoE/tmp/, typr: <class 'str'>
Downloading and preparing dataset None/ax to /home/xxx/.cache/huggingface/datasets/parquet/ax-738ea43827ac551a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec...
parquet
Downloading data files: 100%|██████████| 3/3 [00:00<00:00, 1322.85it/s]
Extracting data files: 100%|██████████| 3/3 [00:00<00:00, 284.79it/s]
Traceback (most recent call last):
File "/home/xxx/project-MoE/test-SiDA-MoE/src/finetune.py", line 193, in <module>
dataset = load_dataset(args.benchmark
File "/home/xxx/anaconda3/envs/sida-moe/lib/python3.10/site-packages/datasets/load.py", line 1797, in load_dataset
builder_instance.download_and_prepare(
File "/home/xxx/anaconda3/envs/sida-moe/lib/python3.10/site-packages/datasets/builder.py", line 890, in download_and_prepare
self._download_and_prepare(
File "/home/xxx/anaconda3/envs/sida-moe/lib/python3.10/site-packages/datasets/builder.py", line 986, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "/home/xxx/anaconda3/envs/sida-moe/lib/python3.10/site-packages/datasets/builder.py", line 1707, in _prepare_split
split_info = self.info.splits[split_generator.name]
File "/home/xxx/anaconda3/envs/sida-moe/lib/python3.10/site-packages/datasets/splits.py", line 530, in __getitem__
instructions = make_file_instructions(
File "/home/xxx/anaconda3/envs/sida-moe/lib/python3.10/site-packages/datasets/arrow_reader.py", line 112, in make_file_instructions
name2filenames = {
File "/home/xxx/anaconda3/envs/sida-moe/lib/python3.10/site-packages/datasets/arrow_reader.py", line 113, in <dictcomp>
info.name: filenames_for_dataset_split(
File "/home/xxx/anaconda3/envs/sida-moe/lib/python3.10/site-packages/datasets/naming.py", line 74, in filenames_for_dataset_split
prefix = filename_prefix_for_split(dataset_name, split)
File "/home/xxx/anaconda3/envs/sida-moe/lib/python3.10/site-packages/datasets/naming.py", line 55, in filename_prefix_for_split
if os.path.basename(name) != name:
File "/home/xxx/anaconda3/envs/sida-moe/lib/python3.10/posixpath.py", line 143, in basename
p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not NoneType