yanqiangmiffy / InstructGLM

ChatGLM-6B instruction learning | instruction data | Instruct
MIT License
654 stars 51 forks

datasets.builder.InvalidConfigName: Bad characters from black list '<>:/\|?*' found in 'data/belle_data.json'. They could create issues when creating a directory for this config on Windows filesystem. #23

Open deepeye opened 1 year ago

deepeye commented 1 year ago
python cover_belle2jsonl.py \
    --data_path data/Belle_open_source_1M.json \
    --save_path data/belle_data.jsonl

Running the command above produces the following error:


Resolving data files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 87018.76it/s]
Resolving data files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2657/2657 [00:00<00:00, 11775.86it/s]
Resolving data files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:00<00:00, 36222.08it/s]
Traceback (most recent call last):
  File "/data/chat/InstructGLM/cover_belle2jsonl.py", line 42, in <module>
    main()
  File "/data/chat/InstructGLM/cover_belle2jsonl.py", line 25, in main
    dataset = load_dataset("json", "data/belle_data.json")
  File "/data/chat/InstructGLM/venv/lib/python3.10/site-packages/datasets/load.py", line 1759, in load_dataset
    builder_instance = load_dataset_builder(
  File "/data/chat/InstructGLM/venv/lib/python3.10/site-packages/datasets/load.py", line 1522, in load_dataset_builder
    builder_instance: DatasetBuilder = builder_cls(
  File "/data/chat/InstructGLM/venv/lib/python3.10/site-packages/datasets/builder.py", line 319, in __init__
    self.config, self.config_id = self._create_builder_config(
  File "/data/chat/InstructGLM/venv/lib/python3.10/site-packages/datasets/builder.py", line 472, in _create_builder_config
    builder_config = self.BUILDER_CONFIG_CLASS(**config_kwargs)
  File "<string>", line 14, in __init__
  File "/data/chat/InstructGLM/venv/lib/python3.10/site-packages/datasets/builder.py", line 125, in __post_init__
    raise InvalidConfigName(
datasets.builder.InvalidConfigName: Bad characters from black list '<>:/\|?*' found in 'data/belle_data.json'. They could create issues when creating a directory for this config on Windows filesystem.

deepeye commented 1 year ago

Solved. The fix is as follows: dataset = load_dataset("json", data_files=args.data_path)
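
For anyone hitting the same error, here is a minimal sketch of why the fix works. The second positional argument of `load_dataset` is the config name, not a file path, so `"data/belle_data.json"` was validated as a config name and rejected because it contains `/`. Passing the path by keyword as `data_files` avoids that check. The helper names `is_valid_config_name` and `load_json_dataset` below are hypothetical and only for illustration; the character check is a simplified re-implementation based on the error message, not the library's actual code.

```python
# Characters that the error message lists as invalid in a config name.
BAD_CHARS = '<>:/\\|?*'


def is_valid_config_name(name: str) -> bool:
    """Hypothetical helper: simplified version of the check described in the error."""
    return not any(ch in name for ch in BAD_CHARS)


def load_json_dataset(data_path: str):
    """Hypothetical helper: load a local JSON/JSONL file via data_files."""
    # Imported lazily so the check above can be tried without the package;
    # this call needs `pip install datasets`.
    from datasets import load_dataset

    # Pass the path by keyword. As the second positional argument it would be
    # interpreted as the config name and trigger InvalidConfigName.
    return load_dataset("json", data_files=data_path)


print(is_valid_config_name("data/belle_data.json"))  # False: contains '/'
print(is_valid_config_name("default"))               # True
```

In the script from this issue, the equivalent call would be `load_json_dataset(args.data_path)`, assuming `args.data_path` holds the value given to `--data_path`.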