mymusise / ChatGLM-Tuning

A fine-tuning solution based on ChatGLM-6B + LoRA
MIT License
3.71k stars 443 forks

Error when running tokenize_dataset_rows.py #239

Closed LuJH12 closed 1 year ago

LuJH12 commented 1 year ago

The script fails at runtime. Judging by the output, it seems unable to download anything from Hugging Face.

`Downloading and preparing dataset generator/default to C:/Users/Administrator/.cache/huggingface/datasets/generator/default-3f09b66b67364cbd/0.0.0...
Generating train split: 0 examples [00:00, ? examples/s]
Traceback (most recent call last):
  File "D:\Software\anaconda3\envs\ljh\lib\site-packages\transformers\utils\hub.py", line 417, in cached_file
    resolved_file = hf_hub_download(
  File "D:\Software\anaconda3\envs\ljh\lib\site-packages\huggingface_hub\utils\_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "D:\Software\anaconda3\envs\ljh\lib\site-packages\huggingface_hub\file_download.py", line 1291, in hf_hub_download
    raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: Connection error, and we cannot find the requested files in the disk cache. Please try again or make sure your Internet connection is on.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Software\anaconda3\envs\ljh\lib\site-packages\datasets\builder.py", line 1608, in _prepare_split_single
    for key, record in generator:
  File "D:\Software\anaconda3\envs\ljh\lib\site-packages\datasets\packaged_modules\generator\generator.py", line 30, in _generate_examples
    for idx, ex in enumerate(self.config.generator(gen_kwargs)):
  File "D:\LJH\Fine-Tuning\tokenize_dataset_rows.py", line 25, in read_jsonl
    tokenizer = transformers.AutoTokenizer.from_pretrained(
  File "D:\Software\anaconda3\envs\ljh\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 658, in from_pretrained
    config = AutoConfig.from_pretrained(
  File "D:\Software\anaconda3\envs\ljh\lib\site-packages\transformers\models\auto\configuration_auto.py", line 944, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "D:\Software\anaconda3\envs\ljh\lib\site-packages\transformers\configuration_utils.py", line 574, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "D:\Software\anaconda3\envs\ljh\lib\site-packages\transformers\configuration_utils.py", line 629, in _get_config_dict
    resolved_config_file = cached_file(
  File "D:\Software\anaconda3\envs\ljh\lib\site-packages\transformers\utils\hub.py", line 452, in cached_file
    raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like model is not the path to a directory containing a file named config.json. Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\Software\anaconda3\envs\ljh\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\Software\anaconda3\envs\ljh\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "c:\Users\Administrator\.vscode\extensions\ms-python.python-2023.10.1\pythonFiles\lib\python\debugpy\__main__.py", line 39, in <module>
    cli.main()
  File "c:\Users\Administrator\.vscode\extensions\ms-python.python-2023.10.1\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 430, in main
    run()
  File "c:\Users\Administrator\.vscode\extensions\ms-python.python-2023.10.1\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "c:\Users\Administrator\.vscode\extensions\ms-python.python-2023.10.1\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "c:\Users\Administrator\.vscode\extensions\ms-python.python-2023.10.1\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "c:\Users\Administrator\.vscode\extensions\ms-python.python-2023.10.1\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "D:\LJH\Fine-Tuning\tokenize_dataset_rows.py", line 54, in ead
    self.builder.download_and_prepare(
  File "D:\Software\anaconda3\envs\ljh\lib\site-packages\datasets\builder.py", line 872, in download_and_prepare
    self._download_and_prepare(
  File "D:\Software\anaconda3\envs\ljh\lib\site-packages\datasets\builder.py", line 1649, in _download_and_prepare
    super()._download_and_prepare(
  File "D:\Software\anaconda3\envs\ljh\lib\site-packages\datasets\builder.py", line 967, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "D:\Software\anaconda3\envs\ljh\lib\site-packages\datasets\builder.py", line 1488, in _prepare_split
    for job_id, done, content in self._prepare_split_single(
  File "D:\Software\anaconda3\envs\ljh\lib\site-packages\datasets\builder.py", line 1644, in _prepare_split_single
    raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset`

LuJH12 commented 1 year ago

Solved. The model_name parameter in tokenize_dataset_rows.py had not been changed; it needs to be set to the folder name of the corresponding model.
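For anyone hitting the same OSError, a quick way to confirm the fix is to check that model_name really points to a local folder containing config.json before the tokenizer is loaded. This is only a rough sketch: the helper name `resolve_local_model_dir` and the example folder path are made up for illustration and are not part of the repo's script.

```python
import os


def resolve_local_model_dir(model_name: str) -> str:
    """Return the absolute path of a local model folder, or raise if it is unusable.

    Hypothetical helper: it only checks the condition named in the OSError,
    i.e. that the path is a directory containing a file named config.json.
    """
    path = os.path.abspath(os.path.expanduser(model_name))
    if not os.path.isdir(path):
        raise FileNotFoundError(f"model_name is not a local directory: {path}")
    if not os.path.isfile(os.path.join(path, "config.json")):
        raise FileNotFoundError(f"no config.json found in: {path}")
    return path


if __name__ == "__main__":
    # Assumed example location; replace with wherever the model was downloaded.
    candidate = "./chatglm-6b"
    try:
        print("Using local model folder:", resolve_local_model_dir(candidate))
    except FileNotFoundError as err:
        print("model_name needs fixing:", err)
```

If the check passes, the returned path can be passed to `transformers.AutoTokenizer.from_pretrained(...)` in place of the original model_name value, so nothing has to be fetched from huggingface.co.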

wuxiaobo commented 1 year ago

You could also try setting the HF_DATASETS_OFFLINE environment variable before importing datasets.

Or set it at runtime with datasets.config.HF_DATASETS_OFFLINE = True
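A minimal sketch of the environment-variable approach, in case it helps. The function name `enable_hf_offline_mode` is made up for illustration; TRANSFORMERS_OFFLINE is an extra assumption on my part, taken from the offline-mode page the error message links to. Note that offline mode only stops the libraries from trying the network, so the model files still have to exist locally.

```python
import os


def enable_hf_offline_mode() -> None:
    """Set the offline flags; call this before importing datasets/transformers."""
    # Read by the datasets library at import time.
    os.environ["HF_DATASETS_OFFLINE"] = "1"
    # Assumption: also stop transformers from contacting huggingface.co.
    os.environ["TRANSFORMERS_OFFLINE"] = "1"


enable_hf_offline_mode()

# The imports must come after the flags are set, for example:
# import datasets
# import transformers
```

The same effect can be had by setting the variables in the shell before launching the script, which avoids depending on import order inside the code.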