liguodongiot / llm-action

This project shares the technical principles and hands-on experience behind large language models.
https://www.zhihu.com/column/c_1456193767213043713
Apache License 2.0

Error in the data preprocessing part of peft_prompt_tuning #19

Closed XiangyueLyu closed 3 months ago

XiangyueLyu commented 3 months ago

Hello! In llm-action/llm-train/peft/clm/peft_prompt_tuning_clm.ipynb, the first line of In[4] calls tokenizer = AutoTokenizer.from_pretrained(model_name_or_path). Running it keeps failing with Exception: expected value at line 1 column 1, and I haven't been able to resolve it. Am I doing something wrong?

liguodongiot commented 3 months ago

Please provide more details; it is hard to troubleshoot with the information so far.

XiangyueLyu commented 3 months ago

Hello! Thank you very much for the reply. The error message is attached below. The configuration files I downloaded for bloomz are all complete, but the server cannot access the internet. Could the error be caused by the pretrained model failing to load locally instead of from the website? I also tried tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", cache_dir='./cache3', local_files_only=True), and it reports that it cannot connect to Hugging Face.


Exception                                 Traceback (most recent call last)
Cell In[112], line 1
----> 1 tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

File ~/transformers/src/transformers/models/auto/tokenization_auto.py:680, in AutoTokenizer.from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    676 if tokenizer_class is None:
    677     raise ValueError(
    678         f"Tokenizer class {tokenizer_class_candidate} does not exist or is not currently imported."
    679     )
--> 680 return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
    682 # Otherwise we have to be creative.
    683 # if model is an encoder decoder, the encoder tokenizer class is used by default
    684 if isinstance(config, EncoderDecoderConfig):

File ~/transformers/src/transformers/tokenization_utils_base.py:1804, in PreTrainedTokenizerBase.from_pretrained(cls, pretrained_model_name_or_path, *init_inputs, **kwargs)
   1801 else:
   1802     logger.info(f"loading file {file_path} from cache at {resolved_vocab_files[file_id]}")
-> 1804 return cls._from_pretrained(
   1805     resolved_vocab_files,
   1806     pretrained_model_name_or_path,
   1807     init_configuration,
   1808     *init_inputs,
   1809     use_auth_token=use_auth_token,
   1810     cache_dir=cache_dir,
...
    112 elif slow_tokenizer is not None:
    113     # We need to convert a slow tokenizer to build the backend
    114     fast_tokenizer = convert_slow_tokenizer(slow_tokenizer)

Exception: expected value at line 1 column 1
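
"expected value at line 1 column 1" appears to be a JSON parse error from the fast-tokenizer backend: a JSON file it tries to read (typically tokenizer.json or another file in the model directory) is empty, truncated, or not actually JSON, for example an HTML error page or a Git LFS pointer saved in place of the real file. A minimal check, assuming the bloomz files sit in a hypothetical local directory:

```python
import json
import os

# Hypothetical local model directory; adjust to wherever the bloomz files live.
model_dir = "/data/models/bloomz-560m"

# Verify that every JSON file in the directory actually parses; an empty,
# truncated, or non-JSON file (HTML error page, LFS pointer) will show up here.
for name in os.listdir(model_dir):
    if name.endswith(".json"):
        path = os.path.join(model_dir, name)
        try:
            with open(path, "r", encoding="utf-8") as f:
                json.load(f)
            print(f"OK      {name} ({os.path.getsize(path)} bytes)")
        except Exception as e:
            print(f"BROKEN  {name}: {e}")
```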

XiangyueLyu commented 3 months ago
[Screenshot: 2024-05-31 14:16:26]

Hello! Is the problem that the official files do not include the pretrained model? My server cannot access the internet, and I am not sure which model this pretrained model corresponds to. My email is lyuxiangyue@qq.com; if it is convenient, could you send me the model? Thank you!
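
For an offline server, the usual workaround is to download the checkpoint on a machine with network access, copy the whole directory over, and point from_pretrained at that path. A minimal sketch, assuming a hypothetical local directory (the path below is a placeholder, not taken from the thread):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Placeholder path to a directory copied onto the offline server that contains
# config.json, tokenizer_config.json, tokenizer.json (or the slow-tokenizer
# files) and the model weights.
model_name_or_path = "/data/models/bloomz-560m"

# local_files_only=True makes transformers read strictly from disk and never
# attempt to reach huggingface.co.
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, local_files_only=True)
```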

XiangyueLyu commented 3 months ago

Thank you, it's solved! There was something wrong with the official model files; I downloaded a copy from ModelScope and everything ran through. Thanks a lot!
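
For anyone else hitting this without internet access, a minimal sketch of the ModelScope route; the model id below is an assumption (the thread does not say which mirror was used), so substitute whichever repository matches your model_name_or_path:

```python
from modelscope import snapshot_download
from transformers import AutoTokenizer, AutoModelForCausalLM

# Run the download on a machine with network access; the returned directory can
# then be copied to the offline server. "AI-ModelScope/bloomz-560m" is an
# assumed mirror id, not taken from the thread.
local_dir = snapshot_download("AI-ModelScope/bloomz-560m")

# Load purely from the downloaded files, no Hub access required.
tokenizer = AutoTokenizer.from_pretrained(local_dir, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(local_dir, local_files_only=True)
```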