liguodongiot / llm-action

This project shares the technical principles and hands-on experience behind large language models.
https://www.zhihu.com/column/c_1456193767213043713
Apache License 2.0

Error in the data preprocessing part of peft_prompt_tuning #19

Closed XiangyueLyu closed 3 months ago

XiangyueLyu commented 3 months ago

Hello! In llm-action/llm-train/peft/clm/peft_prompt_tuning_clm.ipynb, the first line of In[4] calls tokenizer = AutoTokenizer.from_pretrained(model_name_or_path). Running it keeps failing with Exception: expected value at line 1 column 1, and I haven't been able to resolve it. Am I doing something wrong?

liguodongiot commented 3 months ago

Please provide more details; it is hard to troubleshoot with the information so far.

XiangyueLyu commented 3 months ago

Hello! Thank you very much for the reply. The error message is attached below. The configuration files I downloaded for bloomz are all complete, but the server cannot access the internet. Could the error be caused by the pretrained model failing to load locally instead of from the website? I also tried tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", cache_dir='./cache3', local_files_only=True), and it reports that it cannot connect to Hugging Face.


Exception                                 Traceback (most recent call last)
Cell In[112], line 1
----> 1 tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

File ~/transformers/src/transformers/models/auto/tokenization_auto.py:680, in AutoTokenizer.from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    676 if tokenizer_class is None:
    677     raise ValueError(
    678         f"Tokenizer class {tokenizer_class_candidate} does not exist or is not currently imported."
    679     )
--> 680 return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
    682 # Otherwise we have to be creative.
    683 # if model is an encoder decoder, the encoder tokenizer class is used by default
    684 if isinstance(config, EncoderDecoderConfig):

File ~/transformers/src/transformers/tokenization_utils_base.py:1804, in PreTrainedTokenizerBase.from_pretrained(cls, pretrained_model_name_or_path, *init_inputs, **kwargs)
   1801 else:
   1802     logger.info(f"loading file {file_path} from cache at {resolved_vocab_files[file_id]}")
-> 1804 return cls._from_pretrained(
   1805     resolved_vocab_files,
   1806     pretrained_model_name_or_path,
   1807     init_configuration,
   1808     *init_inputs,
   1809     use_auth_token=use_auth_token,
   1810     cache_dir=cache_dir,
...
    112 elif slow_tokenizer is not None:
    113     # We need to convert a slow tokenizer to build the backend
    114     fast_tokenizer = convert_slow_tokenizer(slow_tokenizer)

Exception: expected value at line 1 column 1
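
"expected value at line 1 column 1" appears to be a JSON parse error from the fast-tokenizer backend: a JSON file it tries to read (typically tokenizer.json or another file in the model directory) is empty, truncated, or not actually JSON, for example an HTML error page or a Git LFS pointer saved in place of the real file. A minimal check, assuming the bloomz files sit in a hypothetical local directory:

```python
import json
import os

# Hypothetical local model directory; adjust to wherever the bloomz files live.
model_dir = "/data/models/bloomz-560m"

# Verify that every JSON file in the directory actually parses; an empty,
# truncated, or non-JSON file (HTML error page, LFS pointer) will show up here.
for name in os.listdir(model_dir):
    if name.endswith(".json"):
        path = os.path.join(model_dir, name)
        try:
            with open(path, "r", encoding="utf-8") as f:
                json.load(f)
            print(f"OK      {name} ({os.path.getsize(path)} bytes)")
        except Exception as e:
            print(f"BROKEN  {name}: {e}")
```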

XiangyueLyu commented 3 months ago
[Screenshot: 2024-05-31 14:16:26]

Hello! Is the problem that the official files do not include the pretrained model? My server cannot access the internet, and I am not sure which model this pretrained model corresponds to. My email is lyuxiangyue@qq.com; if it is convenient, could you send me the model? Thank you!
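
For an offline server, the usual workaround is to download the checkpoint on a machine with network access, copy the whole directory over, and point from_pretrained at that path. A minimal sketch, assuming a hypothetical local directory (the path below is a placeholder, not taken from the thread):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Placeholder path to a directory copied onto the offline server that contains
# config.json, tokenizer_config.json, tokenizer.json (or the slow-tokenizer
# files) and the model weights.
model_name_or_path = "/data/models/bloomz-560m"

# local_files_only=True makes transformers read strictly from disk and never
# attempt to reach huggingface.co.
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, local_files_only=True)
```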

XiangyueLyu commented 3 months ago

Thank you, it's solved! There was something wrong with the official model files; I downloaded a copy from ModelScope and everything ran through. Thanks a lot!
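
For anyone else hitting this without internet access, a minimal sketch of the ModelScope route; the model id below is an assumption (the thread does not say which mirror was used), so substitute whichever repository matches your model_name_or_path:

```python
from modelscope import snapshot_download
from transformers import AutoTokenizer, AutoModelForCausalLM

# Run the download on a machine with network access; the returned directory can
# then be copied to the offline server. "AI-ModelScope/bloomz-560m" is an
# assumed mirror id, not taken from the thread.
local_dir = snapshot_download("AI-ModelScope/bloomz-560m")

# Load purely from the downloaded files, no Hub access required.
tokenizer = AutoTokenizer.from_pretrained(local_dir, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(local_dir, local_files_only=True)
```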