wenge-research / YAYI2

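YAYI 2 is a new-generation open-source large language model developed by Wenge Research (中科闻歌), pretrained on more than 2 trillion tokens of high-quality, multilingual data. (Repo for YaYi 2 Chinese LLMs)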
Apache License 2.0
3.61k stars 17 forks

'YayiTokenizer' object has no attribute 'sp_model' #9

Open wudi-7mi opened 3 months ago

wudi-7mi commented 3 months ago

I hit the following error when running the sample program:

Traceback (most recent call last):
  File "/data/model/yayi2-30b/try.py", line 2, in <module>
    tokenizer = AutoTokenizer.from_pretrained("/data/model/yayi2-30b", trust_remote_code=True)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/user23202791/.conda/envs/fastchat/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 847, in from_pretrained
    return tokenizer_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/user23202791/.conda/envs/fastchat/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2089, in from_pretrained
    return cls._from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/data/user23202791/.conda/envs/fastchat/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2311, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/user23202791/.cache/huggingface/modules/transformers_modules/yayi2-30b/tokenization_yayi.py", line 74, in __init__
    super().__init__(
  File "/data/user23202791/.conda/envs/fastchat/lib/python3.11/site-packages/transformers/tokenization_utils.py", line 367, in __init__
    self._add_tokens(
  File "/data/user23202791/.conda/envs/fastchat/lib/python3.11/site-packages/transformers/tokenization_utils.py", line 467, in _add_tokens
    current_vocab = self.get_vocab().copy()
                    ^^^^^^^^^^^^^^^^
  File "/data/user23202791/.cache/huggingface/modules/transformers_modules/yayi2-30b/tokenization_yayi.py", line 111, in get_vocab
    vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
                                                             ^^^^^^^^^^^^^^^
  File "/data/user23202791/.cache/huggingface/modules/transformers_modules/yayi2-30b/tokenization_yayi.py", line 107, in vocab_size
    return self.sp_model.get_piece_size()
           ^^^^^^^^^^^^^
AttributeError: 'YayiTokenizer' object has no attribute 'sp_model'

OS: Ubuntu 22.04. Installed package versions:

Package                           Version
--------------------------------- ------------
accelerate                        0.30.1
addict                            2.4.0
aiofiles                          23.2.1
aiohttp                           3.9.5
aiosignal                         1.3.1
aliyun-python-sdk-core            2.15.1
aliyun-python-sdk-kms             2.16.3
altair                            5.3.0
annotated-types                   0.6.0
anyio                             4.3.0
asttokens                         2.4.1
attrs                             23.2.0
certifi                           2024.2.2
cffi                              1.16.0
charset-normalizer                3.3.2
click                             8.1.7
cloudpickle                       3.0.0
cmake                             3.29.3
comm                              0.2.2
contourpy                         1.2.1
crcmod                            1.7
cryptography                      42.0.8
cycler                            0.12.1
datasets                          2.18.0
debugpy                           1.6.7
decorator                         5.1.1
dill                              0.3.8
diskcache                         5.6.3
distro                            1.9.0
dnspython                         2.6.1
einops                            0.8.0
email_validator                   2.1.1
exceptiongroup                    1.2.0
executing                         2.0.1
fastapi                           0.111.0
fastapi-cli                       0.0.3
ffmpy                             0.3.2
filelock                          3.14.0
flash-attn                        2.5.8
fonttools                         4.51.0
frozenlist                        1.4.1
fschat                            0.2.36
fsspec                            2024.2.0
gast                              0.5.4
gradio                            4.31.3
gradio_client                     0.16.3
h11                               0.14.0
httpcore                          1.0.5
httptools                         0.6.1
httpx                             0.27.0
huggingface-hub                   0.23.0
idna                              3.7
importlib_metadata                7.1.0
importlib_resources               6.4.0
interegular                       0.3.3
ipykernel                         6.29.4
ipython                           8.25.0
jedi                              0.19.1
Jinja2                            3.1.4
jmespath                          0.10.0
joblib                            1.4.2
jsonschema                        4.22.0
jsonschema-specifications         2023.12.1
jupyter_client                    8.6.2
jupyter_core                      5.7.2
kiwisolver                        1.4.5
lark                              1.1.9
llvmlite                          0.42.0
lm-format-enforcer                0.9.8
markdown-it-py                    3.0.0
markdown2                         2.4.13
MarkupSafe                        2.1.5
matplotlib                        3.9.0
matplotlib-inline                 0.1.7
mdurl                             0.1.2
modelscope                        1.15.0
mpmath                            1.3.0
msgpack                           1.0.8
multidict                         6.0.5
multiprocess                      0.70.16
nest_asyncio                      1.6.0
networkx                          3.3
nh3                               0.2.17
ninja                             1.11.1.1
numba                             0.59.1
numpy                             1.26.4
nvidia-cublas-cu12                12.1.3.1
nvidia-cuda-cupti-cu12            12.1.105
nvidia-cuda-nvrtc-cu12            12.1.105
nvidia-cuda-runtime-cu12          12.1.105
nvidia-cudnn-cu12                 8.9.2.26
nvidia-cufft-cu12                 11.0.2.54
nvidia-curand-cu12                10.3.2.106
nvidia-cusolver-cu12              11.4.5.107
nvidia-cusparse-cu12              12.1.0.106
nvidia-ml-py                      12.550.52
nvidia-nccl-cu12                  2.20.5
nvidia-nvjitlink-cu12             12.4.127
nvidia-nvtx-cu12                  12.1.105
openai                            1.30.1
OpenCC                            1.1.6
orjson                            3.10.3
oss2                              2.18.5
outlines                          0.0.34
packaging                         24.1
pandas                            2.2.2
parso                             0.8.4
peft                              0.10.0
pexpect                           4.9.0
pickleshare                       0.7.5
pillow                            10.3.0
pip                               24.0
platformdirs                      4.2.2
plotly                            5.22.0
prometheus_client                 0.20.0
prometheus-fastapi-instrumentator 7.0.0
prompt-toolkit                    3.0.43
protobuf                          5.26.1
psutil                            5.9.8
ptyprocess                        0.7.0
pure-eval                         0.2.2
py-cpuinfo                        9.0.0
pyarrow                           16.1.0
pyarrow-hotfix                    0.6
pycparser                         2.22
pycryptodome                      3.20.0
pydantic                          2.7.1
pydantic_core                     2.18.2
pydub                             0.25.1
Pygments                          2.18.0
pyparsing                         3.1.2
python-dateutil                   2.9.0.post0
python-dotenv                     1.0.1
python-multipart                  0.0.9
pytz                              2024.1
PyYAML                            6.0.1
pyzmq                             25.1.2
ray                               2.22.0
referencing                       0.35.1
regex                             2024.5.15
requests                          2.31.0
rich                              13.7.1
rpds-py                           0.18.1
ruff                              0.4.4
safetensors                       0.4.3
scipy                             1.13.0
semantic-version                  2.10.0
sentencepiece                     0.2.0
setuptools                        69.5.1
shellingham                       1.5.4
shortuuid                         1.0.13
simplejson                        3.19.2
six                               1.16.0
sniffio                           1.3.1
sortedcontainers                  2.4.0
stack-data                        0.6.2
starlette                         0.37.2
svgwrite                          1.4.3
sympy                             1.12
tenacity                          8.3.0
tiktoken                          0.6.0
tokenizers                        0.19.1
tomli                             2.0.1
tomlkit                           0.12.0
toolz                             0.12.1
torch                             2.3.0
tornado                           6.3.3
tqdm                              4.66.4
traitlets                         5.14.3
transformers                      4.40.2
triton                            2.3.0
typer                             0.12.3
typing_extensions                 4.11.0
tzdata                            2024.1
ujson                             5.10.0
urllib3                           2.2.1
uvicorn                           0.29.0
uvloop                            0.19.0
vllm                              0.4.2
vllm_nccl_cu12                    2.18.1.0.4.0
watchfiles                        0.21.0
wavedrom                          2.0.3.post3
wcwidth                           0.2.13
websockets                        11.0.3
wheel                             0.43.0
xformers                          0.0.26.post1
xxhash                            3.4.1
yapf                              0.40.2
yarl                              1.9.4
zipp                              3.19.2

Incidentally, when I downloaded the model from https://modelscope.cn/models/wenge-research/yayi2-30b, I also ran into the following problem:

Encountered 6 file(s) that may not have been copied correctly on Windows:
        pytorch_model-00001-of-00007.bin
        pytorch_model-00006-of-00007.bin
        pytorch_model-00004-of-00007.bin
        pytorch_model-00005-of-00007.bin
        pytorch_model-00002-of-00007.bin
        pytorch_model-00003-of-00007.bin

See: `git lfs help smudge` for more details.

I'm not sure whether this is related.
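The traceback above fails while the tokenizer is being constructed, before any weight shard is read, so this warning is probably not the cause of the AttributeError. To rule out damaged shards anyway, one option is to compare each file's SHA-256 against a reference value. Below is a minimal sketch, assuming you have trusted reference hashes (for example the `oid sha256:` value recorded in each Git LFS pointer); the helper names `sha256_of_file` and `matches_expected` are made up for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so multi-GB shards do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's SHA-256 with a reference hex digest (hypothetical helper)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A shard whose hash does not match should be downloaded again before the model is loaded.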

xueqizhang121 commented 3 months ago

I ran into a similar problem. Have you found a solution?

wudi-7mi commented 3 months ago

"I ran into a similar problem. Have you found a solution?"

No, and the maintainers haven't replied either.

1012796366 commented 2 months ago

Try installing a slightly older version of transformers, for example: pip install transformers==4.30.0
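For context on why the downgrade helps: in the traceback above (transformers 4.40.2), the base class `__init__` calls `_add_tokens`, which calls `get_vocab()` and then `vocab_size`, all from inside the `super().__init__(` call at line 74 of tokenization_yayi.py. That suggests `self.sp_model` is only assigned after that call returns. Older transformers releases do not appear to query the vocabulary during `__init__`, so the ordering does not matter there. The following is a minimal, library-free sketch of that ordering problem; every class in it is a hypothetical stand-in, not the real YayiTokenizer or transformers code.

```python
class BaseTokenizer:
    """Simplified stand-in for the transformers base class (hypothetical)."""

    def __init__(self):
        # Mimics newer transformers: the vocabulary is queried during __init__.
        self.initial_vocab = self.get_vocab()

    def get_vocab(self):
        raise NotImplementedError


class FakeSentencePiece:
    """Stand-in for a loaded SentencePiece model (hypothetical)."""

    def get_piece_size(self):
        return 3


class BrokenTokenizer(BaseTokenizer):
    def __init__(self):
        super().__init__()                   # get_vocab() runs here ...
        self.sp_model = FakeSentencePiece()  # ... before sp_model exists

    @property
    def vocab_size(self):
        return self.sp_model.get_piece_size()

    def get_vocab(self):
        return {f"tok{i}": i for i in range(self.vocab_size)}


class FixedTokenizer(BrokenTokenizer):
    def __init__(self):
        self.sp_model = FakeSentencePiece()  # assign the model first
        BaseTokenizer.__init__(self)         # then run the base initializer
```

Constructing `BrokenTokenizer()` raises an AttributeError about `sp_model`, while `FixedTokenizer()` builds its vocabulary normally. If the same ordering applies to the real tokenization_yayi.py, moving the `sp_model` setup above the `super().__init__(` call in a local copy may be an alternative to pinning transformers, though I have not verified that against the actual file.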