mistralai / mistral-inference

Official inference library for Mistral models
https://mistral.ai/
Apache License 2.0

[BUG]: config.json in mamba-codestral-7B-v0.1 causes an error #207

Closed: Fly-Pluche closed this issue 1 month ago

Fly-Pluche commented 1 month ago

Python -VV

Python 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]

Pip Freeze

accelerate==0.33.0
addict==2.4.0
annotated-types==0.7.0
apex @ file:///data2/apex
asttokens @ file:///opt/conda/conda-bld/asttokens_1646925590279/work
astunparse==1.6.3
attrs==23.1.0
backcall @ file:///home/ktietz/src/ci/backcall_1611930011877/work
beautifulsoup4 @ file:///croot/beautifulsoup4-split_1681493039619/work
bitsandbytes==0.41.2
blinker==1.7.0
boltons @ file:///croot/boltons_1677628692245/work
brotlipy==0.7.0
certifi @ file:///croot/certifi_1690232220950/work/certifi
cffi @ file:///croot/cffi_1670423208954/work
chardet @ file:///home/builder/ci_310/chardet_1640804867535/work
charset-normalizer @ file:///tmp/build/80754af9/charset-normalizer_1630003229654/work
click==8.1.7
conda @ file:///croot/conda_1696257509808/work
conda-build @ file:///croot/conda-build_1696257509796/work
conda-content-trust @ file:///croot/conda-content-trust_1693490622020/work
conda-libmamba-solver @ file:///croot/conda-libmamba-solver_1691418897561/work/src
conda-package-handling @ file:///croot/conda-package-handling_1690999929514/work
conda_index @ file:///croot/conda-index_1695310357675/work
conda_package_streaming @ file:///croot/conda-package-streaming_1690987966409/work
contourpy==1.2.0
coverage==7.6.0
cpm-kernels @ file:///data2/package_build/cpm_kernels
cryptography @ file:///croot/cryptography_1694444244250/work
cycler==0.12.1
Cython==3.0.6
decorator @ file:///opt/conda/conda-bld/decorator_1643638310831/work
dnspython==2.4.2
docstring_parser==0.16
einops==0.7.0
eventlet==0.36.1
exceptiongroup @ file:///croot/exceptiongroup_1668714342571/work
executing @ file:///opt/conda/conda-bld/executing_1646925071911/work
expecttest==0.1.6
filelock @ file:///croot/filelock_1672387128942/work
fire==0.6.0
flash-attn==2.3.6
Flask==3.0.0
fonttools==4.45.1
fsspec==2023.9.2
gmpy2 @ file:///tmp/build/80754af9/gmpy2_1645455533097/work
google==3.0.0
greenlet==3.0.3
grpcio==1.59.3
grpcio-tools==1.59.3
huggingface-hub==0.24.5
hypothesis==6.87.1
idna @ file:///croot/idna_1666125576474/work
ipython @ file:///croot/ipython_1694181358621/work
itsdangerous==2.1.2
jedi @ file:///tmp/build/80754af9/jedi_1644315229345/work
Jinja2 @ file:///croot/jinja2_1666908132255/work
joblib==1.3.2
jsonpatch @ file:///tmp/build/80754af9/jsonpatch_1615747632069/work
jsonpointer==2.1
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
kiwisolver==1.4.5
libarchive-c @ file:///tmp/build/80754af9/python-libarchive-c_1617780486945/work
libmambapy @ file:///croot/mamba-split_1685993156657/work/libmambapy
linecache2==1.0.0
MarkupSafe @ file:///opt/conda/conda-bld/markupsafe_1654597864307/work
matplotlib==3.8.2
matplotlib-inline @ file:///opt/conda/conda-bld/matplotlib-inline_1662014470464/work
mistral_common==1.3.3
mistral_inference==1.3.1
mkl-fft @ file:///croot/mkl_fft_1695058164594/work
mkl-random @ file:///croot/mkl_random_1695059800811/work
mkl-service==2.4.0
modelscope==1.16.0
more-itertools @ file:///tmp/build/80754af9/more-itertools_1637733554872/work
mpmath @ file:///croot/mpmath_1690848262763/work
networkx @ file:///croot/networkx_1690561992265/work
ninja==1.11.1.1
numpy @ file:///croot/numpy_and_numpy_base_1695830428084/work/dist/numpy-1.26.0-cp310-cp310-linux_x86_64.whl#sha256=fc2732718bc9e06a7b702492cb4f5afffe9671083930452d894377bf563464a3
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.6.20
nvidia-nvtx-cu12==12.1.105
packaging==23.2
pandas==2.2.2
parso @ file:///opt/conda/conda-bld/parso_1641458642106/work
pexpect @ file:///tmp/build/80754af9/pexpect_1605563209008/work
pickleshare @ file:///tmp/build/80754af9/pickleshare_1606932040724/work
Pillow @ file:///croot/pillow_1695134008276/work
pkginfo @ file:///croot/pkginfo_1679431160147/work
pluggy @ file:///tmp/build/80754af9/pluggy_1648024709248/work
prompt-toolkit @ file:///croot/prompt-toolkit_1672387306916/work
protobuf==4.25.1
psutil @ file:///opt/conda/conda-bld/psutil_1656431268089/work
ptyprocess @ file:///tmp/build/80754af9/ptyprocess_1609355006118/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl
pure-eval @ file:///opt/conda/conda-bld/pure_eval_1646925070566/work
pycosat @ file:///croot/pycosat_1666805502580/work
pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
pydantic==2.6.1
pydantic_core==2.16.2
Pygments @ file:///croot/pygments_1684279966437/work
pyOpenSSL @ file:///croot/pyopenssl_1690223430423/work
pyparsing==3.1.1
PySocks @ file:///home/builder/ci_310/pysocks_1640793678128/work
python-dateutil==2.8.2
python-etcd==0.4.5
pytz @ file:///croot/pytz_1695131579487/work
PyYAML @ file:///croot/pyyaml_1670514731622/work
referencing==0.35.1
regex==2023.10.3
requests @ file:///croot/requests_1690400202158/work
rpds-py==0.19.1
ruamel.yaml @ file:///croot/ruamel.yaml_1666304550667/work
ruamel.yaml.clib @ file:///croot/ruamel.yaml.clib_1666302247304/work
safetensors==0.4.3
scikit-learn==1.3.2
scipy==1.11.4
sentencepiece==0.2.0
simple_parsing==0.1.5
six @ file:///tmp/build/80754af9/six_1644875935023/work
sortedcontainers==2.4.0
soupsieve @ file:///croot/soupsieve_1696347547217/work
stack-data @ file:///opt/conda/conda-bld/stack_data_1646927590127/work
sympy @ file:///croot/sympy_1668202399572/work
termcolor==2.4.0
threadpoolctl==3.2.0
tiktoken==0.7.0
tokenizers==0.19.1
tomli @ file:///opt/conda/conda-bld/tomli_1657175507142/work
toolz @ file:///croot/toolz_1667464077321/work
torch==2.4.0
torchaudio==2.1.0
torchelastic==0.2.2
torchvision==0.16.0
tqdm @ file:///croot/tqdm_1679561862951/work
traceback2==1.4.0
traitlets @ file:///croot/traitlets_1671143879854/work
transformers==4.42.4
tree-sitter==0.20.4
triton==3.0.0
truststore @ file:///croot/truststore_1695244293384/work
types-dataclasses==0.6.6
typing_extensions==4.12.2
tzdata==2024.1
unittest2==1.1.0
urllib3 @ file:///croot/urllib3_1686163155763/work
wcwidth @ file:///Users/ktietz/demo/mc3/conda-bld/wcwidth_1629357192024/work
Werkzeug==3.0.1
xformers==0.0.27.post2
zstandard @ file:///croot/zstandard_1677013143055/work

Reproduction Steps


from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

model_id = "/gxkj/models/mamba-codestral-7B-v0.1/tokenizer.model.v3"

# load tokenizer
mistral_tokenizer = MistralTokenizer.from_file(model_id)
# chat completion request
completion_request = ChatCompletionRequest(messages=[UserMessage(content="Explain Machine Learning to me in a nutshell.")])
# encode message
tokens = mistral_tokenizer.encode_chat_completion(completion_request).tokens
# load model
model = "/gxkj/models/mamba-codestral-7B-v0.1"

model = Transformer.from_folder(model)
# generate results
out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=mistral_tokenizer.instruct_tokenizer.tokenizer.eos_id)
# decode generated tokens
result = mistral_tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
print(result)


Expected Behavior

Loading the model fails with an error involving config.json.

Additional Context

No response

Suggested Solutions

No response

pandora-s-git commented 1 month ago

Codestral Mamba is based on the Mamba architecture, not the transformer architecture, so you need to use mistral_inference.mamba instead of mistral_inference.transformer. Take a look at https://github.com/mistralai/mistral-inference/blob/main/src/mistral_inference/mamba.py and at the README file!
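
The choice of loader described above could be sketched as a small dispatch helper. This is a minimal sketch under assumptions, not the library's own code: `detect_model_type` and `load_model` are hypothetical helper names, and the sketch assumes the model folder contains a `params.json` whose optional `model_type` field is `"mamba"` for Codestral Mamba. It also assumes that `mistral_inference.mamba` exposes a `Mamba` class with a `from_folder` method that mirrors `Transformer.from_folder` from the reproduction script.

```python
import json
from pathlib import Path


def detect_model_type(model_dir):
    """Return the architecture name recorded in params.json.

    Assumption: the folder has a params.json with an optional
    "model_type" key; a missing key is treated as "transformer".
    """
    params_path = Path(model_dir) / "params.json"
    with open(params_path) as f:
        params = json.load(f)
    return params.get("model_type", "transformer")


def load_model(model_dir):
    """Load the folder with the class that matches its architecture.

    The imports are deferred so that the Mamba dependencies are only
    needed when a Mamba checkpoint is actually loaded.
    """
    if detect_model_type(model_dir) == "mamba":
        # Assumption: Mamba.from_folder mirrors Transformer.from_folder.
        from mistral_inference.mamba import Mamba
        return Mamba.from_folder(model_dir)

    from mistral_inference.transformer import Transformer
    return Transformer.from_folder(model_dir)
```

In the reproduction script, `model = Transformer.from_folder(model)` would then become `model = load_model("/gxkj/models/mamba-codestral-7B-v0.1")`, with the tokenizer and `generate` calls left unchanged. The Mamba code path probably also needs the `mamba-ssm` and `causal-conv1d` packages, neither of which appears in the pip freeze above.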

Fly-Pluche commented 1 month ago

Thanks