predibase / lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
https://loraexchange.ai
Apache License 2.0

Fail to run Phi-3 #485

Closed: prd-tuong-nguyen closed this issue 3 weeks ago

prd-tuong-nguyen commented 1 month ago

System Info

I get this error when starting LoRAX with the model microsoft/Phi-3-mini-128k-instruct:

{"timestamp":"2024-05-22T07:01:39.860359Z","level":"ERROR","fields":{"message":"Shard complete standard error output:\n\nSpecial tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\nYou are using a model of type phi3 to instantiate a model of type . This is not supported for all configurations of models and can yield errors.\nTraceback (most recent call last):\n\n  File \"/opt/conda/bin/lorax-server\", line 8, in <module>\n    sys.exit(app())\n\n  File \"/opt/conda/lib/python3.10/site-packages/lorax_server/cli.py\", line 83, in serve\n    server.serve(\n\n  File \"/opt/conda/lib/python3.10/site-packages/lorax_server/server.py\", line 309, in serve\n    asyncio.run(\n\n  File \"/opt/conda/lib/python3.10/asyncio/runners.py\", line 44, in run\n    return loop.run_until_complete(main)\n\n  File \"/opt/conda/lib/python3.10/asyncio/base_events.py\", line 649, in run_until_complete\n    return future.result()\n\n  File \"/opt/conda/lib/python3.10/site-packages/lorax_server/server.py\", line 243, in serve_inner\n    model = get_model(\n\n  File \"/opt/conda/lib/python3.10/site-packages/lorax_server/models/__init__.py\", line 251, in get_model\n    return FlashPhi3(\n\n  File \"/opt/conda/lib/python3.10/site-packages/lorax_server/models/flash_phi3.py\", line 88, in __init__\n    model = FlashPhi3ForCausalLM(config, weights)\n\n  File \"/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_phi3_modeling.py\", line 482, in __init__\n    self.model = FlashPhi3Model(config, weights)\n\n  File \"/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_phi3_modeling.py\", line 422, in __init__\n    [\n\n  File \"/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_phi3_modeling.py\", line 423, in <listcomp>\n    FlashPhi3Layer(\n\n  File \"/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_phi3_modeling.py\", line 360, in __init__\n    self.self_attn = FlashPhi3Attention(\n\n  File \"/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_phi3_modeling.py\", line 190, in __init__\n    self.rotary_emb = PositionRotaryEmbedding.static(\n\n  File \"/opt/conda/lib/python3.10/site-packages/lorax_server/utils/layers.py\", line 854, in static\n    scaling_factor = rope_scaling[\"factor\"]\n\nKeyError: 'factor'\n"},"target":"lorax_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}

Information

Tasks

Reproduction

Run LoRAX via Docker, passing microsoft/Phi-3-mini-128k-instruct as the base model (an example launch command is sketched below).
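
For reference, a launch command along these lines reproduces the failure. This is only a sketch following the usual LoRAX Docker invocation; the image tag, port, and volume path are illustrative and may differ from the reporter's setup:

    docker run --gpus all --shm-size 1g -p 8080:80 \
        -v $PWD/data:/data \
        ghcr.io/predibase/lorax:main \
        --model-id microsoft/Phi-3-mini-128k-instruct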

Expected behavior

The server starts successfully with the Phi-3 model.

magdyksaleh commented 1 month ago

Will attempt to repro and see what is going on

tgaddair commented 1 month ago

Looks like this could be an issue with auto-detecting RoPE scaling.
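
For context, here is a minimal sketch of why that lookup fails, assuming the Phi-3-mini-128k-instruct config.json carries a "longrope"-style rope_scaling block (per-dimension factor lists rather than a single "factor"). It only illustrates the KeyError in the traceback; it is not the actual fix:

    # Illustrative only: a Phi-3-128k style rope_scaling dict has no "factor" key,
    # so the lookup in lorax_server/utils/layers.py raises KeyError: 'factor'.
    rope_scaling = {
        "type": "longrope",               # assumption: Phi-3 style long-context scaling
        "long_factor": [1.0, 1.05, 1.1],  # illustrative values, truncated
        "short_factor": [1.0, 1.0, 1.0],
    }

    try:
        scaling_factor = rope_scaling["factor"]  # the line that fails in the traceback
    except KeyError:
        # A defensive path would branch on rope_scaling["type"] instead of
        # assuming linear/dynamic scaling with a single scalar factor.
        scaling_factor = None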

prd-tuong-nguyen commented 1 month ago

Thanks guys, I hope this gets fixed soon.

prd-tuong-nguyen commented 1 month ago

@tgaddair hello bro, do you have any update on this?

prd-tuong-nguyen commented 3 weeks ago

@tgaddair Hey bro, LoRAX seems able to start with microsoft/Phi-3-mini-4k-instruct, but it also gives these warnings, which I think are really important (a quick way to double-check the token/ID mapping is sketched after the log):

2024-06-05T05:20:37.878887Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|endoftext|>' was expected to have ID '32000' but was given ID 'None'
2024-06-05T05:20:37.878921Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|assistant|>' was expected to have ID '32001' but was given ID 'None'
2024-06-05T05:20:37.878925Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder1|>' was expected to have ID '32002' but was given ID 'None'
2024-06-05T05:20:37.878927Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder2|>' was expected to have ID '32003' but was given ID 'None'
2024-06-05T05:20:37.878930Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder3|>' was expected to have ID '32004' but was given ID 'None'
2024-06-05T05:20:37.878933Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder4|>' was expected to have ID '32005' but was given ID 'None'
2024-06-05T05:20:37.878943Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|system|>' was expected to have ID '32006' but was given ID 'None'
2024-06-05T05:20:37.878946Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|end|>' was expected to have ID '32007' but was given ID 'None'
2024-06-05T05:20:37.878948Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder5|>' was expected to have ID '32008' but was given ID 'None'
2024-06-05T05:20:37.878951Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|placeholder6|>' was expected to have ID '32009' but was given ID 'None'
2024-06-05T05:20:37.878954Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '<|user|>' was expected to have ID '32010' but was given ID 'None'
2024-06-05T05:20:37.880592Z  WARN lorax_router: router/src/main.rs:447: `--revision` is not set
2024-06-05T05:20:37.880608Z  WARN lorax_router: router/src/main.rs:448: We strongly advise to set it to a known supported commit.
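
For what it's worth, one way to sanity-check that token/ID mapping outside the server is to load the tokenizer with Hugging Face transformers. This assumes the transformers library is installed; it is not a LoRAX API:

    # Hypothetical check: confirm the added chat tokens resolve to the IDs the
    # warnings refer to (32000, 32001, ...).
    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
    for token in ["<|endoftext|>", "<|assistant|>", "<|system|>", "<|end|>", "<|user|>"]:
        print(token, tok.convert_tokens_to_ids(token))
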
tgaddair commented 3 weeks ago

Sorry @prd-tuong-nguyen for the delay. I'll try and take a look at this today!

tgaddair commented 3 weeks ago

Hey @prd-tuong-nguyen, put together #499, which addressed the issue on my side. Should have a new main image for you to test out shortly!

prd-tuong-nguyen commented 3 weeks ago

@tgaddair cool bro, I will check the latest image

prd-tuong-nguyen commented 3 weeks ago

@tgaddair The model now seems to start successfully, but I still see the warnings mentioned above. When the model is loaded by another framework, it shows something like: "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained."